0: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 5: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 12: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 0: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 5: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 
--eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 0: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 0: START 2058110: Thu Nov 24 17:04:50 EET 2022 5: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 5: START 2058110: Thu Nov 24 17:04:50 EET 2022 12: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 12: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 12: START 2058110: Thu Nov 24 17:04:50 EET 2022 11: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 6: Model parameters: d_model 640 ffw_size 2560 
kv_size 64 n_heads 10 n_layers 10 23: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 26: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 3: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 14: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 19: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 15: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 11: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 11: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 11: START 2058110: Thu Nov 24 17:04:50 EET 2022 7: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 29: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 31: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 2: Model parameters: 
d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 22: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 24: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 10: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 4: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 8: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 16: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 25: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 30: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 9: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 17: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 28: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 27: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 21: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 18: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 13: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 1: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 23: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 
--lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 23: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 23: START 2058110: Thu Nov 24 17:04:50 EET 2022 6: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 6: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 6: START 2058110: Thu Nov 24 17:04:50 EET 2022 20: Model parameters: d_model 640 ffw_size 2560 kv_size 64 n_heads 10 n_layers 10 3: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 
--hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 3: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 3: START 2058110: Thu Nov 24 17:04:50 EET 2022 14: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard 
--save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 14: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 14: START 2058110: Thu Nov 24 17:04:50 EET 2022 19: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 19: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 19: START 2058110: Thu Nov 24 17:04:50 EET 2022 26: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam 
--adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 26: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 26: START 2058110: Thu Nov 24 17:04:50 EET 2022 15: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 15: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 15: START 2058110: Thu Nov 24 17:04:50 EET 2022 7: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 
--pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 7: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 7: START 2058110: Thu Nov 24 17:04:50 EET 2022 24: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard 
--log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 24: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 24: START 2058110: Thu Nov 24 17:04:50 EET 2022 10: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 10: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 16: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations 
--optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp
10: START 2058110: Thu Nov 24 17:04:50 EET 2022
16: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0
16: START 2058110: Thu Nov 24 17:04:50 EET 2022
0:
0:
0: ======================= ROCm System Management Interface =======================
0: ================================= Concise Info =================================
0: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
0: 0    41.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 1    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 2    38.0c  94.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 3    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 4    36.0c  84.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 5    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 6    36.0c  87.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 7    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: ================================================================================
0: ============================= End of ROCm SMI Log ==============================
5:
5:
5: ======================= ROCm System Management Interface =======================
5: ================================= Concise Info =================================
5: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
5: 0    45.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 1    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 2    45.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 3    39.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 4    43.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 5    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 6    42.0c  95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 7    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: ================================================================================
5: ============================= End of ROCm SMI Log ==============================
12:
12:
12: ======================= ROCm System Management Interface =======================
12: ================================= Concise Info =================================
12: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
12: 0    45.0c  86.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
12: 1    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
12: 2    43.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
12: 3    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
12: 4    43.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
12: 5    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
12: 6    45.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
12: 7    50.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
12: ================================================================================
12: ============================= End of ROCm SMI Log ==============================
11:
11:
11: ======================= ROCm System Management Interface =======================
11: ================================= Concise Info =================================
11: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
11: 0    44.0c  93.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
11: 1    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
11: 2    40.0c  87.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
11: 3    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
11: 4    45.0c  83.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
11: 5    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
11: 6    37.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
11: 7    49.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
11: ================================================================================
11: ============================= End of ROCm SMI Log ==============================
4: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp
4: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0
4: START 2058110: Thu Nov 24 17:04:51 EET 2022
25: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1
--tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 25: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 25: START 2058110: Thu Nov 24 17:04:51 EET 2022 29: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 29: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 29: START 2058110: Thu Nov 24 17:04:51 EET 2022 8: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file 
gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 8: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 2: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 2: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 8: START 2058110: 
Thu Nov 24 17:04:51 EET 2022 2: START 2058110: Thu Nov 24 17:04:51 EET 2022 30: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 30: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 30: START 2058110: Thu Nov 24 17:04:51 EET 2022 9: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 
1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 9: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 9: START 2058110: Thu Nov 24 17:04:51 EET 2022 22: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 22: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 22: START 2058110: Thu Nov 24 17:04:51 EET 2022 17: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 
--train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 17: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 31: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 31: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config 
ds_configs/2058110.json --zero-stage 0 17: START 2058110: Thu Nov 24 17:04:51 EET 2022 31: START 2058110: Thu Nov 24 17:04:51 EET 2022 28: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 28: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 27: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 
--save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 28: START 2058110: Thu Nov 24 17:04:51 EET 2022 27: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 27: START 2058110: Thu Nov 24 17:04:51 EET 2022 21: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 21: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 21: START 2058110: Thu Nov 24 17:04:51 EET 2022 18: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 
--max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 18: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 18: START 2058110: Thu Nov 24 17:04:51 EET 2022 13: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path 
/scratch/project_462000119/data/pile/megatron_data/meg-gp 13: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 13: START 2058110: Thu Nov 24 17:04:51 EET 2022 1: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 1: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 1: START 2058110: Thu Nov 24 17:04:51 EET 2022 20: Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 10 --hidden-size 640 --num-attention-heads 10 --kv-channels 64 --ffn-hidden-size 2560 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 9_703_701 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-1 --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 
--min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 9_703_701 --lr-warmup-samples 0 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_83m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_83m --load checkpoints_83m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gp 20: t2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2058110.json --zero-stage 0 20: START 2058110: Thu Nov 24 17:04:51 EET 2022 23: 23: 23: ======================= ROCm System Management Interface ======================= 23: ================================= Concise Info ================================= 23: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 23: 0 47.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 23: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 23: 2 43.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 23: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 23: 4 38.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 23: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 23: 6 39.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 23: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 23: ================================================================================ 23: ============================= End of ROCm SMI Log ============================== 6: 6: 6: ======================= ROCm System Management Interface ======================= 6: ================================= Concise Info ================================= 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 6: 0 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 2 42.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 
0.0W 0% 0% 6: 6 38.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 7 38.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: ================================================================================ 6: ============================= End of ROCm SMI Log ============================== 3: 3: 3: ======================= ROCm System Management Interface ======================= 3: ================================= Concise Info ================================= 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 3: 0 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 2 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 4 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 6 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: ================================================================================ 3: ============================= End of ROCm SMI Log ============================== 14: 14: 14: ======================= ROCm System Management Interface ======================= 14: ================================= Concise Info ================================= 14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 14: 0 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 14: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 14: 2 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 14: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 14: 4 37.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 14: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 14: 6 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 14: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 14: ================================================================================ 14: ============================= End of ROCm SMI Log ============================== 19: 19: 19: ======================= ROCm System Management Interface 
======================= 19: ================================= Concise Info ================================= 19: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 19: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 19: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 19: 2 44.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 19: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 19: 4 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 19: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 19: 6 42.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 19: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 19: ================================================================================ 19: ============================= End of ROCm SMI Log ============================== 26: 26: 26: ======================= ROCm System Management Interface ======================= 26: ================================= Concise Info ================================= 26: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 26: 0 50.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 26: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 26: 2 40.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 26: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 26: 4 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 26: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 26: 6 38.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 26: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 26: ================================================================================ 26: ============================= End of ROCm SMI Log ============================== 15: 15: 15: ======================= ROCm System Management Interface ======================= 15: ================================= Concise Info ================================= 15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 15: 0 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 15: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 15: 2 39.0c 100.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 15: 3 41.0c N/A 800Mhz 1600Mhz 
0% auto 0.0W 0% 0% 15: 4 44.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 15: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 15: 6 39.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 15: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 15: ================================================================================ 15: ============================= End of ROCm SMI Log ============================== 7: 7: 7: ======================= ROCm System Management Interface ======================= 7: ================================= Concise Info ================================= 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 7: 0 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 2 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 4 48.0c 80.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 6 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: ================================================================================ 7: ============================= End of ROCm SMI Log ============================== 10: 10: 10: ======================= ROCm System Management Interface ======================= 10: ================================= Concise Info ================================= 10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 10: 0 42.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 10: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 10: 2 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 10: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 10: 4 45.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 10: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 10: 6 40.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 10: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 10: ================================================================================ 10: ============================= End of ROCm SMI Log 
============================== 24: 24: 24: ======================= ROCm System Management Interface ======================= 24: ================================= Concise Info ================================= 24: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 24: 0 43.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 24: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 24: 2 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 24: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 24: 4 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 24: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 24: 6 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 24: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 24: ================================================================================ 24: ============================= End of ROCm SMI Log ============================== 16: 16: 16: ======================= ROCm System Management Interface ======================= 16: ================================= Concise Info ================================= 16: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 16: 0 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 16: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 16: 2 33.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 16: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 16: 4 46.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 16: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 16: 6 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 16: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 16: ================================================================================ 16: ============================= End of ROCm SMI Log ============================== 4: 4: 4: ======================= ROCm System Management Interface ======================= 4: ================================= Concise Info ================================= 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 4: 0 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 
0% 0% 4: 2 38.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 4 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 6 40.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: ================================================================================ 4: ============================= End of ROCm SMI Log ============================== 29: 29: 29: ======================= ROCm System Management Interface ======================= 29: ================================= Concise Info ================================= 29: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 29: 0 49.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 29: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 29: 2 38.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 29: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 29: 4 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 29: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 29: 6 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 29: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 29: ================================================================================ 29: ============================= End of ROCm SMI Log ============================== 25: 25: 25: ======================= ROCm System Management Interface ======================= 25: ================================= Concise Info ================================= 25: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 25: 0 44.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 25: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 25: 2 42.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 25: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 25: 4 47.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 25: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 25: 6 39.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 25: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 25: 
================================================================================ 25: ============================= End of ROCm SMI Log ============================== 8: 8: 8: ======================= ROCm System Management Interface ======================= 8: ================================= Concise Info ================================= 8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 8: 0 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 8: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 8: 2 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 8: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 8: 4 38.0c 100.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 8: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 8: 6 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 8: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 8: ================================================================================ 8: ============================= End of ROCm SMI Log ============================== 2: 2: 2: ======================= ROCm System Management Interface ======================= 2: ================================= Concise Info ================================= 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2: 0 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 2 37.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 4 42.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 6 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: ================================================================================ 2: ============================= End of ROCm SMI Log ============================== 30: 30: 30: ======================= ROCm System Management Interface ======================= 30: ================================= Concise Info ================================= 30: GPU Temp AvgPwr SCLK MCLK Fan Perf 
PwrCap VRAM% GPU% 30: 0 44.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 30: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 30: 2 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 30: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 30: 4 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 30: 5 37.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 30: 6 38.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 30: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 30: ================================================================================ 30: ============================= End of ROCm SMI Log ============================== 9: 9: 9: ======================= ROCm System Management Interface ======================= 9: ================================= Concise Info ================================= 9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 9: 0 46.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 9: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 9: 2 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 9: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 9: 4 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 9: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 9: 6 47.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 9: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 9: ================================================================================ 9: ============================= End of ROCm SMI Log ============================== 22: 22: 22: ======================= ROCm System Management Interface ======================= 22: ================================= Concise Info ================================= 22: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 22: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 22: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 22: 2 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 22: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 22: 4 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 22: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 22: 6 41.0c 97.0W 800Mhz 1600Mhz 0% auto 
560.0W 0% 0% 22: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 22: ================================================================================ 22: ============================= End of ROCm SMI Log ============================== 31: 31: 31: ======================= ROCm System Management Interface ======================= 31: ================================= Concise Info ================================= 31: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 31: 0 45.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 31: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 31: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 31: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 31: 4 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 31: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 31: 6 41.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 31: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 31: ================================================================================ 31: ============================= End of ROCm SMI Log ============================== 17: 17: 17: ======================= ROCm System Management Interface ======================= 17: ================================= Concise Info ================================= 17: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 17: 0 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 17: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 17: 2 40.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 17: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 17: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 17: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 17: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 17: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 17: ================================================================================ 17: ============================= End of ROCm SMI Log ============================== 27: 27: 27: ======================= ROCm System Management Interface ======================= 27: 
================================= Concise Info ================================= 27: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 27: 0 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 27: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 27: 2 42.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 27: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 27: 4 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 27: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 27: 6 49.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 27: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 27: ================================================================================ 27: ============================= End of ROCm SMI Log ============================== 28: 28: 28: ======================= ROCm System Management Interface ======================= 28: ================================= Concise Info ================================= 28: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 28: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 28: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 28: 2 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 28: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 28: 4 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 28: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 28: 6 39.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 28: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 28: ================================================================================ 28: ============================= End of ROCm SMI Log ============================== 21: 21: 21: ======================= ROCm System Management Interface ======================= 21: ================================= Concise Info ================================= 21: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 21: 0 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 21: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 21: 2 35.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 21: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 21: 4 43.0c 
98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 21: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 21: 6 43.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 21: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 21: ================================================================================ 21: ============================= End of ROCm SMI Log ============================== 18: 18: 18: ======================= ROCm System Management Interface ======================= 18: ================================= Concise Info ================================= 18: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 18: 0 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 18: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 18: 2 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 18: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 18: 4 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 18: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 18: 6 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 18: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 18: ================================================================================ 18: ============================= End of ROCm SMI Log ============================== 13: 13: 13: ======================= ROCm System Management Interface ======================= 13: ================================= Concise Info ================================= 13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 13: 0 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 13: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 13: 2 38.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 13: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 13: 4 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 13: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 13: 6 37.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 13: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 13: ================================================================================ 13: ============================= End of ROCm SMI Log 
============================== 1: 1: 1: ======================= ROCm System Management Interface ======================= 1: ================================= Concise Info ================================= 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 1: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 1 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 2 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 3 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 4 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 6 39.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: ================================================================================ 1: ============================= End of ROCm SMI Log ============================== 20: 20: 20: ======================= ROCm System Management Interface ======================= 20: ================================= Concise Info ================================= 20: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 20: 0 45.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 20: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 20: 2 41.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 20: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 20: 4 45.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 20: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 20: 6 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 20: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 20: ================================================================================ 20: ============================= End of ROCm SMI Log ============================== 5: Launching on nid005365 (5/32), master nid005360 port 9999, GPUs 8, CUDA: True 0: Launching on nid005360 (0/32), master nid005360 port 9999, GPUs 8, CUDA: True 23: Launching on nid005415 (23/32), master nid005360 port 9999, GPUs 8, CUDA: True 6: Launching on nid005366 (6/32), master nid005360 port 9999, GPUs 8, CUDA: True 15: Launching 
on nid005396 (15/32), master nid005360 port 9999, GPUs 8, CUDA: True 27: Launching on nid005419 (27/32), master nid005360 port 9999, GPUs 8, CUDA: True 10: Launching on nid005370 (10/32), master nid005360 port 9999, GPUs 8, CUDA: True 26: Launching on nid005418 (26/32), master nid005360 port 9999, GPUs 8, CUDA: True 3: Launching on nid005363 (3/32), master nid005360 port 9999, GPUs 8, CUDA: True 19: Launching on nid005411 (19/32), master nid005360 port 9999, GPUs 8, CUDA: True 8: Launching on nid005368 (8/32), master nid005360 port 9999, GPUs 8, CUDA: True 2: Launching on nid005362 (2/32), master nid005360 port 9999, GPUs 8, CUDA: True 17: Launching on nid005398 (17/32), master nid005360 port 9999, GPUs 8, CUDA: True 18: Launching on nid005399 (18/32), master nid005360 port 9999, GPUs 8, CUDA: True 28: Launching on nid005420 (28/32), master nid005360 port 9999, GPUs 8, CUDA: True 4: Launching on nid005364 (4/32), master nid005360 port 9999, GPUs 8, CUDA: True 12: Launching on nid005372 (12/32), master nid005360 port 9999, GPUs 8, CUDA: True 30: Launching on nid005422 (30/32), master nid005360 port 9999, GPUs 8, CUDA: True 20: Launching on nid005412 (20/32), master nid005360 port 9999, GPUs 8, CUDA: True 1: Launching on nid005361 (1/32), master nid005360 port 9999, GPUs 8, CUDA: True 25: Launching on nid005417 (25/32), master nid005360 port 9999, GPUs 8, CUDA: True 14: Launching on nid005395 (14/32), master nid005360 port 9999, GPUs 8, CUDA: True 7: Launching on nid005367 (7/32), master nid005360 port 9999, GPUs 8, CUDA: True 31: Launching on nid005423 (31/32), master nid005360 port 9999, GPUs 8, CUDA: True 29: Launching on nid005421 (29/32), master nid005360 port 9999, GPUs 8, CUDA: True 24: Launching on nid005416 (24/32), master nid005360 port 9999, GPUs 8, CUDA: True 16: Launching on nid005397 (16/32), master nid005360 port 9999, GPUs 8, CUDA: True 13: Launching on nid005373 (13/32), master nid005360 port 9999, GPUs 8, CUDA: True 11: Launching on nid005371 
(11/32), master nid005360 port 9999, GPUs 8, CUDA: True 22: Launching on nid005414 (22/32), master nid005360 port 9999, GPUs 8, CUDA: True 21: Launching on nid005413 (21/32), master nid005360 port 9999, GPUs 8, CUDA: True 9: Launching on nid005369 (9/32), master nid005360 port 9999, GPUs 8, CUDA: True 0: using world size: 256, data-parallel-size: 256, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. 0: using torch.bfloat16 for parameters ... 0: ------------------------ arguments ------------------------ 0: abort_on_unmet_fused_kernel_constraints ......... False 0: accumulate_allreduce_grads_in_fp32 .............. True 0: adam_beta1 ...................................... 0.9 0: adam_beta2 ...................................... 0.999 0: adam_eps ........................................ 1e-08 0: adlr_autoresume ................................. False 0: adlr_autoresume_interval ........................ 1000 0: apply_query_key_layer_scaling ................... True 0: apply_residual_connection_post_layernorm ........ False 0: attention_dropout ............................... 0.1 0: attention_softmax_in_fp32 ....................... False 0: bert_binary_head ................................ True 0: bert_load ....................................... None 0: bf16 ............................................ True 0: bias_dropout_fusion ............................. True 0: bias_gelu_fusion ................................ True 0: biencoder_projection_dim ........................ 0 0: biencoder_shared_query_context_model ............ False 0: block_data_path ................................. None 0: checkpoint_activations .......................... True 0: checkpoint_in_cpu ............................... False 0: checkpoint_num_layers ........................... 1 0: clip_grad ....................................... 1.0 0: codecarbon_dir .................................. 
None 0: consumed_train_samples .......................... 0 0: consumed_train_tokens ........................... 0 0: consumed_valid_samples .......................... 0 0: contigious_checkpointing ........................ False 0: cpu_optimizer ................................... False 0: cpu_torch_adam .................................. False 0: curriculum_learning ............................. False 0: data_impl ....................................... mmap 0: data_parallel_size .............................. 256 0: data_path ....................................... ['/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document'] 0: dataloader_type ................................. single 0: DDP_impl ........................................ local 0: decoder_seq_length .............................. None 0: deepscale ....................................... False 0: deepscale_config ................................ None 0: deepspeed ....................................... True 0: deepspeed_activation_checkpointing .............. False 0: deepspeed_config ................................ ds_configs/2058110.json 0: deepspeed_mpi ................................... False 0: distribute_checkpointed_activations ............. False 0: distributed_backend ............................. nccl 0: embed_layernorm ................................. False 0: embedding_path .................................. None 0: encoder_seq_length .............................. 2048 0: eod_mask_loss ................................... False 0: eval_interval ................................... 1000 0: eval_iters ...................................... 1 0: eval_only ....................................... None 0: evidence_data_path .............................. None 0: exit_duration_in_mins ........................... None 0: exit_interval ................................... None 0: ffn_hidden_size ................................. 
2560 0: finetune ........................................ False 0: fp16 ............................................ False 0: fp16_lm_cross_entropy ........................... False 0: fp32_residual_connection ........................ False 0: gigaflos_no_embeds .............................. 0 0: global_batch_size ............................... 256 0: glu_activation .................................. None 0: hidden_dropout .................................. 0.1 0: hidden_size ..................................... 640 0: hysteresis ...................................... 2 0: ict_head_size ................................... None 0: ict_load ........................................ None 0: img_dim ......................................... 224 0: indexer_batch_size .............................. 128 0: indexer_log_interval ............................ 1000 0: inference ....................................... False 0: init_method_std ................................. 0.02 0: init_method_xavier_uniform ...................... False 0: initial_loss_scale .............................. 4294967296 0: kill_switch_path ................................ kill-switch-1 0: kv_channels ..................................... 64 0: layer_norm_fusion ............................... True 0: layernorm_epsilon ............................... 1e-05 0: lazy_mpu_init ................................... None 0: load ............................................ checkpoints_83m 0: local_rank ...................................... None 0: log_batch_size_to_tensorboard ................... True 0: log_interval .................................... 10 0: log_learning_rate_to_tensorboard ................ True 0: log_level ....................................... None 0: log_level_replica ............................... None 0: log_loss_scale_to_tensorboard ................... True 0: log_num_zeros_in_grad ........................... False 0: log_params_norm ................................. 
False 0: log_path ........................................ None 0: log_timers_to_tensorboard ....................... True 0: log_validation_ppl_to_tensorboard ............... True 0: loss_on_targets_only ............................ False 0: loss_scale ...................................... 12.0 0: loss_scale_window ............................... 1000 0: lr .............................................. 0.0002 0: lr_decay_iters .................................. None 0: lr_decay_samples ................................ 9703701 0: lr_decay_style .................................. cosine 0: lr_decay_tokens ................................. None 0: lr_warmup_fraction .............................. None 0: lr_warmup_iters ................................. 0 0: lr_warmup_samples ............................... 0 0: make_vocab_size_divisible_by .................... 128 0: mask_prob ....................................... 0.15 0: masked_softmax_fusion ........................... True 0: max_position_embeddings ......................... 2048 0: mean_noise_span_length .......................... None 0: memory_centric_tiled_linear ..................... False 0: merge_file ...................................... gpt2/merges.txt 0: micro_batch_size ................................ 1 0: min_loss_scale .................................. 1.0 0: min_lr .......................................... 2e-05 0: mmap_warmup ..................................... False 0: no_load_optim ................................... None 0: no_load_rng ..................................... None 0: no_save_optim ................................... None 0: no_save_rng ..................................... None 0: noise_density ................................... None 0: num_attention_heads ............................. 10 0: num_channels .................................... 3 0: num_classes ..................................... 1000 0: num_layers ...................................... 
10 0: num_layers_per_virtual_pipeline_stage ........... None 0: num_workers ..................................... 2 0: onnx_safe ....................................... None 0: openai_gelu ..................................... False 0: optimizer ....................................... adam 0: optimizer_fusion ................................ True 0: override_lr_scheduler ........................... False 0: pad_vocab_size_to ............................... None 0: params_dtype .................................... torch.bfloat16 0: partition_activations ........................... False 0: patch_dim ....................................... 16 0: pipeline_model_parallel_size .................... 1 0: position_embedding_type ......................... PositionEmbeddingType.absolute 0: pp_partition_method ............................. None 0: profile_backward ................................ False 0: query_in_block_prob ............................. 0.1 0: rampup_batch_size ............................... None 0: rank ............................................ 0 0: remote_device ................................... none 0: reset_attention_mask ............................ False 0: reset_position_ids .............................. False 0: retriever_report_topk_accuracies ................ [] 0: retriever_score_scaling ......................... False 0: retriever_seq_length ............................ 256 0: reweight_loss_based_on_position_frequency ....... False 0: sample_rate ..................................... 1.0 0: save ............................................ checkpoints_83m 0: save_interval ................................... 1000 0: scatter_gather_tensors_in_pipeline .............. True 0: scattered_embeddings ............................ False 0: seed ............................................ 1234 0: seq_length ...................................... 2048 0: sgd_momentum .................................... 
0.9 0: short_seq_prob .................................. 0.1 0: skip_train_iteration_range ...................... None 0: split ........................................... 949,50,1 0: split_transformers .............................. False 0: sync_tp_duplicated_parameters ................... False 0: synchronize_each_layer .......................... False 0: tensor_model_parallel_size ...................... 1 0: tensorboard_dir ................................. tensorboard_83m 0: tensorboard_log_interval ........................ 1 0: tensorboard_queue_size .......................... 5 0: test_weighted_split_names ....................... None 0: test_weighted_split_paths ....................... None 0: test_weighted_split_paths_path .................. None 0: test_weighted_split_splits ...................... None 0: test_weighted_split_weights ..................... None 0: tile_factor ..................................... 1 0: titles_data_path ................................ None 0: tokenizer_name_or_path .......................... None 0: tokenizer_type .................................. GPT2BPETokenizer 0: train_iters ..................................... None 0: train_samples ................................... 9703701 0: train_tokens .................................... None 0: train_weighted_split_paths ...................... None 0: train_weighted_split_paths_path ................. None 0: universal_checkpoint ............................ False 0: use_bnb_optimizer ............................... False 0: use_checkpoint_lr_scheduler ..................... False 0: use_contiguous_buffers_in_ddp ................... True 0: use_cpu_initialization .......................... None 0: use_one_sent_docs ............................... False 0: use_pin_memory .................................. False 0: valid_num_workers ............................... 2 0: valid_weighted_split_names ...................... None 0: valid_weighted_split_paths ...................... 
None 0: valid_weighted_split_paths_path ................. None 0: valid_weighted_split_splits ..................... None 0: valid_weighted_split_weights .................... None 0: virtual_pipeline_model_parallel_size ............ None 0: vocab_extra_ids ................................. 0 0: vocab_file ...................................... gpt2/vocab.json 0: weight_decay .................................... 0.1 0: world_size ...................................... 256 0: zero_allgather_bucket_size ...................... 0.0 0: zero_contigious_gradients ....................... False 0: zero_reduce_bucket_size ......................... 0.0 0: zero_reduce_scatter ............................. False 0: zero_stage ...................................... 0 0: -------------------- end of arguments --------------------- 0: setting number of micro-batches to constant 1 0: > building GPT2BPETokenizer tokenizer ... 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) 0: DeepSpeed general environment info: 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] 0: torch version .................... 1.13.0+rocm5.2 0: torch cuda version ............... None 0: torch hip version ................ 5.2.21151-afdc89f8 0: nvcc version ..................... None 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] 0: deepspeed info ................... 0.7.5, unknown, unknown 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** 0: > initializing torch distributed ... 0: [2022-11-24 17:05:00,020] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 31: > setting tensorboard ... 
0: > initializing tensor model parallel with size 1 0: > initializing pipeline model parallel with size 1 0: > setting random seeds to 1234 ... 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 0: > compiling dataset index builder ... 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: make: Nothing to be done for 'default'. 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: >>> done with dataset index builder. Compilation time: 0.048 seconds 0: WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations. 0: > compiling and loading fused kernels ... 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 87 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared 
-L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 63 0: ninja: no work to do. 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: ninja: no work to do. 0: >>> done with compiling and loading fused kernels. Compilation time: 21.824 seconds 0: time to initialize megatron (seconds): 62.926 0: [after megatron is initialized] datetime: 2022-11-24 17:05:34 0: building GPT model ... 0: [2022-11-24 17:05:34,977] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2022-11-24 17:05:34,977] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2022-11-24 17:05:34,977] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.82 GB, percent = 6.5% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, 
model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): 0: 
69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, 
data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127, ProcessCoord(pipe=0, data=128, model=0): 128, ProcessCoord(pipe=0, data=129, model=0): 129, ProcessCoord(pipe=0, data=130, model=0): 130, ProcessCoord(pipe=0, data=131, model=0): 131, ProcessCoord(pipe=0, data=132, model=0): 132, ProcessCoord(pipe=0, data=133, model=0): 133, ProcessCoord(pipe=0, data=134, model=0): 134, ProcessCoord(pipe=0, data=135, model=0): 135, ProcessCoord(pipe=0, data=136, model=0): 136, ProcessCoord(pipe=0, data=137, model=0): 137, 0: ProcessCoord(pipe=0, data=138, model=0): 138, ProcessCoord(pipe=0, data=139, model=0): 139, ProcessCoord(pipe=0, data=140, model=0): 140, ProcessCoord(pipe=0, data=141, model=0): 141, ProcessCoord(pipe=0, data=142, model=0): 142, ProcessCoord(pipe=0, data=143, model=0): 143, ProcessCoord(pipe=0, data=144, model=0): 144, ProcessCoord(pipe=0, data=145, model=0): 145, ProcessCoord(pipe=0, data=146, model=0): 146, ProcessCoord(pipe=0, data=147, model=0): 147, ProcessCoord(pipe=0, data=148, model=0): 148, ProcessCoord(pipe=0, data=149, model=0): 149, ProcessCoord(pipe=0, data=150, model=0): 150, ProcessCoord(pipe=0, data=151, model=0): 151, ProcessCoord(pipe=0, data=152, model=0): 152, ProcessCoord(pipe=0, data=153, model=0): 153, ProcessCoord(pipe=0, data=154, model=0): 154, ProcessCoord(pipe=0, data=155, model=0): 155, ProcessCoord(pipe=0, data=156, model=0): 156, ProcessCoord(pipe=0, data=157, 
model=0): 157, ProcessCoord(pipe=0, data=158, model=0): 158, ProcessCoord(pipe=0, data=159, model=0): 159, ProcessCoor 0: d(pipe=0, data=160, model=0): 160, ProcessCoord(pipe=0, data=161, model=0): 161, ProcessCoord(pipe=0, data=162, model=0): 162, ProcessCoord(pipe=0, data=163, model=0): 163, ProcessCoord(pipe=0, data=164, model=0): 164, ProcessCoord(pipe=0, data=165, model=0): 165, ProcessCoord(pipe=0, data=166, model=0): 166, ProcessCoord(pipe=0, data=167, model=0): 167, ProcessCoord(pipe=0, data=168, model=0): 168, ProcessCoord(pipe=0, data=169, model=0): 169, ProcessCoord(pipe=0, data=170, model=0): 170, ProcessCoord(pipe=0, data=171, model=0): 171, ProcessCoord(pipe=0, data=172, model=0): 172, ProcessCoord(pipe=0, data=173, model=0): 173, ProcessCoord(pipe=0, data=174, model=0): 174, ProcessCoord(pipe=0, data=175, model=0): 175, ProcessCoord(pipe=0, data=176, model=0): 176, ProcessCoord(pipe=0, data=177, model=0): 177, ProcessCoord(pipe=0, data=178, model=0): 178, ProcessCoord(pipe=0, data=179, model=0): 179, ProcessCoord(pipe=0, data=180, model=0): 180, ProcessCoord(pipe=0, data=181, model=0): 181, ProcessCoord(pipe=0, da 0: ta=182, model=0): 182, ProcessCoord(pipe=0, data=183, model=0): 183, ProcessCoord(pipe=0, data=184, model=0): 184, ProcessCoord(pipe=0, data=185, model=0): 185, ProcessCoord(pipe=0, data=186, model=0): 186, ProcessCoord(pipe=0, data=187, model=0): 187, ProcessCoord(pipe=0, data=188, model=0): 188, ProcessCoord(pipe=0, data=189, model=0): 189, ProcessCoord(pipe=0, data=190, model=0): 190, ProcessCoord(pipe=0, data=191, model=0): 191, ProcessCoord(pipe=0, data=192, model=0): 192, ProcessCoord(pipe=0, data=193, model=0): 193, ProcessCoord(pipe=0, data=194, model=0): 194, ProcessCoord(pipe=0, data=195, model=0): 195, ProcessCoord(pipe=0, data=196, model=0): 196, ProcessCoord(pipe=0, data=197, model=0): 197, ProcessCoord(pipe=0, data=198, model=0): 198, ProcessCoord(pipe=0, data=199, model=0): 199, ProcessCoord(pipe=0, data=200, model=0): 
200, ProcessCoord(pipe=0, data=201, model=0): 201, ProcessCoord(pipe=0, data=202, model=0): 202, ProcessCoord(pipe=0, data=203, model=0): 203, ProcessCoord(pipe=0, data=204, mode 0: l=0): 204, ProcessCoord(pipe=0, data=205, model=0): 205, ProcessCoord(pipe=0, data=206, model=0): 206, ProcessCoord(pipe=0, data=207, model=0): 207, ProcessCoord(pipe=0, data=208, model=0): 208, ProcessCoord(pipe=0, data=209, model=0): 209, ProcessCoord(pipe=0, data=210, model=0): 210, ProcessCoord(pipe=0, data=211, model=0): 211, ProcessCoord(pipe=0, data=212, model=0): 212, ProcessCoord(pipe=0, data=213, model=0): 213, ProcessCoord(pipe=0, data=214, model=0): 214, ProcessCoord(pipe=0, data=215, model=0): 215, ProcessCoord(pipe=0, data=216, model=0): 216, ProcessCoord(pipe=0, data=217, model=0): 217, ProcessCoord(pipe=0, data=218, model=0): 218, ProcessCoord(pipe=0, data=219, model=0): 219, ProcessCoord(pipe=0, data=220, model=0): 220, ProcessCoord(pipe=0, data=221, model=0): 221, ProcessCoord(pipe=0, data=222, model=0): 222, ProcessCoord(pipe=0, data=223, model=0): 223, ProcessCoord(pipe=0, data=224, model=0): 224, ProcessCoord(pipe=0, data=225, model=0): 225, ProcessCoord(pipe=0, data=226, model=0): 226, P 0: rocessCoord(pipe=0, data=227, model=0): 227, ProcessCoord(pipe=0, data=228, model=0): 228, ProcessCoord(pipe=0, data=229, model=0): 229, ProcessCoord(pipe=0, data=230, model=0): 230, ProcessCoord(pipe=0, data=231, model=0): 231, ProcessCoord(pipe=0, data=232, model=0): 232, ProcessCoord(pipe=0, data=233, model=0): 233, ProcessCoord(pipe=0, data=234, model=0): 234, ProcessCoord(pipe=0, data=235, model=0): 235, ProcessCoord(pipe=0, data=236, model=0): 236, ProcessCoord(pipe=0, data=237, model=0): 237, ProcessCoord(pipe=0, data=238, model=0): 238, ProcessCoord(pipe=0, data=239, model=0): 239, ProcessCoord(pipe=0, data=240, model=0): 240, ProcessCoord(pipe=0, data=241, model=0): 241, ProcessCoord(pipe=0, data=242, model=0): 242, ProcessCoord(pipe=0, data=243, model=0): 243, 
ProcessCoord(pipe=0, data=244, model=0): 244, ProcessCoord(pipe=0, data=245, model=0): 245, ProcessCoord(pipe=0, data=246, model=0): 246, ProcessCoord(pipe=0, data=247, model=0): 247, ProcessCoord(pipe=0, data=248, model=0): 248, ProcessCoord(
0: pipe=0, data=249, model=0): 249, ProcessCoord(pipe=0, data=250, model=0): 250, ProcessCoord(pipe=0, data=251, model=0): 251, ProcessCoord(pipe=0, data=252, model=0): 252, ProcessCoord(pipe=0, data=253, model=0): 253, ProcessCoord(pipe=0, data=254, model=0): 254, ProcessCoord(pipe=0, data=255, model=0): 255}
0: [2022-11-24 17:05:43,888] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer
0: stage=0 layers=17
0: 0: _to_float16
0: 1: EmbeddingPipe
0: 2:
0: 3: ParallelTransformerLayerPipe
0: 4: ParallelTransformerLayerPipe
0: 5: ParallelTransformerLayerPipe
0: 6: ParallelTransformerLayerPipe
0: 7: ParallelTransformerLayerPipe
0: 8: ParallelTransformerLayerPipe
0: 9: ParallelTransformerLayerPipe
0: 10: ParallelTransformerLayerPipe
0: 11: ParallelTransformerLayerPipe
0: 12: ParallelTransformerLayerPipe
0: 13: undo
0: 14: MixedFusedLayerNorm
0: 15: EmbeddingPipe
0: 16: float16_to_fp32
0: loss: CrossEntropy
0: [2022-11-24 17:05:43,981] [INFO] [utils.py:827:see_memory_usage] After Building Model
0: [2022-11-24 17:05:43,982] [INFO] [utils.py:828:see_memory_usage] MA 0.16 GB Max_MA 0.16 GB CA 0.17 GB Max_CA 0 GB
0: [2022-11-24 17:05:43,982] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.83 GB, percent = 6.5%
0: setting training iterations to 37905
0: > learning rate decay style: cosine
0: DeepSpeed is enabled.
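The iteration count logged above can be cross-checked against the launch arguments. The following is a minimal sketch, not code from Megatron-DeepSpeed: it assumes the iteration count is the floor of train samples over global batch size, and that the schedule is the textbook cosine decay from --lr to --min-lr over --lr-decay-samples with no warmup. All function names are hypothetical.

```python
import math

# Values copied from the launch arguments and topology shown in this log.
TRAIN_SAMPLES = 9_703_701      # --train-samples
GLOBAL_BATCH_SIZE = 256        # --global-batch-size
MICRO_BATCH_SIZE = 1           # --micro-batch-size
DATA_PARALLEL_SIZE = 256       # ProcessCoord data=0..255, pipe=0, model=0
MAX_LR = 2e-4                  # --lr
MIN_LR = 2e-5                  # --min-lr
DECAY_SAMPLES = 9_703_701      # --lr-decay-samples


def train_iterations(train_samples: int, global_batch_size: int) -> int:
    """Whole optimizer steps that fit in the sample budget (assumed floor division)."""
    return train_samples // global_batch_size


def grad_accum_steps(global_batch: int, micro_batch: int, dp_size: int) -> int:
    """Micro-batches each data-parallel rank processes per optimizer step."""
    return global_batch // (micro_batch * dp_size)


def cosine_lr(consumed_samples: int,
              max_lr: float = MAX_LR,
              min_lr: float = MIN_LR,
              decay_samples: int = DECAY_SAMPLES) -> float:
    """Textbook cosine decay without warmup (an assumption about the schedule)."""
    ratio = min(consumed_samples / decay_samples, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * ratio))


if __name__ == "__main__":
    iters = train_iterations(TRAIN_SAMPLES, GLOBAL_BATCH_SIZE)
    print("training iterations:", iters)
    print("gradient accumulation steps:",
          grad_accum_steps(GLOBAL_BATCH_SIZE, MICRO_BATCH_SIZE, DATA_PARALLEL_SIZE))
    for step in (0, iters // 2, iters):
        print(f"step {step}: lr = {cosine_lr(step * GLOBAL_BATCH_SIZE):.3e}")
```

With these inputs the floor division reproduces the logged 37905 iterations, and one micro-batch per rank already fills the global batch, so no gradient accumulation is implied.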
0: [2022-11-24 17:05:43,983] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown
0: [2022-11-24 17:06:01,923] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
0: [2022-11-24 17:06:01,923] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
0: [2022-11-24 17:06:01,924] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer
0: [2022-11-24 17:06:01,928] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
0: [2022-11-24 17:06:01,928] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer
0: [2022-11-24 17:06:01,972] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer
0: [2022-11-24 17:06:01,973] [INFO] [utils.py:828:see_memory_usage] MA 0.16 GB Max_MA 0.16 GB CA 0.17 GB Max_CA 0 GB
0: [2022-11-24 17:06:01,973] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.53 GB, percent = 6.7%
1: ninja: no work to do.
1: Time to load utils op: 0.18718862533569336 seconds
1: ninja: no work to do.
7: Time to load utils op: 0.10224294662475586 seconds 6: Time to load utils op: 0.10393691062927246 seconds 6: Time to load utils op: 0.10397648811340332 seconds 1: Time to load utils op: 0.11279129981994629 seconds 6: Time to load utils op: 0.10380363464355469 seconds 6: Time to load utils op: 0.1039130687713623 seconds 6: Time to load utils op: 0.10395669937133789 seconds 6: Time to load utils op: 0.10418224334716797 seconds 6: Time to load utils op: 0.10437798500061035 seconds 6: Time to load utils op: 0.1042931079864502 seconds 8: Time to load utils op: 0.10297274589538574 secondsTime to load utils op: 0.10309410095214844 seconds 8: 8: Time to load utils op: 0.10249543190002441 secondsTime to load utils op: 0.10251641273498535 seconds 8: 8: Time to load utils op: 0.10315752029418945 seconds 8: Time to load utils op: 0.1038045883178711 seconds 13: Time to load utils op: 0.3028717041015625 seconds 8: Time to load utils op: 0.10391449928283691 seconds 8: Time to load utils op: 0.10335850715637207 seconds 9: Time to load utils op: 0.10226750373840332 seconds 9: Time to load utils op: 0.10248756408691406 seconds 9: Time to load utils op: 0.10182929039001465 seconds 9: Time to load utils op: 0.10164475440979004 seconds 9: Time to load utils op: 0.10178947448730469 seconds 9: Time to load utils op: 0.10216045379638672 seconds 9: Time to load utils op: 0.10201382637023926 seconds 10: Time to load utils op: 0.10447025299072266 seconds 11: Time to load utils op: 0.10311412811279297 seconds 10: Time to load utils op: 0.10381317138671875 seconds 10: Time to load utils op: 0.10481381416320801 secondsTime to load utils op: 0.1036376953125 seconds 10: 10: Time to load utils op: 0.10414600372314453 seconds 10: Time to load utils op: 0.10373306274414062 seconds 10: Time to load utils op: 0.10469365119934082 seconds 10: Time to load utils op: 0.10455727577209473 seconds 12: Time to load utils op: 0.10288238525390625 seconds 12: Time to load utils op: 0.10216975212097168 seconds 
12: Time to load utils op: 0.10218667984008789 seconds 12: Time to load utils op: 0.10237741470336914 secondsTime to load utils op: 0.10226798057556152 seconds 12: 12: Time to load utils op: 0.1025247573852539 seconds 13: Time to load utils op: 0.10190963745117188 seconds 12: Time to load utils op: 0.10261011123657227 seconds 13: Time to load utils op: 0.10174202919006348 seconds 11: Time to load utils op: 0.10254406929016113 seconds 12: Time to load utils op: 0.10295510292053223 seconds 13: Time to load utils op: 0.10173630714416504 seconds 13: Time to load utils op: 0.10188007354736328 seconds 13: Time to load utils op: 0.10192227363586426 seconds 13: Time to load utils op: 0.1020500659942627 seconds 13: Time to load utils op: 0.1020967960357666 seconds 14: Time to load utils op: 0.10320138931274414 seconds 14: Time to load utils op: 0.10330939292907715 seconds 11: Time to load utils op: 0.10263848304748535 seconds 14: Time to load utils op: 0.10244894027709961 seconds 11: Time to load utils op: 0.10269880294799805 seconds 14: Time to load utils op: 0.10236716270446777 seconds 14: Time to load utils op: 0.10248208045959473 secondsTime to load utils op: 0.10294055938720703 seconds 14: 14: Time to load utils op: 0.10316777229309082 seconds 14: Time to load utils op: 0.10289144515991211 seconds 11: Time to load utils op: 0.10261178016662598 seconds 11: Time to load utils op: 0.10268282890319824 seconds 11: Time to load utils op: 0.10268545150756836 seconds 11: Time to load utils op: 0.10296177864074707 seconds 15: Time to load utils op: 0.10262203216552734 seconds 20: Time to load utils op: 0.3039264678955078 seconds 15: Time to load utils op: 0.10362052917480469 seconds 15: Time to load utils op: 0.1037149429321289 secondsTime to load utils op: 0.1035163402557373 seconds 15: 15: Time to load utils op: 0.10350871086120605 secondsTime to load utils op: 0.10287904739379883 seconds 15: 15: Time to load utils op: 0.10277152061462402 seconds 15: Time to load utils op: 
0.10393953323364258 seconds 22: Time to load utils op: 0.3029944896697998 seconds 16: Time to load utils op: 0.1036376953125 seconds 16: Time to load utils op: 0.10356974601745605 seconds 16: Time to load utils op: 0.10361504554748535 seconds 16: Time to load utils op: 0.1036984920501709 seconds 16: Time to load utils op: 0.10370540618896484 seconds 16: Time to load utils op: 0.10385298728942871 seconds 16: Time to load utils op: 0.10384273529052734 seconds 16: Time to load utils op: 0.10375785827636719 seconds 18: Time to load utils op: 0.10378289222717285 secondsTime to load utils op: 0.1038351058959961 secondsTime to load utils op: 0.10383439064025879 seconds 18: 18: 17: Time to load utils op: 0.10474872589111328 seconds 18: Time to load utils op: 0.10377907752990723 seconds 18: Time to load utils op: 0.10378503799438477 seconds 18: Time to load utils op: 0.10347747802734375 seconds 18: Time to load utils op: 0.10324501991271973 seconds 17: Time to load utils op: 0.10494279861450195 seconds 17: Time to load utils op: 0.1049802303314209 seconds 17: Time to load utils op: 0.10510659217834473 seconds 18: Time to load utils op: 0.10504603385925293 seconds 17: Time to load utils op: 0.10521864891052246 secondsTime to load utils op: 0.10524845123291016 seconds 17: 17: Time to load utils op: 0.1050715446472168 seconds 17: Time to load utils op: 0.10540318489074707 seconds 19: Time to load utils op: 0.10378575325012207 secondsTime to load utils op: 0.10369324684143066 seconds 19: 19: Time to load utils op: 0.10366153717041016 seconds 20: Time to load utils op: 0.10190057754516602 seconds 19: Time to load utils op: 0.10299491882324219 seconds 20: Time to load utils op: 0.10164737701416016 seconds 19: Time to load utils op: 0.1031947135925293 seconds 20: Time to load utils op: 0.10177779197692871 seconds 19: Time to load utils op: 0.10352253913879395 seconds 26: Time to load utils op: 0.30295300483703613 seconds 20: Time to load utils op: 0.1018979549407959 seconds 20: 
Time to load utils op: 0.10199356079101562 seconds 19: Time to load utils op: 0.10349178314208984 seconds 19: Time to load utils op: 0.10393333435058594 seconds 20: Time to load utils op: 0.1017923355102539 seconds 20: Time to load utils op: 0.10171151161193848 seconds 22: Time to load utils op: 0.10181975364685059 secondsTime to load utils op: 0.10191202163696289 seconds 22: 22: Time to load utils op: 0.10231924057006836 seconds 22: Time to load utils op: 0.10166358947753906 seconds 21: Time to load utils op: 0.10448598861694336 secondsTime to load utils op: 0.10406303405761719 seconds 21: Time to load utils op: 0.10445952415466309 seconds 21: 21: Time to load utils op: 0.1046290397644043 secondsTime to load utils op: 0.10465025901794434 seconds 21: 21: Time to load utils op: 0.10460853576660156 seconds 21: Time to load utils op: 0.10495924949645996 seconds 22: Time to load utils op: 0.1018519401550293 seconds 29: Time to load utils op: 0.303025484085083 seconds 21: Time to load utils op: 0.10500550270080566 seconds 22: Time to load utils op: 0.10206007957458496 seconds 22: Time to load utils op: 0.10216116905212402 seconds 23: Time to load utils op: 0.10352516174316406 seconds 23: Time to load utils op: 0.10272431373596191 seconds 24: Time to load utils op: 0.10255861282348633 secondsTime to load utils op: 0.10200786590576172 seconds 24: 24: Time to load utils op: 0.10174846649169922 seconds 24: Time to load utils op: 0.10183215141296387 seconds 24: Time to load utils op: 0.10214519500732422 seconds 24: Time to load utils op: 0.10316824913024902 seconds 24: Time to load utils op: 0.10346508026123047 seconds 24: Time to load utils op: 0.10242843627929688 seconds 23: Time to load utils op: 0.10433197021484375 seconds 23: Time to load utils op: 0.10398006439208984 seconds 23: Time to load utils op: 0.10438179969787598 secondsTime to load utils op: 0.10485601425170898 seconds 23: 23: Time to load utils op: 0.10501265525817871 seconds 23: Time to load utils op: 
0.10429000854492188 seconds 26: Time to load utils op: 0.10209488868713379 secondsTime to load utils op: 0.1021568775177002 seconds 26: 26: Time to load utils op: 0.10208725929260254 seconds 26: Time to load utils op: 0.10201454162597656 seconds 26: Time to load utils op: 0.10179781913757324 seconds 25: Time to load utils op: 0.10284590721130371 seconds 25: Time to load utils op: 0.10308241844177246 secondsTime to load utils op: 0.10348176956176758 seconds 25: 26: Time to load utils op: 0.10225915908813477 secondsTime to load utils op: 0.10226774215698242 seconds 26: 25: Time to load utils op: 0.10380077362060547 seconds 25: Time to load utils op: 0.10377907752990723 seconds 25: Time to load utils op: 0.10326218605041504 seconds 25: Time to load utils op: 0.10387420654296875 seconds 25: Time to load utils op: 0.10345625877380371 seconds 27: Time to load utils op: 0.1034088134765625 secondsTime to load utils op: 0.10333657264709473 seconds 27: Time to load utils op: 0.10325169563293457 seconds 27: 27: Time to load utils op: 0.10366964340209961 seconds 27: Time to load utils op: 0.10329508781433105 seconds 27: Time to load utils op: 0.10368537902832031 seconds 27: Time to load utils op: 0.10341715812683105 seconds 27: Time to load utils op: 0.1035165786743164 seconds 29: Time to load utils op: 0.10158824920654297 seconds 29: Time to load utils op: 0.10201191902160645 seconds 29: Time to load utils op: 0.10183286666870117 seconds 29: Time to load utils op: 0.10178208351135254 seconds 29: Time to load utils op: 0.10180091857910156 seconds 29: Time to load utils op: 0.10211706161499023 seconds 28: Time to load utils op: 0.10280632972717285 seconds 28: Time to load utils op: 0.1032562255859375 seconds 28: Time to load utils op: 0.10300230979919434 seconds 29: Time to load utils op: 0.10249495506286621 seconds 28: Time to load utils op: 0.10303282737731934 secondsTime to load utils op: 0.10352587699890137 seconds 28: 28: Time to load utils op: 0.10298371315002441 seconds 
28: Time to load utils op: 0.10310840606689453 seconds 28: Time to load utils op: 0.10339045524597168 seconds 30: Time to load utils op: 0.10342073440551758 seconds 30: Time to load utils op: 0.10336542129516602 seconds 30: Time to load utils op: 0.10319972038269043 seconds 31: Time to load utils op: 0.1035614013671875 seconds 30: Time to load utils op: 0.10369873046875 seconds 30: Time to load utils op: 0.1037607192993164 seconds 31: Time to load utils op: 0.10223770141601562 secondsTime to load utils op: 0.10327887535095215 seconds 31: 30: Time to load utils op: 0.10349392890930176 secondsTime to load utils op: 0.10338473320007324 seconds 30: 31: Time to load utils op: 0.1031942367553711 secondsTime to load utils op: 0.10247135162353516 seconds 31: 30: Time to load utils op: 0.10374045372009277 seconds 0: Time to load utils op: 0.30325818061828613 seconds 31: Time to load utils op: 0.10385894775390625 secondsTime to load utils op: 0.10301756858825684 seconds 31: Time to load utils op: 0.10273861885070801 seconds 31: 1: Time to load utils op: 0.0006113052368164062 seconds 1: Time to load utils op: 0.00043702125549316406 seconds 6: Time to load utils op: 0.00046253204345703125 seconds 6: Time to load utils op: 0.0003445148468017578 seconds 6: Time to load utils op: 0.00037288665771484375 seconds 6: Time to load utils op: 0.00034546852111816406 seconds 6: Time to load utils op: 0.0003609657287597656 seconds 6: Time to load utils op: 0.0003464221954345703 seconds 6: Time to load utils op: 0.0003807544708251953 seconds 6: Time to load utils op: 0.0003440380096435547 seconds 7: Time to load utils op: 0.000461578369140625 seconds 8: Time to load utils op: 0.0004429817199707031 seconds 8: Time to load utils op: 0.00034737586975097656 seconds 8: Time to load utils op: 0.0003273487091064453 seconds 8: Time to load utils op: 0.0003459453582763672 seconds 8: Time to load utils op: 0.0003306865692138672 seconds 8: Time to load utils op: 0.0004222393035888672 seconds 8: Time 
to load utils op: 0.0003268718719482422 seconds 8: Time to load utils op: 0.0003113746643066406 seconds 9: Time to load utils op: 0.00048351287841796875 seconds 9: Time to load utils op: 0.0003573894500732422 seconds 9: Time to load utils op: 0.0003662109375 seconds 9: Time to load utils op: 0.0003056526184082031 seconds 9: Time to load utils op: 0.0003237724304199219 seconds 9: Time to load utils op: 0.00031828880310058594 seconds 9: Time to load utils op: 0.00037479400634765625 seconds 10: Time to load utils op: 0.0005006790161132812 seconds 10: Time to load utils op: 0.00035858154296875 seconds 10: Time to load utils op: 0.0003349781036376953 seconds 10: Time to load utils op: 0.00034046173095703125 seconds 10: Time to load utils op: 0.0003685951232910156 seconds 10: Time to load utils op: 0.00036835670471191406 seconds 10: Time to load utils op: 0.00036263465881347656 seconds 10: Time to load utils op: 0.00033783912658691406 seconds 0: [2022-11-24 17:06:02,308] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 12: Time to load utils op: 0.00044155120849609375 seconds 0: [2022-11-24 17:06:02,309] [INFO] [utils.py:828:see_memory_usage] MA 0.16 GB Max_MA 0.16 GB CA 0.17 GB Max_CA 0 GB 0: [2022-11-24 17:06:02,309] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.53 GB, percent = 6.7% 13: Time to load utils op: 0.0005486011505126953 seconds 12: Time to load utils op: 0.0004131793975830078 seconds 11: Time to load utils op: 0.0005779266357421875 seconds 12: Time to load utils op: 0.0003788471221923828 secondsTime to load utils op: 0.0003914833068847656 seconds 12: 12: Time to load utils op: 0.00037217140197753906 seconds 12: Time to load utils op: 0.00037217140197753906 seconds 12: Time to load utils op: 0.0003085136413574219 seconds 12: Time to load utils op: 0.00033664703369140625 seconds 11: Time to load utils op: 0.00042724609375 seconds 13: Time to load utils op: 0.0003826618194580078 seconds 11: Time to load utils op: 
0.00042510032653808594 seconds 13: Time to load utils op: 0.0004184246063232422 seconds 13: Time to load utils op: 0.00039386749267578125 seconds 13: Time to load utils op: 0.0003540515899658203 seconds 11: Time to load utils op: 0.0004277229309082031 seconds 13: Time to load utils op: 0.0003566741943359375 seconds 11: Time to load utils op: 0.0003941059112548828 secondsTime to load utils op: 0.0004410743713378906 seconds 11: 13: Time to load utils op: 0.000362396240234375 seconds 13: Time to load utils op: 0.00034999847412109375 seconds 11: Time to load utils op: 0.00043272972106933594 seconds 14: Time to load utils op: 0.0004603862762451172 secondsTime to load utils op: 0.0003619194030761719 seconds 14: 14: Time to load utils op: 0.0003666877746582031 seconds 14: Time to load utils op: 0.0003476142883300781 seconds 14: Time to load utils op: 0.0003707408905029297 seconds 14: Time to load utils op: 0.0003597736358642578 seconds 14: Time to load utils op: 0.00037741661071777344 seconds 11: Time to load utils op: 0.0005185604095458984 seconds 14: Time to load utils op: 0.00037407875061035156 seconds 15: Time to load utils op: 0.0004425048828125 seconds 15: Time to load utils op: 0.0003781318664550781 seconds 15: Time to load utils op: 0.00035762786865234375 seconds 15: Time to load utils op: 0.0003552436828613281 seconds 15: Time to load utils op: 0.0003643035888671875 seconds 16: Time to load utils op: 0.0005083084106445312 seconds 15: Time to load utils op: 0.0003476142883300781 seconds 16: Time to load utils op: 0.00031495094299316406 seconds 17: Time to load utils op: 0.0004391670227050781 seconds 16: Time to load utils op: 0.0003380775451660156 seconds 16: Time to load utils op: 0.00032591819763183594 seconds 16: Time to load utils op: 0.00038242340087890625 seconds 17: Time to load utils op: 0.00034689903259277344 seconds 16: Time to load utils op: 0.00036215782165527344 seconds 16: Time to load utils op: 0.00036787986755371094 seconds 17: Time to load utils 
op: 0.0004038810729980469 seconds 16: Time to load utils op: 0.00036716461181640625 seconds 15: Time to load utils op: 0.00035881996154785156 seconds 15: Time to load utils op: 0.00035834312438964844 seconds 17: Time to load utils op: 0.00034809112548828125 seconds 17: Time to load utils op: 0.00037670135498046875 seconds 17: Time to load utils op: 0.0004024505615234375 seconds 17: Time to load utils op: 0.0003528594970703125 seconds 17: Time to load utils op: 0.0003762245178222656 seconds 20: Time to load utils op: 0.0004892349243164062 seconds 18: Time to load utils op: 0.0003993511199951172 seconds 19: Time to load utils op: 0.00045180320739746094 seconds 18: Time to load utils op: 0.0004355907440185547 seconds 18: Time to load utils op: 0.00039124488830566406 seconds 18: Time to load utils op: 0.00038623809814453125 seconds 18: Time to load utils op: 0.00041103363037109375 seconds 19: Time to load utils op: 0.0003719329833984375 seconds 19: Time to load utils op: 0.0003485679626464844 seconds 18: Time to load utils op: 0.00031757354736328125 seconds 18: Time to load utils op: 0.0004067420959472656 seconds 19: Time to load utils op: 0.00035119056701660156 seconds 18: Time to load utils op: 0.0003898143768310547 seconds 19: Time to load utils op: 0.0003204345703125 seconds 22: Time to load utils op: 0.0005626678466796875 seconds 19: Time to load utils op: 0.00034332275390625 secondsTime to load utils op: 0.000377655029296875 seconds 20: Time to load utils op: 0.0003464221954345703 seconds 19: 19: Time to load utils op: 0.0003762245178222656 seconds 20: Time to load utils op: 0.0003504753112792969 seconds 20: Time to load utils op: 0.00036263465881347656 seconds 20: Time to load utils op: 0.00037932395935058594 seconds 20: Time to load utils op: 0.0003216266632080078 seconds 20: Time to load utils op: 0.0003345012664794922 seconds 20: Time to load utils op: 0.00032138824462890625 seconds 21: Time to load utils op: 0.0004596710205078125 seconds 21: Time to load 
utils op: 0.0003364086151123047 seconds 21: Time to load utils op: 0.0004291534423828125 secondsTime to load utils op: 0.0004093647003173828 seconds 21: 21: Time to load utils op: 0.0003898143768310547 seconds 21: Time to load utils op: 0.0003783702850341797 seconds 22: Time to load utils op: 0.0003573894500732422 seconds 22: Time to load utils op: 0.0003390312194824219 seconds 22: Time to load utils op: 0.0003266334533691406 seconds 21: Time to load utils op: 0.0003466606140136719 seconds 21: Time to load utils op: 0.0004150867462158203 seconds 22: Time to load utils op: 0.0003440380096435547 seconds 22: Time to load utils op: 0.0003409385681152344 seconds 22: Time to load utils op: 0.0003905296325683594 seconds 22: Time to load utils op: 0.00035190582275390625 seconds 23: Time to load utils op: 0.0004551410675048828 seconds 23: Time to load utils op: 0.00034928321838378906 seconds 23: Time to load utils op: 0.00038552284240722656 secondsTime to load utils op: 0.0003883838653564453 seconds 23: 23: Time to load utils op: 0.0003707408905029297 seconds 23: Time to load utils op: 0.0003638267517089844 seconds 26: Time to load utils op: 0.0005145072937011719 seconds 23: Time to load utils op: 0.0003616809844970703 seconds 23: Time to load utils op: 0.0003407001495361328 seconds 24: Time to load utils op: 0.000518798828125 seconds 24: Time to load utils op: 0.0003268718719482422 seconds 24: Time to load utils op: 0.0004930496215820312 seconds 24: Time to load utils op: 0.0003039836883544922 seconds 25: Time to load utils op: 0.00044989585876464844 seconds 24: Time to load utils op: 0.000324249267578125 seconds 25: Time to load utils op: 0.0003390312194824219 seconds 24: Time to load utils op: 0.0003952980041503906 seconds 24: Time to load utils op: 0.00038170814514160156 seconds 25: Time to load utils op: 0.00035500526428222656 seconds 24: Time to load utils op: 0.0003750324249267578 seconds 25: Time to load utils op: 0.00031256675720214844 seconds 26: Time to load 
utils op: 0.00034737586975097656 seconds 25: Time to load utils op: 0.00033974647521972656 seconds 26: Time to load utils op: 0.000339508056640625 seconds 26: Time to load utils op: 0.0003707408905029297 seconds 25: Time to load utils op: 0.00034880638122558594 seconds 26: Time to load utils op: 0.00034809112548828125 seconds 25: Time to load utils op: 0.00034046173095703125 secondsTime to load utils op: 0.000347137451171875 seconds 25: 26: Time to load utils op: 0.00036644935607910156 seconds 26: Time to load utils op: 0.0003502368927001953 seconds 26: Time to load utils op: 0.0003247261047363281 seconds 27: Time to load utils op: 0.0004258155822753906 seconds 27: Time to load utils op: 0.0003771781921386719 seconds 29: Time to load utils op: 0.0006246566772460938 seconds 27: Time to load utils op: 0.0003540515899658203 seconds 27: Time to load utils op: 0.0003647804260253906 seconds 27: Time to load utils op: 0.0003609657287597656 seconds 27: Time to load utils op: 0.00035452842712402344 seconds 28: Time to load utils op: 0.00046062469482421875 seconds 27: Time to load utils op: 0.00034332275390625 secondsTime to load utils op: 0.0003399848937988281 seconds 27: 28: Time to load utils op: 0.00032401084899902344 seconds 28: Time to load utils op: 0.00033545494079589844 seconds 28: Time to load utils op: 0.00036215782165527344 seconds 28: Time to load utils op: 0.0003592967987060547 seconds 28: Time to load utils op: 0.00032973289489746094 seconds 29: Time to load utils op: 0.0003516674041748047 seconds 28: Time to load utils op: 0.0003514289855957031 seconds 29: Time to load utils op: 0.0003757476806640625 seconds 28: Time to load utils op: 0.0003261566162109375 seconds 29: Time to load utils op: 0.0003800392150878906 seconds 29: Time to load utils op: 0.00034999847412109375 seconds 29: Time to load utils op: 0.00036644935607910156 seconds 29: Time to load utils op: 0.0004260540008544922 seconds 29: Time to load utils op: 0.00040721893310546875 seconds 30: Time to 
load utils op: 0.0005011558532714844 seconds 30: Time to load utils op: 0.00036644935607910156 seconds 30: Time to load utils op: 0.0003516674041748047 seconds 30: Time to load utils op: 0.0003783702850341797 seconds 30: Time to load utils op: 0.0003542900085449219 seconds 30: Time to load utils op: 0.00034332275390625 seconds 31: Time to load utils op: 0.0004143714904785156 seconds 30: Time to load utils op: 0.0003638267517089844 seconds 30: Time to load utils op: 0.0004146099090576172 seconds 31: Time to load utils op: 0.0003418922424316406 seconds 31: Time to load utils op: 0.00035071372985839844 seconds 31: Time to load utils op: 0.0003695487976074219 seconds 31: Time to load utils op: 0.00035834312438964844 seconds 31: Time to load utils op: 0.00035071372985839844 seconds 31: Time to load utils op: 0.00034046173095703125 seconds 31: Time to load utils op: 0.0003273487091064453 seconds 1: Time to load utils op: 0.2028505802154541 seconds 1: Time to load utils op: 0.2033090591430664 secondsTime to load utils op: 0.20256447792053223 seconds 1: 1: Time to load utils op: 0.20328474044799805 seconds 0: Time to load utils op: 0.20386385917663574 seconds 1: Time to load utils op: 0.20242691040039062 secondsTime to load utils op: 0.2023763656616211 seconds 1: 0: Time to load utils op: 0.2039327621459961 seconds 0: Time to load utils op: 0.2026958465576172 seconds 0: Time to load utils op: 0.20410847663879395 seconds 0: Time to load utils op: 0.20182037353515625 seconds 0: Time to load utils op: 0.20433425903320312 seconds 0: Time to load utils op: 0.20244050025939941 seconds 2: Time to load utils op: 0.20364999771118164 secondsTime to load utils op: 0.20330047607421875 seconds 2: 2: Time to load utils op: 0.20362567901611328 secondsTime to load utils op: 0.2038130760192871 seconds 2: 2: Time to load utils op: 0.20351386070251465 seconds 2: Time to load utils op: 0.20355558395385742 seconds 2: Time to load utils op: 0.20386886596679688 seconds 2: Time to load utils op: 
0.2039339542388916 seconds 7: Time to load utils op: 0.4042685031890869 seconds 3: Time to load utils op: 0.20391511917114258 seconds 3: Time to load utils op: 0.20287346839904785 secondsTime to load utils op: 0.20297884941101074 seconds 3: 3: Time to load utils op: 0.20337653160095215 seconds 4: Time to load utils op: 0.20306181907653809 seconds 3: Time to load utils op: 0.20354080200195312 seconds 3: Time to load utils op: 0.20354080200195312 seconds 4: Time to load utils op: 0.20313143730163574 secondsTime to load utils op: 0.20348381996154785 secondsTime to load utils op: 0.20278549194335938 seconds 4: 4: 4: Time to load utils op: 0.20426011085510254 secondsTime to load utils op: 0.20412969589233398 seconds 4: 3: Time to load utils op: 0.20434021949768066 secondsTime to load utils op: 0.204054594039917 seconds 3: 4: Time to load utils op: 0.20400762557983398 seconds 4: Time to load utils op: 0.20319509506225586 seconds 9: Time to load utils op: 0.40308141708374023 seconds 5: Time to load utils op: 0.20430517196655273 seconds 5: Time to load utils op: 0.20383119583129883 secondsTime to load utils op: 0.2034909725189209 seconds 5: 5: Time to load utils op: 0.20336008071899414 seconds 5: Time to load utils op: 0.20427584648132324 seconds 5: Time to load utils op: 0.20400595664978027 seconds 5: Time to load utils op: 0.204911470413208 seconds 7: Time to load utils op: 0.20184111595153809 seconds 5: Time to load utils op: 0.2050471305847168 seconds 7: Time to load utils op: 0.2021167278289795 seconds 7: Time to load utils op: 0.20236968994140625 seconds 7: Time to load utils op: 0.2021632194519043 seconds 7: Time to load utils op: 0.2022111415863037 seconds 7: Time to load utils op: 0.20241332054138184 seconds 1: Time to load utils op: 0.00032830238342285156 seconds 1: Time to load utils op: 0.00036215782165527344 seconds 1: Time to load utils op: 0.00040841102600097656 seconds 1: Time to load utils op: 0.0003731250762939453 seconds 0: [2022-11-24 17:06:02,352] 
[INFO] [utils.py:827:see_memory_usage] after initializing group 0 1: Time to load utils op: 0.0003719329833984375 seconds 1: Time to load utils op: 0.00040912628173828125 seconds 0: [2022-11-24 17:06:02,353] [INFO] [utils.py:828:see_memory_usage] MA 0.37 GB Max_MA 0.37 GB CA 0.48 GB Max_CA 0 GB 0: [2022-11-24 17:06:02,353] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.53 GB, percent = 6.7% 0: Time to load utils op: 0.0005517005920410156 seconds 0: Time to load utils op: 0.0005788803100585938 seconds 0: Time to load utils op: 0.0005762577056884766 seconds 0: Time to load utils op: 0.0004189014434814453 seconds 0: Time to load utils op: 0.0004131793975830078 seconds 0: Time to load utils op: 0.0005707740783691406 seconds 0: Time to load utils op: 0.00043654441833496094 seconds 2: Time to load utils op: 0.0004532337188720703 seconds 2: Time to load utils op: 0.000339508056640625 seconds 2: Time to load utils op: 0.0004284381866455078 seconds 2: Time to load utils op: 0.0003848075866699219 seconds 2: Time to load utils op: 0.00039386749267578125 seconds 2: Time to load utils op: 0.00038504600524902344 seconds 2: Time to load utils op: 0.00038909912109375 seconds 2: Time to load utils op: 0.0003819465637207031 seconds 9: Time to load utils op: 0.0003821849822998047 seconds 3: Time to load utils op: 0.0003647804260253906 secondsTime to load utils op: 0.00045752525329589844 secondsTime to load utils op: 0.00032806396484375 seconds 3: 3: 3: Time to load utils op: 0.0004076957702636719 seconds 3: Time to load utils op: 0.00036644935607910156 seconds 3: Time to load utils op: 0.00033020973205566406 seconds 7: Time to load utils op: 0.0004296302795410156 seconds 3: Time to load utils op: 0.000362396240234375 seconds 3: Time to load utils op: 0.00035381317138671875 seconds 4: Time to load utils op: 0.0004544258117675781 secondsTime to load utils op: 0.00048232078552246094 seconds 4: 4: Time to load utils op: 0.0003097057342529297 seconds 4: Time to load 
utils op: 0.0003311634063720703 seconds 4: Time to load utils op: 0.00037980079650878906 seconds 4: Time to load utils op: 0.0004112720489501953 seconds 4: Time to load utils op: 0.0003635883331298828 seconds 4: Time to load utils op: 0.0003521442413330078 seconds 5: Time to load utils op: 0.00042557716369628906 seconds 5: Time to load utils op: 0.00032973289489746094 seconds 5: Time to load utils op: 0.0003452301025390625 seconds 5: Time to load utils op: 0.00042247772216796875 seconds 7: Time to load utils op: 0.0003743171691894531 seconds 5: Time to load utils op: 0.0003788471221923828 seconds 5: Time to load utils op: 0.0003769397735595703 seconds 7: Time to load utils op: 0.0003769397735595703 seconds 7: Time to load utils op: 0.0003771781921386719 seconds 5: Time to load utils op: 0.00037217140197753906 seconds 5: Time to load utils op: 0.00036644935607910156 seconds 7: Time to load utils op: 0.0003819465637207031 seconds 7: Time to load utils op: 0.0003895759582519531 seconds 7: Time to load utils op: 0.00045800209045410156 seconds 0: [2022-11-24 17:06:02,400] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2022-11-24 17:06:02,400] [INFO] [utils.py:828:see_memory_usage] MA 0.37 GB Max_MA 0.37 GB CA 0.48 GB Max_CA 0 GB 0: [2022-11-24 17:06:02,400] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,436] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 0: [2022-11-24 17:06:02,437] [INFO] [utils.py:828:see_memory_usage] MA 0.46 GB Max_MA 0.46 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,437] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,469] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2022-11-24 17:06:02,469] [INFO] [utils.py:828:see_memory_usage] MA 0.46 GB Max_MA 0.46 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,469] [INFO] 
[utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,503] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 0: [2022-11-24 17:06:02,503] [INFO] [utils.py:828:see_memory_usage] MA 0.46 GB Max_MA 0.46 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,503] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,534] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2022-11-24 17:06:02,535] [INFO] [utils.py:828:see_memory_usage] MA 0.46 GB Max_MA 0.46 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,535] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,571] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2022-11-24 17:06:02,572] [INFO] [utils.py:828:see_memory_usage] MA 0.47 GB Max_MA 0.47 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,572] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,603] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2022-11-24 17:06:02,603] [INFO] [utils.py:828:see_memory_usage] MA 0.47 GB Max_MA 0.47 GB CA 0.58 GB Max_CA 1 GB 0: [2022-11-24 17:06:02,603] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 33.67 GB, percent = 6.7% 0: [2022-11-24 17:06:02,604] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 0: [2022-11-24 17:06:02,604] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2022-11-24 17:06:02,604] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2022-11-24 17:06:02,604] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002, 0.0002, 0.0002], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2022-11-24 17:06:02,604] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: 0: [2022-11-24 17:06:02,604] 
[INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: "contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2022-11-24 17:06:02,604] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2022-11-24 17:06:02,604] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2022-11-24 17:06:02,604] [INFO] [config.py:1011:print] amp_params ................... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] autotuning_config ............ { 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] comms_config ................. 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] dump_state ................... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] gradient_clipping ............ 
1.0 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] loss_scale ................... 1.0 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] monitor_config ............... 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2022-11-24 17:06:02,605] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] pld_params ................... False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... 
False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 1 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] use_node_local_storage ....... False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] world_size ................... 256 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2022-11-24 17:06:02,606] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2022-11-24 17:06:02,606] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 1, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.0004143714904785156 seconds 0: [2022-11-24 17:06:02,607] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=1 0: [2022-11-24 17:06:02,617] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=17 [0, 17) STAGE_PARAMS=82741760 (82.742M) TOTAL_PARAMS=82741760 (82.742M) UNIQUE_PARAMS=82741760 (82.742M) 0: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
28: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
22: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
18: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
27: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
0: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
3: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
21: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
5: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
7: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
25: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
2: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
16: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
11: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
1: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
8: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 5: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 29: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 29: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
6: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 28: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 8: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 28: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 31: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
19: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 20: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 8: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 30: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
28: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 26: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 24: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 26: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 
31: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 19: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
16: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 19: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 15: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 20: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 9: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 22: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 24: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 12: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 6: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 4: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 23: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 3: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 15: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
18: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 30: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
23: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 14: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 18: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 7: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 23: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 16: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
3: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 9: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 10: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 11: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 
1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 2: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/mp_rank_00_model_states.pt. 5: [2022-11-24 17:06:02,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,652] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 7: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 19: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 7: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 14: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 4: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 29: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 17: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 12: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 8: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 22: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 31: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 7: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 3: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 26: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 1: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 21: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 20: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 13: [2022-11-24 17:06:02,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 23: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 30: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 9: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 28: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 6: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 2: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 15: [2022-11-24 17:06:02,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 10: [2022-11-24 17:06:02,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 27: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 18: [2022-11-24 17:06:02,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 24: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 5: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 7: [2022-11-24 17:06:02,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 11: [2022-11-24 17:06:02,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt... 16: [2022-11-24 17:06:02,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,677] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 7: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 31: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 19: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 7: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 20: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 22: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 23: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 2: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 30: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 10: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 21: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 28: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 6: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 4: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 15: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 16: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 13: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 24: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 17: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 12: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 26: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 3: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 9: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 1: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 14: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 5: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 8: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 18: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_01-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 11: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 1: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 11: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 28: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 2: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 15: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 30: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 19: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 8: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 20: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 16: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 1: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 9: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 5: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 14: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 25: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 10: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 7: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 22: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 13: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 8: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 11: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 17: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 14: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 27: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 11: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 24: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 4: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 4: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 10: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 30: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 22: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 10: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 27: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 9: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 11: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 3: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 6: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 11: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 16: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 5: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 2: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 13: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 3: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 25: [2022-11-24 17:06:02,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 24: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 17: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 21: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 28: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 23: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 23: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 15: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 31: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 22: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 12: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt... 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 11: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 28: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 11: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 18: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 10: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 13: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 21: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
21: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
7: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
11: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
27: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
20: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
18: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
25: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
5: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
11: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
26: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
1: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
19: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
15: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
16: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
9: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
17: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
21: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
13: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
13: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
0: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
5: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
11: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt.
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
19: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
24: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
9: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
0: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
11: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt.
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
19: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
13: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
0: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
19: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
4: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
16: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
6: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
26: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
2: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
10: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
13: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
12: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
26: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
31: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
30: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
23: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
26: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
24: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
8: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
17: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
26: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
15: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
15: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
22: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
4: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
8: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
7: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
11: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
23: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
31: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
1: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
1: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
20: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt.
24: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
16: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt.
25: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
9: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
28: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
23: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
26: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
31: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
14: [2022-11-24 17:06:02,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
20: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
24: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
10: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
8: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
16: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
23: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
26: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
14: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
24: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
4: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
8: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
28: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
12: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
23: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
26: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
14: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
1: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
4: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
25: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
28: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
26: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
19: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
19: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
2: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
15: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
4: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
28: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
26: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
3: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
19: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
8: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
25: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
30: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
28: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
11: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt...
14: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
14: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
10: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
10: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
30: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt...
17: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt.
12: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
10: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
10: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
18: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
6: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt.
15: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt...
4: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt.
30: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt...
17: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 17: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 1: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_03-model_00-model_states.pt. 20: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 2: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 19: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 28: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 13: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 13: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 13: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 31: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 10: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 7: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 5: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 8: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 15: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 27: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 13: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 21: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 27: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 10: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 9: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 3: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 28: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 4: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 17: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 13: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 10: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 4: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 30: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 21: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 30: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 12: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 20: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 25: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 17: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 13: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 20: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 9: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 14: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 1: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 20: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 31: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 6: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 5: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 14: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 31: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 26: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt... 29: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 12: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 3: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 23: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 6: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 26: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 29: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 5: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 14: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 31: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 23: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 26: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 12: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 14: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 31: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 31: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 31: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 12: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 12: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 12: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,795] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 248 12: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 12: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 29: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,797] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 213 31: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 31: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: > using checkpoint value 0.0002 for learning rate 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: > using checkpoint value 2e-05 for minimum learning rate 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: > using checkpoint value 0 for warmup iterations 12: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: > using checkpoint value 9703701 for total number of iterations 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: > using checkpoint value cosine for decay style 12: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 
29: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,801] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 99 6: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,802] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 41 29: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 14: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 
0: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 6: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 96 12: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 98 6: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
12: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 44 11: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 54 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 50 6: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 3: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
26: [2022-11-24 17:06:02,806] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 208 23: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 26: [2022-11-24 17:06:02,806] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 209 3: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 25 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 24 12: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 31: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 0 5: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:06:02,808] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 92 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,806] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 251 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 253 12: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,809] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 5 26: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,810] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 41 7: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 29 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 31 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,811] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 101 14: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 29: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 29: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 31: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 26: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,815] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 59 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,815] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 113 1: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 12: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 187 7: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 
29: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,815] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 190 23: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 12: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 31: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,815] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 112 14: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 
3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 26: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 
3: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,818] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 237 29: [2022-11-24 17:06:02,818] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 235 1: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 26: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
29: [2022-11-24 17:06:02,820] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 233 29: [2022-11-24 17:06:02,820] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 239 11: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 12: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 31: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
31: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 12: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 12: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 8 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 1: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 14: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,821] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 186 14: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,814] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 14 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 26: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 185 26: [2022-11-24 17:06:02,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 215 1: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 26: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 26: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 29: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 26: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 12: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 184 12: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 26: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 211 14: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 116 26: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 31: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 255 31: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 114 31: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 29: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,824] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 254 11: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
1: [2022-11-24 17:06:02,816] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 10 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,816] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 11 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 1: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 23: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 49 6: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 55 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 26: [2022-11-24 17:06:02,826] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 213 7: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 
19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
12: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 3: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 31: [2022-11-24 17:06:02,829] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 250 14: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,829] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 189 31: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 14: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,829] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 249 6: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 5: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 
5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 1: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 19: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 19: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
23: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,831] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 90 0: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 29: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 29: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 7: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 6: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 19: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 14: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 14: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 26: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 14: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
26: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 210 14: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 117 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 14: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 115 7: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 119 29: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,833] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 255 0: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 23: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 188 5: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 12: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 29: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,835] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 0 29: [2022-11-24 17:06:02,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 232 29: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: checkpoint version 3.0 14: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 238 12: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 29: [2022-11-24 17:06:02,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 234 19: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 14: [2022-11-24 17:06:02,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 118 19: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 12: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,836] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 92 11: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 29: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 3: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 
5: [2022-11-24 17:06:02,837] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 42 11: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 159 2: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 12: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 156 5: [2022-11-24 17:06:02,838] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 45 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 1: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 19: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,839] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 5 12: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 29: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 12: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 19: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 29: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 12: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 19: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 19: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 19: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 
19: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,840] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 103 19: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,825] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 154 2: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 19: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 157 19: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 155 2: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
19: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 152 2: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 153 2: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 12: [2022-11-24 17:06:02,841] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 102 19: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 19: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 19: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 19: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 19: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 6: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 19: [2022-11-24 17:06:02,833] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 158 1: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 12: [2022-11-24 17:06:02,841] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 99 5: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,842] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 13 2: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 12: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 3: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 3: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 3: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 1: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,843] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 12 2: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 30 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 3: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 26 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
3: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 28 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 3: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 27 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 9 2: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 12: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 1: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:06:02,845] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 100 1: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 1: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 6: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 1: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 29: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,846] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 236 2: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 23 15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 12: [2022-11-24 17:06:02,846] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 97 2: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 5: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 6: [2022-11-24 17:06:02,846] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 50 2: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,811] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 21 5: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 2: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 6: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 3: [2022-11-24 17:06:02,848] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 25 2: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 1: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 5: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 15: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 1: [2022-11-24 17:06:02,851] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 8 15: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
31: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 31: [2022-11-24 17:06:02,851] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 252 23: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 23: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 1: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 6: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 6: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
15: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 6: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 6: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:06:02,852] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 15 6: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 2: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 6: [2022-11-24 17:06:02,852] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 53 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,852] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 51 2: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 123 5: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 120 15: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
14: [2022-11-24 17:06:02,855] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 113 15: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,855] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 59 15: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 5: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 15: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 23: [2022-11-24 17:06:02,855] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 190 15: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 6: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 6: [2022-11-24 17:06:02,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 48 15: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 6: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,832] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 123 31: [2022-11-24 17:06:02,857] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 248 15: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 6: [2022-11-24 17:06:02,858] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 52 2: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 23: [2022-11-24 17:06:02,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 189 15: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 98 15: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 3: [2022-11-24 17:06:02,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 24 15: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
12: [2022-11-24 17:06:02,858] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 101 15: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 23: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 15: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 15: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 
27: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,858] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 127 27: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 15: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,860] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 121 27: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
0: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 31: [2022-11-24 17:06:02,861] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 254 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 
20: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 27: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,801] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 221 20: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 23: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 14: [2022-11-24 17:06:02,862] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 112 5: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 15: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 
27: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 5: [2022-11-24 17:06:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 47 7: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 5: [2022-11-24 17:06:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 43 7: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 125 7: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 11: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 15: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 27: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,865] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 62 27: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 2: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,865] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 58 27: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 5: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,826] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 221 20: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 27: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 2: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 1: [2022-11-24 17:06:02,867] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 13 2: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 27: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 27: [2022-11-24 17:06:02,849] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 218 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,850] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 223 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 5: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
15: [2022-11-24 17:06:02,869] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 120 27: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 5: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 7: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 22: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 27: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 5: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,854] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 219 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 6: [2022-11-24 17:06:02,870] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 51 2: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 5: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 5: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 7: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 27: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 27: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
6: [2022-11-24 17:06:02,871] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 54 27: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
7: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 26: [2022-11-24 17:06:02,871] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 208 11: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 27: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 27: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 27: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:06:02,872] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 222 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 22: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
27: [2022-11-24 17:06:02,872] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 217 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 2: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 
20: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 1: [2022-11-24 17:06:02,873] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 12 20: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 20: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,799] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 167 22: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,799] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 160 20: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 20 20: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 20: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 22: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 22: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 27: [2022-11-24 17:06:02,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 216 20: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 2: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 20: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 15: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 15: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 22: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 26: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
2: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 26: [2022-11-24 17:06:02,875] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 212 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 5: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 5: [2022-11-24 17:06:02,875] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 40 11: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 20: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,876] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 19 5: [2022-11-24 17:06:02,876] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 46 20: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 7: [2022-11-24 17:06:02,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 61 20: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 27: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 27: [2022-11-24 17:06:02,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 220 20: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 20: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,877] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 18 7: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 20: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 
22: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 165 22: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 161 22: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 20: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 20: [2022-11-24 17:06:02,842] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 166 22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 20: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 
22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 2: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,847] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 164 22: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 20: [2022-11-24 17:06:02,847] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 162 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
20: [2022-11-24 17:06:02,848] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 163 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 20: [2022-11-24 17:06:02,863] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 167 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 2: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,879] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 9 22: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 2: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 2: [2022-11-24 17:06:02,880] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 22 22: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 22: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 22: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 15: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 1: [2022-11-24 17:06:02,880] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 15 22: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 15: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 15: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 7: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 12: [2022-11-24 17:06:02,881] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 96 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 7: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 22: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 22: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 26: [2022-11-24 17:06:02,882] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 209 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 22: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 22: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 2: [2022-11-24 17:06:02,883] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 16 22: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 22: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 14: [2022-11-24 17:06:02,884] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 114 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 
7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 15: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 22: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 180 24: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 7: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 7: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
11: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 7: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 178 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 176 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 22: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 26: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
26: [2022-11-24 17:06:02,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 214 15: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 31: [2022-11-24 17:06:02,886] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 252 22: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 27: [2022-11-24 17:06:02,886] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 218 22: [2022-11-24 17:06:02,810] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 180 22: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 2: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 2: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 
24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 7: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 2: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,817] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 181 22: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 22: [2022-11-24 17:06:02,817] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 177 24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 7: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 7: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,887] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 57 22: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 22: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 5: [2022-11-24 17:06:02,887] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 44 22: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 22: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
22: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 22: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 183 24: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 22: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 179 24: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 22: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 22: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 22: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 15: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 22: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 
11: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 22: [2022-11-24 17:06:02,843] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 182 0: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 22: [2022-11-24 17:06:02,861] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 181 24: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,870] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 176 24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 22: [2022-11-24 17:06:02,870] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 178 24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 10: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 24: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 2: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 24: [2022-11-24 17:06:02,796] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 196 10: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
15: [2022-11-24 17:06:02,890] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 124 24: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 1: [2022-11-24 17:06:02,890] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 10 24: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 10: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,802] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 196 10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 24: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 7: [2022-11-24 17:06:02,891] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 60 7: [2022-11-24 17:06:02,891] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 56 24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 7: [2022-11-24 17:06:02,891] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 63 24: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 15: [2022-11-24 17:06:02,891] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 122 24: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 2: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
2: [2022-11-24 17:06:02,893] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 17 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 7: [2022-11-24 17:06:02,894] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 61 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 
10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 192 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,845] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 199 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 24: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 24: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
24: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 24: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
24: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 11: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,880] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 198 24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:06:02,897] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 126 24: [2022-11-24 17:06:02,880] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 197 24: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 
10: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 24: [2022-11-24 17:06:02,881] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 195 10: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 24: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 24: [2022-11-24 17:06:02,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,887] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 193 10: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 24: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 24: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 11: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 24: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:06:02,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 24: [2022-11-24 17:06:02,896] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 194 10: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 84 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,804] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 85 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 10: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 10: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
5: [2022-11-24 17:06:02,900] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 47 10: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 10: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 23: [2022-11-24 17:06:02,901] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 188 10: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 2: [2022-11-24 17:06:02,901] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 16 10: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 10: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,841] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 84 4: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 
4: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,843] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 81 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,843] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 82 10: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 10: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 10: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
10: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 10: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
10: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 10: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 
4: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 
4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,858] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 87 4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 10: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 10: [2022-11-24 17:06:02,859] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 86 4: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 10: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 
4: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 10: [2022-11-24 17:06:02,860] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 83 4: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 10: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 80 4: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 10: [2022-11-24 17:06:02,881] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 85 4: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 8: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 4: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 4: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:06:02,905] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 124 4: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 5: [2022-11-24 17:06:02,905] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 43 4: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 11: [2022-11-24 17:06:02,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 27: [2022-11-24 17:06:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 223 4: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 90 14: [2022-11-24 17:06:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 115 11: [2022-11-24 17:06:02,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 14: [2022-11-24 17:06:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 117 4: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 0: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 31: [2022-11-24 17:06:02,907] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 253 4: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 4: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,802] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 34 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
23: [2022-11-24 17:06:02,907] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 187 4: [2022-11-24 17:06:02,803] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 38 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
7: [2022-11-24 17:06:02,908] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 63 4: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:06:02,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 4: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
8: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,812] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 32 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,812] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 33 4: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 4: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,909] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 7 0: [2022-11-24 17:06:02,909] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 2 4: [2022-11-24 17:06:02,816] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 37 4: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,909] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 91 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
4: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 0: [2022-11-24 17:06:02,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 4: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 4: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 4: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 4: [2022-11-24 17:06:02,824] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 39 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,824] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 36 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-24 17:06:02,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 4: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:06:02,826] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 35 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 26: [2022-11-24 17:06:02,910] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 212 4: [2022-11-24 17:06:02,826] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 38 4: [2022-11-24 17:06:02,830] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 34 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 4: [2022-11-24 17:06:02,838] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 39 8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 4: [2022-11-24 17:06:02,853] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 36 8: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 8: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 16: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 16: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 68 16: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,805] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 68 16: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 20: [2022-11-24 17:06:02,912] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 160 8: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 
8: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 8: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 69 8: [2022-11-24 17:06:02,810] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 70 16: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 8: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:06:02,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:02,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:06:02,913] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 214 8: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 8: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 8: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 67 16: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 66 16: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 8: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 
16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:06:02,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 8: [2022-11-24 17:06:02,825] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 71 16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,825] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 65 16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 8: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
8: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 8: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
8: [2022-11-24 17:06:02,835] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 64 8: [2022-11-24 17:06:02,896] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 71 16: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 8: [2022-11-24 17:06:02,906] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 65 16: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 8: [2022-11-24 17:06:02,915] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 69 16: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 11: [2022-11-24 17:06:02,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
5: [2022-11-24 17:06:02,915] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 45 16: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,916] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 94 16: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 
16: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 16: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 11: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,795] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 132 16: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:06:02,916] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 14 16: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 16: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 11: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
0: [2022-11-24 17:06:02,917] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 1 16: [2022-11-24 17:06:02,808] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 132 16: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:06:02,917] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 3 16: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 16: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:06:02,917] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 6 16: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 7: [2022-11-24 17:06:02,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 60 16: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 7: [2022-11-24 17:06:02,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 56 16: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-24 17:06:02,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 16: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 2: [2022-11-24 17:06:02,919] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 23 16: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 11: [2022-11-24 17:06:02,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 11: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 11: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,838] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 131 18: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,838] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 134 16: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 16: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-24 17:06:02,921] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 4 16: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 16: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,792] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 144 16: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
16: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 11: [2022-11-24 17:06:02,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,873] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,805] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 144 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
11: [2022-11-24 17:06:02,922] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 95 16: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 11: [2022-11-24 17:06:02,923] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 88 16: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 16: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 16: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 133 18: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
16: [2022-11-24 17:06:02,890] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 128 18: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 16: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 16: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,893] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 130 18: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 16: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 16: [2022-11-24 17:06:02,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 
18: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 16: [2022-11-24 17:06:02,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 31: [2022-11-24 17:06:02,924] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 249 16: [2022-11-24 17:06:02,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:06:02,896] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 135 18: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 16: [2022-11-24 17:06:02,896] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 129 18: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 
25: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,830] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 147 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 18: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 18: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 18: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 5: [2022-11-24 17:06:02,926] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 42 18: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,862] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 147 25: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 18: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 18: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 18: [2022-11-24 17:06:02,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,886] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 151 25: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 18: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 18: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,890] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 146 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 18: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 11: [2022-11-24 17:06:02,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 18: [2022-11-24 17:06:02,900] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 145 25: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 18: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
18: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 18: [2022-11-24 17:06:02,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 11: [2022-11-24 17:06:02,930] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 89 18: [2022-11-24 17:06:02,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,904] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 148 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:06:02,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,905] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 149 25: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 18: [2022-11-24 17:06:02,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 18: [2022-11-24 17:06:02,907] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 150 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
18: [2022-11-24 17:06:02,914] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 146 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_04-model_00-model_states.pt. 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 202 30: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,803] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 205 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 23: [2022-11-24 17:06:02,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 23: [2022-11-24 17:06:02,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 25: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 8: [2022-11-24 17:06:02,933] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 67 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,839] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 202 30: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 25: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
14: [2022-11-24 17:06:02,934] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 119 25: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 4: [2022-11-24 17:06:02,934] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 37 25: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
25: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 25: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 
25: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 25: [2022-11-24 17:06:02,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,853] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 201 30: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
25: [2022-11-24 17:06:02,854] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 206 30: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,855] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 203 30: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 25: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 25: [2022-11-24 17:06:02,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 207 30: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 25: [2022-11-24 17:06:02,857] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 204 30: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 25: [2022-11-24 17:06:02,873] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 205 30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 25: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 25: [2022-11-24 17:06:02,878] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 200 30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,896] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 201 30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 25: [2022-11-24 17:06:02,901] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 206 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 25: [2022-11-24 17:06:02,935] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 200 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 30: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,802] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 244 30: [2022-11-24 17:06:02,802] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 242 9: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 9: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 30: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,809] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 244 9: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 29: [2022-11-24 17:06:02,937] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 235 30: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 30: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 29: [2022-11-24 17:06:02,938] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 237 30: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,939] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 118 30: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,839] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 247 9: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 30: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 30: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 30: [2022-11-24 17:06:02,850] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 242 9: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 30: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 30: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 30: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 
9: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 30: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 15: [2022-11-24 17:06:02,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 127 30: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,868] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 240 9: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 30: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 11: [2022-11-24 17:06:02,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 94 30: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
30: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 30: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 30: [2022-11-24 17:06:02,878] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 241 9: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
30: [2022-11-24 17:06:02,878] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 246 9: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:06:02,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 78 30: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 30: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,800] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 72 30: [2022-11-24 17:06:02,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,884] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 245 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 
9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 30: [2022-11-24 17:06:02,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 30: [2022-11-24 17:06:02,888] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 243 9: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 30: [2022-11-24 17:06:02,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 247 9: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 9: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 30: [2022-11-24 17:06:02,917] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 243 9: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 9: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 9: [2022-11-24 17:06:02,827] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 78 17: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 24: [2022-11-24 17:06:02,944] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 195 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 25: [2022-11-24 17:06:02,944] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 203 9: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,834] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 79 17: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,834] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 75 17: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 8: [2022-11-24 17:06:02,945] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 66 9: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 9: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 9: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 9: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 9: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
9: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 9: [2022-11-24 17:06:02,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 77 17: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,857] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 76 17: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 9: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,859] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 74 17: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 9: [2022-11-24 17:06:02,863] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 72 17: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,875] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 79 17: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
9: [2022-11-24 17:06:02,882] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 73 17: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 9: [2022-11-24 17:06:02,887] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 74 17: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 9: [2022-11-24 17:06:02,916] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 75 17: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
22: [2022-11-24 17:06:02,947] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 177 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 17: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 17: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 14: [2022-11-24 17:06:02,949] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 116 17: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 17: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 11: [2022-11-24 17:06:02,950] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 88 17: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 17: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 17: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 
17: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 8: [2022-11-24 17:06:02,951] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 70 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,818] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 136 17: [2022-11-24 17:06:02,818] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 138 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
17: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 17: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 17: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,822] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 141 28: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 
28: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 28: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 137 28: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 17: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 17: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:02,952] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 21 17: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 17: [2022-11-24 17:06:02,837] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 140 17: [2022-11-24 17:06:02,837] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 142 28: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 17: [2022-11-24 17:06:02,837] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 143 28: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 17: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 17: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,843] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 136 28: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 17: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 17: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
17: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 17: [2022-11-24 17:06:02,849] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 139 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 28: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 31: [2022-11-24 17:06:02,954] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 250 28: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 227 11: [2022-11-24 17:06:02,954] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 91 28: [2022-11-24 17:06:02,805] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 231 28: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 28: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 15: [2022-11-24 17:06:02,956] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 121 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 28: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 28: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 28: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 28: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 28: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,839] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 228 21: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:06:02,839] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 225 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 28: [2022-11-24 17:06:02,841] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 230 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
28: [2022-11-24 17:06:02,841] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 226 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 28: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,846] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 229 21: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 28: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 28: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
28: [2022-11-24 17:06:02,849] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 224 21: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 28: [2022-11-24 17:06:02,941] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 227 21: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_05-model_00-model_states.pt. 21: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 13: [2022-11-24 17:06:02,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 13: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt... 21: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 13: [2022-11-24 17:06:02,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 13: [2022-11-24 17:06:02,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_06-model_00-model_states.pt. 21: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 13: [2022-11-24 17:06:02,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt... 21: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 7: [2022-11-24 17:06:02,961] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 58 2: [2022-11-24 17:06:02,961] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 17 21: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,821] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 170 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_07-model_00-model_states.pt. 21: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 
21: [2022-11-24 17:06:02,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 168 21: [2022-11-24 17:06:02,823] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 169 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt... 21: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_08-model_00-model_states.pt. 21: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 11: [2022-11-24 17:06:02,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 13: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 13: [2022-11-24 17:06:02,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 
13: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 21: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 11: [2022-11-24 17:06:02,964] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 93 21: [2022-11-24 17:06:02,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,863] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 171 13: [2022-11-24 17:06:02,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 21: [2022-11-24 17:06:02,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 
13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,866] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 172 13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 21: [2022-11-24 17:06:02,867] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 174 13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt... 21: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 21: [2022-11-24 17:06:02,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 21: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:06:02,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 21: [2022-11-24 17:06:02,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 175 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,874] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 173 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,901] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 173 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 21: [2022-11-24 17:06:02,903] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 175 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 21: [2022-11-24 17:06:02,911] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 171 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 21: [2022-11-24 17:06:02,928] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 172 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 
21: [2022-11-24 17:06:02,965] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 174 13: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,807] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 109 13: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 13: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 
13: [2022-11-24 17:06:02,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,814] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 108 13: [2022-11-24 17:06:02,815] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 109 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,816] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 106 13: [2022-11-24 17:06:02,816] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 110 13: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 13: [2022-11-24 17:06:02,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 13: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 
13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
13: [2022-11-24 17:06:02,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 13: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 
13: [2022-11-24 17:06:02,841] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 111 13: [2022-11-24 17:06:02,842] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 107 13: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:06:02,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 13: [2022-11-24 17:06:02,844] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 105 13: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 13: [2022-11-24 17:06:02,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 13: [2022-11-24 17:06:02,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:06:02,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 
13: [2022-11-24 17:06:02,856] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 104 13: [2022-11-24 17:06:02,866] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 108 3: [2022-11-24 17:06:02,969] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 30 3: [2022-11-24 17:06:02,970] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 28 12: [2022-11-24 17:06:02,972] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 103 25: [2022-11-24 17:06:02,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 207 31: [2022-11-24 17:06:02,973] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 251 3: [2022-11-24 17:06:02,974] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 29 13: [2022-11-24 17:06:02,977] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 110 10: [2022-11-24 17:06:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 87 3: [2022-11-24 17:06:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 26 10: [2022-11-24 17:06:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 86 3: [2022-11-24 17:06:02,980] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 27 10: [2022-11-24 17:06:02,981] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 80 10: [2022-11-24 17:06:02,982] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 83 23: [2022-11-24 17:06:02,983] [INFO] [torch_checkpoint_engine.py:23:load] 
[Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_09-model_00-model_states.pt. 15: [2022-11-24 17:06:02,983] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 126 20: [2022-11-24 17:06:02,983] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 165 20: [2022-11-24 17:06:02,984] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 161 0: [2022-11-24 17:06:02,984] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 7 0: [2022-11-24 17:06:02,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 2 20: [2022-11-24 17:06:02,985] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 164 20: [2022-11-24 17:06:02,986] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 162 0: [2022-11-24 17:06:02,987] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 4 2: [2022-11-24 17:06:02,987] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 20 20: [2022-11-24 17:06:02,988] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 163 9: [2022-11-24 17:06:02,988] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 77 6: [2022-11-24 17:06:02,989] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 53 20: [2022-11-24 17:06:02,989] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 166 1: [2022-11-24 17:06:02,990] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 11 6: [2022-11-24 17:06:02,991] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 52 
6: [2022-11-24 17:06:02,992] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 48 11: [2022-11-24 17:06:02,995] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 95 23: [2022-11-24 17:06:02,995] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 186 24: [2022-11-24 17:06:03,001] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 199 9: [2022-11-24 17:06:03,001] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 76 24: [2022-11-24 17:06:03,001] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 192 16: [2022-11-24 17:06:03,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 134 16: [2022-11-24 17:06:03,007] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 133 28: [2022-11-24 17:06:03,009] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 229 17: [2022-11-24 17:06:03,010] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 139 19: [2022-11-24 17:06:03,013] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 157 19: [2022-11-24 17:06:03,013] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 155 19: [2022-11-24 17:06:03,014] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 154 22: [2022-11-24 17:06:03,014] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 179 30: [2022-11-24 17:06:03,015] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 241 23: [2022-11-24 17:06:03,015] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 
256 zero partition checkpoints for rank 184 16: [2022-11-24 17:06:03,017] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 135 16: [2022-11-24 17:06:03,018] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 129 21: [2022-11-24 17:06:03,018] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 170 30: [2022-11-24 17:06:03,019] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 246 22: [2022-11-24 17:06:03,021] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 183 22: [2022-11-24 17:06:03,021] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 182 27: [2022-11-24 17:06:03,023] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 219 3: [2022-11-24 17:06:03,024] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 31 13: [2022-11-24 17:06:03,025] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 111 19: [2022-11-24 17:06:03,025] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 159 27: [2022-11-24 17:06:03,028] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 216 27: [2022-11-24 17:06:03,028] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 220 23: [2022-11-24 17:06:03,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 
13: [2022-11-24 17:06:03,030] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 106 4: [2022-11-24 17:06:03,031] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 35 7: [2022-11-24 17:06:03,032] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 57 25: [2022-11-24 17:06:03,032] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 204 6: [2022-11-24 17:06:03,035] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 49 0: [2022-11-24 17:06:03,037] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 3 26: [2022-11-24 17:06:03,037] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 215 26: [2022-11-24 17:06:03,038] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 210 13: [2022-11-24 17:06:03,038] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 107 10: [2022-11-24 17:06:03,038] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 81 10: [2022-11-24 17:06:03,039] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 82 15: [2022-11-24 17:06:03,041] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 122 13: [2022-11-24 17:06:03,042] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 105 30: [2022-11-24 17:06:03,043] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 245 30: [2022-11-24 17:06:03,044] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 240 7: [2022-11-24 17:06:03,045] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 
zero partition checkpoints for rank 62 26: [2022-11-24 17:06:03,047] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 211 13: [2022-11-24 17:06:03,051] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 104 16: [2022-11-24 17:06:03,056] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 131 12: [2022-11-24 17:06:03,063] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 100 18: [2022-11-24 17:06:03,064] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 148 18: [2022-11-24 17:06:03,068] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 149 18: [2022-11-24 17:06:03,071] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 150 17: [2022-11-24 17:06:03,071] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 138 24: [2022-11-24 17:06:03,077] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 198 24: [2022-11-24 17:06:03,078] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 197 23: [2022-11-24 17:06:03,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 
5: [2022-11-24 17:06:03,105] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 40 12: [2022-11-24 17:06:03,127] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 102 29: [2022-11-24 17:06:03,116] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 236 23: [2022-11-24 17:06:03,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt... 2: [2022-11-24 17:06:03,103] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 19 15: [2022-11-24 17:06:03,114] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 125 27: [2022-11-24 17:06:03,094] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 217 8: [2022-11-24 17:06:03,094] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 64 28: [2022-11-24 17:06:03,088] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 224 5: [2022-11-24 17:06:03,107] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 46 23: [2022-11-24 17:06:03,102] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 185 27: [2022-11-24 17:06:03,095] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 222 28: [2022-11-24 17:06:03,089] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 225 12: [2022-11-24 17:06:03,128] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 97 29: [2022-11-24 17:06:03,117] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 232 2: [2022-11-24 17:06:03,114] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 18 28: [2022-11-24 17:06:03,091] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 228 29: [2022-11-24 17:06:03,119] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 234 2: [2022-11-24 17:06:03,127] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 22 28: [2022-11-24 17:06:03,095] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 226 29: [2022-11-24 17:06:03,128] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 239 29: [2022-11-24 17:06:03,129] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 238 29: [2022-11-24 17:06:03,133] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 233 0: [2022-11-24 17:06:03,138] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 1 23: [2022-11-24 17:06:03,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_10-model_00-model_states.pt. 28: [2022-11-24 17:06:03,144] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 230 23: [2022-11-24 17:06:03,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 
4: [2022-11-24 17:06:03,147] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 33 4: [2022-11-24 17:06:03,148] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 32 11: [2022-11-24 17:06:03,155] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 89 11: [2022-11-24 17:06:03,186] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 93 24: [2022-11-24 17:06:03,190] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 193 9: [2022-11-24 17:06:03,199] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 73 23: [2022-11-24 17:06:03,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:03,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-24 17:06:03,205] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 6 28: [2022-11-24 17:06:03,207] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 231 23: [2022-11-24 17:06:03,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_11-model_00-model_states.pt. 23: [2022-11-24 17:06:03,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 
16: [2022-11-24 17:06:03,216] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 130 16: [2022-11-24 17:06:03,217] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 128 23: [2022-11-24 17:06:03,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:03,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt... 19: [2022-11-24 17:06:03,224] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 152 19: [2022-11-24 17:06:03,226] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 153 19: [2022-11-24 17:06:03,226] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 156 17: [2022-11-24 17:06:03,230] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 137 23: [2022-11-24 17:06:03,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_12-model_00-model_states.pt. 23: [2022-11-24 17:06:03,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:03,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 23: [2022-11-24 17:06:03,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt... 23: [2022-11-24 17:06:03,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/layer_14-model_00-model_states.pt. 
24: [2022-11-24 17:06:03,236] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 194 21: [2022-11-24 17:06:03,238] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 169 23: [2022-11-24 17:06:03,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:06:03,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_83m/global_step36000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:06:03,244] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 191 6: [2022-11-24 17:06:03,245] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 55 21: [2022-11-24 17:06:03,268] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 168 23: [2022-11-24 17:06:03,272] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 191 18: [2022-11-24 17:06:03,277] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 145 19: [2022-11-24 17:06:03,285] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 158 17: [2022-11-24 17:06:03,285] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 142 18: [2022-11-24 17:06:03,311] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 151 17: [2022-11-24 17:06:03,324] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 141 17: [2022-11-24 17:06:03,357] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 143 17: [2022-11-24 17:06:03,358] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 140 0: successfully loaded checkpoint from checkpoints_83m at iteration 36000 31: time (ms) | load-checkpoint: 788.67 0: estimated model parameters: 0.08274176 0: estimated model parameters without embeddings: 0.04923648 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2022-11-24 17:06:04 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 9703701 0: validation: 9728 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 0: > finished creating indexed dataset in 0.001445 seconds 0: number of documents: 210604984 0: > dataset split: 0: train: 0: document indices in [0, 199864130) total of 199864130 documents 0: validation: 0: document indices in [199864130, 210394379) total of 10530249 documents 0: test: 0: document indices in [210394379, 210604984) total of 210605 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_9703701ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_9703701ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_9703701ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.005 seconds 0: total number of samples: 173377817 0: total number of epochs: 1 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_9728ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from 
/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_9728ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_9728ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.004 seconds 0: total number of samples: 9118345 0: total number of epochs: 1 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.003 seconds 0: total number of samples: 182928 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2022-11-24 17:06:26 0: done with setup ... 0: training ... 
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 31: time (ms) | model-and-optimizer-setup: 28597.99 | train/valid/test-data-iterators-setup: 21273.11 0: [000-000] 0.0827B / 0.0492B 0: [before the start of training step] datetime: 2022-11-24 17:06:26 0: [2022-11-24 17:06:26,829] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information 0: [2022-11-24 17:06:26,830] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False 0: [2022-11-24 17:06:26,830] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers 0: [2022-11-24 17:06:26,830] [INFO] [checkpointing.py:560:forward] ----Synchronization False 0: [2022-11-24 17:06:26,830] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False 0: [Rank 0] (after 36010 iterations) memory (MB) | allocated: 1042.54638671875 | max allocated: 1703.8681640625 | reserved: 2544.0 | max reserved: 2544.0 31: iteration 36010/ 37905 | consumed samples: 9218560 | consumed tokens: 18879610880 | elapsed time per iteration (s): 1.13 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 2.875034E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 227.502 | TFLOPs: 1.45 | 31: iteration 36020/ 37905 | consumed samples: 9221120 | consumed tokens: 18884853760 | elapsed time per iteration (s): 0.22 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 2.886648E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1183.794 | TFLOPs: 7.54 | 31: iteration 36030/ 37905 | consumed samples: 9223680 | consumed tokens: 18890096640 | elapsed time per iteration (s): 0.22 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 2.865829E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 1156.433 | TFLOPs: 7.37 | 31: iteration 36040/ 37905 | consumed samples: 9226240 | consumed tokens: 18895339520 | elapsed time per iteration (s): 0.23 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 2.881052E+00 | grad norm: 0.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1103.036 | TFLOPs: 7.03 | 31: iteration 36050/ 37905 | consumed samples: 9228800 | consumed tokens: 18900582400 | elapsed time per iteration (s): 0.30 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 2.905450E+00 | grad norm: 0.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 842.087 | TFLOPs: 5.36 | 31: iteration 36060/ 37905 | consumed samples: 9231360 | consumed tokens: 18905825280 | elapsed time per iteration (s): 0.28 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 2.913760E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 919.798 | TFLOPs: 5.86 | 31: iteration 36070/ 37905 | consumed samples: 9233920 | consumed tokens: 18911068160 | elapsed time per iteration (s): 0.31 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 2.899444E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 822.160 | TFLOPs: 5.24 | 31: iteration 36080/ 37905 | consumed samples: 9236480 | consumed tokens: 18916311040 | elapsed time per iteration (s): 0.25 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 2.876626E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1007.094 | TFLOPs: 6.41 | 31: iteration 36090/ 37905 | consumed samples: 9239040 | consumed tokens: 18921553920 | elapsed time per iteration (s): 0.25 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 2.869850E+00 | grad 
norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1026.997 | TFLOPs: 6.54 | 31: iteration 36100/ 37905 | consumed samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.26 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 2.925788E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 989.948 | TFLOPs: 6.30 | 31: iteration 36110/ 37905 | consumed samples: 9244160 | consumed tokens: 18932039680 | elapsed time per iteration (s): 0.30 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 2.901951E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 865.712 | TFLOPs: 5.51 | 31: iteration 36120/ 37905 | consumed samples: 9246720 | consumed tokens: 18937282560 | elapsed time per iteration (s): 0.24 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 2.886228E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1055.010 | TFLOPs: 6.72 | 31: iteration 36130/ 37905 | consumed samples: 9249280 | consumed tokens: 18942525440 | elapsed time per iteration (s): 0.21 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 2.864394E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1191.977 | TFLOPs: 7.59 | 31: iteration 36140/ 37905 | consumed samples: 9251840 | consumed tokens: 18947768320 | elapsed time per iteration (s): 0.22 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 2.870307E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1146.676 | TFLOPs: 7.30 | 31: iteration 36150/ 37905 | consumed samples: 9254400 | consumed tokens: 18953011200 | elapsed time per iteration 
(s): 0.24 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 2.926268E+00 | grad norm: 0.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1060.585 | TFLOPs: 6.75 | 31: iteration 36160/ 37905 | consumed samples: 9256960 | consumed tokens: 18958254080 | elapsed time per iteration (s): 0.29 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 2.925979E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 877.367 | TFLOPs: 5.59 | 31: iteration 36170/ 37905 | consumed samples: 9259520 | consumed tokens: 18963496960 | elapsed time per iteration (s): 0.26 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 2.908002E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 991.455 | TFLOPs: 6.31 | 31: iteration 36180/ 37905 | consumed samples: 9262080 | consumed tokens: 18968739840 | elapsed time per iteration (s): 0.24 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 2.891486E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1060.741 | TFLOPs: 6.76 | 31: iteration 36190/ 37905 | consumed samples: 9264640 | consumed tokens: 18973982720 | elapsed time per iteration (s): 0.28 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 2.885267E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 910.763 | TFLOPs: 5.80 | 31: iteration 36200/ 37905 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.23 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 2.914770E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1131.859 | TFLOPs: 7.21 | 31: iteration 36210/ 
37905 | consumed samples: 9269760 | consumed tokens: 18984468480 | elapsed time per iteration (s): 0.25 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 2.879210E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1037.116 | TFLOPs: 6.61 | 31: iteration 36220/ 37905 | consumed samples: 9272320 | consumed tokens: 18989711360 | elapsed time per iteration (s): 0.22 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 2.916792E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1168.566 | TFLOPs: 7.44 | 31: iteration 36230/ 37905 | consumed samples: 9274880 | consumed tokens: 18994954240 | elapsed time per iteration (s): 0.21 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 2.907865E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1217.897 | TFLOPs: 7.76 | 31: iteration 36240/ 37905 | consumed samples: 9277440 | consumed tokens: 19000197120 | elapsed time per iteration (s): 0.23 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 2.906250E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1112.036 | TFLOPs: 7.08 | 31: iteration 36250/ 37905 | consumed samples: 9280000 | consumed tokens: 19005440000 | elapsed time per iteration (s): 0.30 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 2.873895E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 839.966 | TFLOPs: 5.35 | 31: iteration 36260/ 37905 | consumed samples: 9282560 | consumed tokens: 19010682880 | elapsed time per iteration (s): 0.21 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 2.863333E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 1221.544 | TFLOPs: 7.78 | 31: iteration 36270/ 37905 | consumed samples: 9285120 | consumed tokens: 19015925760 | elapsed time per iteration (s): 0.22 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 2.908844E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1146.904 | TFLOPs: 7.30 | 31: iteration 36280/ 37905 | consumed samples: 9287680 | consumed tokens: 19021168640 | elapsed time per iteration (s): 0.20 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 2.878504E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1292.360 | TFLOPs: 8.23 | 31: iteration 36290/ 37905 | consumed samples: 9290240 | consumed tokens: 19026411520 | elapsed time per iteration (s): 0.23 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 2.909422E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1122.660 | TFLOPs: 7.15 | 31: iteration 36300/ 37905 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.19 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 2.912795E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1367.999 | TFLOPs: 8.71 | 31: iteration 36310/ 37905 | consumed samples: 9295360 | consumed tokens: 19036897280 | elapsed time per iteration (s): 0.20 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 2.908192E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1277.109 | TFLOPs: 8.13 | 31: iteration 36320/ 37905 | consumed samples: 9297920 | consumed tokens: 19042140160 | elapsed time per iteration (s): 0.19 | learning rate: 2.078E-05 | global batch size: 256 | 
lm loss: 2.877349E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1360.378 | TFLOPs: 8.66 | 31: iteration 36330/ 37905 | consumed samples: 9300480 | consumed tokens: 19047383040 | elapsed time per iteration (s): 0.18 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 2.855331E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1409.661 | TFLOPs: 8.98 | 31: iteration 36340/ 37905 | consumed samples: 9303040 | consumed tokens: 19052625920 | elapsed time per iteration (s): 0.19 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 2.895671E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1321.992 | TFLOPs: 8.42 | 31: iteration 36350/ 37905 | consumed samples: 9305600 | consumed tokens: 19057868800 | elapsed time per iteration (s): 0.19 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 2.861573E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1332.339 | TFLOPs: 8.49 | 31: iteration 36360/ 37905 | consumed samples: 9308160 | consumed tokens: 19063111680 | elapsed time per iteration (s): 0.21 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 2.903195E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1245.169 | TFLOPs: 7.93 | 31: iteration 36370/ 37905 | consumed samples: 9310720 | consumed tokens: 19068354560 | elapsed time per iteration (s): 0.19 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 2.890572E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1351.750 | TFLOPs: 8.61 | 31: iteration 36380/ 37905 | consumed samples: 9313280 | consumed tokens: 
19073597440 | elapsed time per iteration (s): 0.19 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 2.911003E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1347.164 | TFLOPs: 8.58 | 31: iteration 36390/ 37905 | consumed samples: 9315840 | consumed tokens: 19078840320 | elapsed time per iteration (s): 0.20 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 2.882359E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1272.122 | TFLOPs: 8.10 | 31: iteration 36400/ 37905 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.20 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 2.912443E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1251.648 | TFLOPs: 7.97 | 31: iteration 36410/ 37905 | consumed samples: 9320960 | consumed tokens: 19089326080 | elapsed time per iteration (s): 0.19 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.871431E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1321.396 | TFLOPs: 8.42 | 31: iteration 36420/ 37905 | consumed samples: 9323520 | consumed tokens: 19094568960 | elapsed time per iteration (s): 0.18 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 2.874770E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1439.613 | TFLOPs: 9.17 | 31: iteration 36430/ 37905 | consumed samples: 9326080 | consumed tokens: 19099811840 | elapsed time per iteration (s): 0.19 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 2.890210E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
1369.065 | TFLOPs: 8.72 | 31: iteration 36440/ 37905 | consumed samples: 9328640 | consumed tokens: 19105054720 | elapsed time per iteration (s): 0.22 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.869658E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1145.108 | TFLOPs: 7.29 | 31: iteration 36450/ 37905 | consumed samples: 9331200 | consumed tokens: 19110297600 | elapsed time per iteration (s): 0.23 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 2.908603E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1126.942 | TFLOPs: 7.18 | 31: iteration 36460/ 37905 | consumed samples: 9333760 | consumed tokens: 19115540480 | elapsed time per iteration (s): 0.21 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 2.884865E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1246.050 | TFLOPs: 7.94 | 31: iteration 36470/ 37905 | consumed samples: 9336320 | consumed tokens: 19120783360 | elapsed time per iteration (s): 0.20 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 2.872897E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1289.569 | TFLOPs: 8.21 | 31: iteration 36480/ 37905 | consumed samples: 9338880 | consumed tokens: 19126026240 | elapsed time per iteration (s): 0.19 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.898441E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1326.955 | TFLOPs: 8.45 | 31: iteration 36490/ 37905 | consumed samples: 9341440 | consumed tokens: 19131269120 | elapsed time per iteration (s): 0.19 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 2.908790E+00 | grad norm: 0.215 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1359.530 | TFLOPs: 8.66 | 31: iteration 36500/ 37905 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.19 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 2.859162E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1365.318 | TFLOPs: 8.70 | 31: iteration 36510/ 37905 | consumed samples: 9346560 | consumed tokens: 19141754880 | elapsed time per iteration (s): 0.19 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 2.901369E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1358.146 | TFLOPs: 8.65 | 31: iteration 36520/ 37905 | consumed samples: 9349120 | consumed tokens: 19146997760 | elapsed time per iteration (s): 0.20 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 2.859150E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1256.662 | TFLOPs: 8.00 | 31: iteration 36530/ 37905 | consumed samples: 9351680 | consumed tokens: 19152240640 | elapsed time per iteration (s): 0.19 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 2.892040E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1369.588 | TFLOPs: 8.72 | 31: iteration 36540/ 37905 | consumed samples: 9354240 | consumed tokens: 19157483520 | elapsed time per iteration (s): 0.19 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 2.892833E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1336.422 | TFLOPs: 8.51 | 31: iteration 36550/ 37905 | consumed samples: 9356800 | consumed tokens: 19162726400 | elapsed time per iteration (s): 0.20 | 
learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.890966E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1280.684 | TFLOPs: 8.16 | 31: iteration 36560/ 37905 | consumed samples: 9359360 | consumed tokens: 19167969280 | elapsed time per iteration (s): 0.18 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 2.902261E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1384.632 | TFLOPs: 8.82 | 31: iteration 36570/ 37905 | consumed samples: 9361920 | consumed tokens: 19173212160 | elapsed time per iteration (s): 0.19 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 2.873650E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1350.440 | TFLOPs: 8.60 | 31: iteration 36580/ 37905 | consumed samples: 9364480 | consumed tokens: 19178455040 | elapsed time per iteration (s): 0.19 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 2.861152E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1360.166 | TFLOPs: 8.66 | 31: iteration 36590/ 37905 | consumed samples: 9367040 | consumed tokens: 19183697920 | elapsed time per iteration (s): 0.19 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.879736E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1356.293 | TFLOPs: 8.64 | 31: iteration 36600/ 37905 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.19 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.841673E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1317.084 | TFLOPs: 8.39 | 31: iteration 36610/ 37905 | 
consumed samples: 9372160 | consumed tokens: 19194183680 | elapsed time per iteration (s): 0.19 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 2.921107E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1347.097 | TFLOPs: 8.58 | 31: iteration 36620/ 37905 | consumed samples: 9374720 | consumed tokens: 19199426560 | elapsed time per iteration (s): 0.18 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 2.891282E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1397.380 | TFLOPs: 8.90 | 31: iteration 36630/ 37905 | consumed samples: 9377280 | consumed tokens: 19204669440 | elapsed time per iteration (s): 0.20 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 2.837898E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1305.183 | TFLOPs: 8.31 | 31: iteration 36640/ 37905 | consumed samples: 9379840 | consumed tokens: 19209912320 | elapsed time per iteration (s): 0.18 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.883154E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1413.042 | TFLOPs: 9.00 | 31: iteration 36650/ 37905 | consumed samples: 9382400 | consumed tokens: 19215155200 | elapsed time per iteration (s): 0.19 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.869710E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1353.578 | TFLOPs: 8.62 | 31: iteration 36660/ 37905 | consumed samples: 9384960 | consumed tokens: 19220398080 | elapsed time per iteration (s): 0.19 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 2.890990E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 1313.808 | TFLOPs: 8.37 | 31: iteration 36670/ 37905 | consumed samples: 9387520 | consumed tokens: 19225640960 | elapsed time per iteration (s): 0.22 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.905231E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1182.158 | TFLOPs: 7.53 | 31: iteration 36680/ 37905 | consumed samples: 9390080 | consumed tokens: 19230883840 | elapsed time per iteration (s): 0.20 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.890004E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1279.816 | TFLOPs: 8.15 | 31: iteration 36690/ 37905 | consumed samples: 9392640 | consumed tokens: 19236126720 | elapsed time per iteration (s): 0.21 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.886239E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1245.270 | TFLOPs: 7.93 | 31: iteration 36700/ 37905 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.19 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 2.884292E+00 | grad norm: 0.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1330.523 | TFLOPs: 8.47 | 31: iteration 36710/ 37905 | consumed samples: 9397760 | consumed tokens: 19246612480 | elapsed time per iteration (s): 0.19 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.872428E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1315.466 | TFLOPs: 8.38 | 31: iteration 36720/ 37905 | consumed samples: 9400320 | consumed tokens: 19251855360 | elapsed time per iteration (s): 0.18 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 
2.869320E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1390.349 | TFLOPs: 8.85 | 31: iteration 36730/ 37905 | consumed samples: 9402880 | consumed tokens: 19257098240 | elapsed time per iteration (s): 0.18 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 2.892738E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1402.523 | TFLOPs: 8.93 | 31: iteration 36740/ 37905 | consumed samples: 9405440 | consumed tokens: 19262341120 | elapsed time per iteration (s): 0.18 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.901523E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1402.434 | TFLOPs: 8.93 | 31: iteration 36750/ 37905 | consumed samples: 9408000 | consumed tokens: 19267584000 | elapsed time per iteration (s): 0.18 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.876865E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1422.311 | TFLOPs: 9.06 | 31: iteration 36760/ 37905 | consumed samples: 9410560 | consumed tokens: 19272826880 | elapsed time per iteration (s): 0.20 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.895394E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1307.389 | TFLOPs: 8.33 | 31: iteration 36770/ 37905 | consumed samples: 9413120 | consumed tokens: 19278069760 | elapsed time per iteration (s): 0.19 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.878531E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1340.737 | TFLOPs: 8.54 | 31: iteration 36780/ 37905 | consumed samples: 9415680 | consumed tokens: 19283312640 | 
elapsed time per iteration (s): 0.18 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 2.917804E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1431.500 | TFLOPs: 9.12 | 31: iteration 36790/ 37905 | consumed samples: 9418240 | consumed tokens: 19288555520 | elapsed time per iteration (s): 0.20 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.904743E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1290.449 | TFLOPs: 8.22 | 31: iteration 36800/ 37905 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.18 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.870599E+00 | grad norm: 0.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1454.534 | TFLOPs: 9.26 | 31: iteration 36810/ 37905 | consumed samples: 9423360 | consumed tokens: 19299041280 | elapsed time per iteration (s): 0.20 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.895813E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1302.421 | TFLOPs: 8.29 | 31: iteration 36820/ 37905 | consumed samples: 9425920 | consumed tokens: 19304284160 | elapsed time per iteration (s): 0.18 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.901664E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1392.883 | TFLOPs: 8.87 | 31: iteration 36830/ 37905 | consumed samples: 9428480 | consumed tokens: 19309527040 | elapsed time per iteration (s): 0.19 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.899883E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1380.356 | TFLOPs: 
8.79 | 31: iteration 36840/ 37905 | consumed samples: 9431040 | consumed tokens: 19314769920 | elapsed time per iteration (s): 0.21 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.873335E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1207.281 | TFLOPs: 7.69 | 31: iteration 36850/ 37905 | consumed samples: 9433600 | consumed tokens: 19320012800 | elapsed time per iteration (s): 0.18 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.909697E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1419.896 | TFLOPs: 9.04 | 31: iteration 36860/ 37905 | consumed samples: 9436160 | consumed tokens: 19325255680 | elapsed time per iteration (s): 0.19 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.896464E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1383.513 | TFLOPs: 8.81 | 31: iteration 36870/ 37905 | consumed samples: 9438720 | consumed tokens: 19330498560 | elapsed time per iteration (s): 0.20 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.861786E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1259.062 | TFLOPs: 8.02 | 31: iteration 36880/ 37905 | consumed samples: 9441280 | consumed tokens: 19335741440 | elapsed time per iteration (s): 0.18 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.854629E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1438.264 | TFLOPs: 9.16 | 31: iteration 36890/ 37905 | consumed samples: 9443840 | consumed tokens: 19340984320 | elapsed time per iteration (s): 0.18 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.875157E+00 | grad norm: 0.194 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1403.619 | TFLOPs: 8.94 | 31: iteration 36900/ 37905 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.19 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.921170E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1346.945 | TFLOPs: 8.58 | 31: iteration 36910/ 37905 | consumed samples: 9448960 | consumed tokens: 19351470080 | elapsed time per iteration (s): 0.18 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.924317E+00 | grad norm: 0.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1423.314 | TFLOPs: 9.06 | 31: iteration 36920/ 37905 | consumed samples: 9451520 | consumed tokens: 19356712960 | elapsed time per iteration (s): 0.20 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 2.893167E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1269.729 | TFLOPs: 8.09 | 31: iteration 36930/ 37905 | consumed samples: 9454080 | consumed tokens: 19361955840 | elapsed time per iteration (s): 0.19 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.908124E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1382.666 | TFLOPs: 8.81 | 31: iteration 36940/ 37905 | consumed samples: 9456640 | consumed tokens: 19367198720 | elapsed time per iteration (s): 0.19 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.926399E+00 | grad norm: 0.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1354.013 | TFLOPs: 8.62 | 31: iteration 36950/ 37905 | consumed samples: 9459200 | consumed tokens: 19372441600 | elapsed time per iteration (s): 0.19 | learning rate: 2.028E-05 
| global batch size: 256 | lm loss: 2.885104E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1323.865 | TFLOPs: 8.43 | 31: iteration 36960/ 37905 | consumed samples: 9461760 | consumed tokens: 19377684480 | elapsed time per iteration (s): 0.18 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.866648E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1407.151 | TFLOPs: 8.96 | 31: iteration 36970/ 37905 | consumed samples: 9464320 | consumed tokens: 19382927360 | elapsed time per iteration (s): 0.19 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.908297E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1364.717 | TFLOPs: 8.69 | 31: iteration 36980/ 37905 | consumed samples: 9466880 | consumed tokens: 19388170240 | elapsed time per iteration (s): 0.18 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.902516E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1385.900 | TFLOPs: 8.83 | 31: iteration 36990/ 37905 | consumed samples: 9469440 | consumed tokens: 19393413120 | elapsed time per iteration (s): 0.19 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.872168E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1321.558 | TFLOPs: 8.42 | 31: iteration 37000/ 37905 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.18 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.880073E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1392.667 | TFLOPs: 8.87 | 31: 
-------------------------------------------------------------------------------------------
31: valid loss at iteration 37000 | lm loss value: 2.859078E+00 | lm loss PPL: 1.744544E+01 |
31: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 37000 to checkpoints_83m
0: [2022-11-24 17:10:03,373] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step37000 is begin to save!
0: [2022-11-24 17:10:03,606] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_01-model_00-model_states.pt...
0: [2022-11-24 17:10:03,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_01-model_00-model_states.pt.
0: [2022-11-24 17:10:03,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_03-model_00-model_states.pt...
0: [2022-11-24 17:10:03,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_03-model_00-model_states.pt.
0: [2022-11-24 17:10:03,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_04-model_00-model_states.pt...
0: [2022-11-24 17:10:03,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_04-model_00-model_states.pt.
0: [2022-11-24 17:10:03,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_05-model_00-model_states.pt...
0: [2022-11-24 17:10:03,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_05-model_00-model_states.pt.
0: [2022-11-24 17:10:03,756] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_06-model_00-model_states.pt...
0: [2022-11-24 17:10:03,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_06-model_00-model_states.pt.
0: [2022-11-24 17:10:03,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_07-model_00-model_states.pt...
0: [2022-11-24 17:10:03,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_07-model_00-model_states.pt.
0: [2022-11-24 17:10:03,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_08-model_00-model_states.pt...
0: [2022-11-24 17:10:03,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_08-model_00-model_states.pt.
0: [2022-11-24 17:10:03,789] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_09-model_00-model_states.pt...
0: [2022-11-24 17:10:03,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_09-model_00-model_states.pt.
0: [2022-11-24 17:10:03,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_10-model_00-model_states.pt...
0: [2022-11-24 17:10:03,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_10-model_00-model_states.pt.
0: [2022-11-24 17:10:03,811] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_11-model_00-model_states.pt...
0: [2022-11-24 17:10:03,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_11-model_00-model_states.pt.
0: [2022-11-24 17:10:03,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_12-model_00-model_states.pt...
0: [2022-11-24 17:10:03,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_12-model_00-model_states.pt.
0: [2022-11-24 17:10:03,834] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/layer_14-model_00-model_states.pt...
0: [2022-11-24 17:10:03,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/layer_14-model_00-model_states.pt.
0: [2022-11-24 17:10:03,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_83m/global_step37000/mp_rank_00_model_states.pt
0: [2022-11-24 17:10:03,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/mp_rank_00_model_states.pt...
0: [2022-11-24 17:10:03,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/mp_rank_00_model_states.pt.
0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt...
5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 
29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 
6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 
15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 
20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 
10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 
8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 
25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 
17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 
13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 
12: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 
23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 
31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 
19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 
27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 
10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 
18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 
17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 
0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
2: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 
17: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:10:03,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:10:04,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
8: [2022-11-24 17:10:04,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:10:04,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
27: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
9: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
25: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 
20: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
12: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
14: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 
13: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 
9: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 
28: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:10:04,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 
24: [2022-11-24 17:10:04,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:10:04,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 
23: [2022-11-24 17:10:04,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:10:04,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
8: [2022-11-24 17:10:04,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 12: [2022-11-24 17:10:04,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 
16: [2022-11-24 17:10:04,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 9: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 
6: [2022-11-24 17:10:04,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:10:04,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:10:04,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:10:04,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 
31: [2022-11-24 17:10:04,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:10:04,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 12: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 
8: [2022-11-24 17:10:04,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2022-11-24 17:10:04,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 
6: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:10:04,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 
15: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 
27: [2022-11-24 17:10:04,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:10:04,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 
19: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 
24: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
2: [2022-11-24 17:10:04,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 
29: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
10: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 
17: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 
9: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:10:04,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 
6: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:10:04,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-24 17:10:04,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:10:04,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 
21: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
13: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:10:04,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 
14: [2022-11-24 17:10:04,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 12: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 
8: [2022-11-24 17:10:04,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:10:04,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 
29: [2022-11-24 17:10:04,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 
16: [2022-11-24 17:10:04,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 
11: [2022-11-24 17:10:04,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 28: [2022-11-24 17:10:04,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:10:04,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-24 17:10:04,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:10:04,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 
19: [2022-11-24 17:10:04,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:10:04,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 
31: [2022-11-24 17:10:04,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:10:04,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:10:04,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 12: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 
8: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 
17: [2022-11-24 17:10:04,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:10:04,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 
16: [2022-11-24 17:10:04,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:10:04,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 15: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 11: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 
11: [2022-11-24 17:10:04,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:10:04,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 28: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
18: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 0: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 18: [2022-11-24 17:10:04,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 
7: [2022-11-24 17:10:04,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 19: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:10:04,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 
24: [2022-11-24 17:10:04,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 13: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:10:04,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:10:04,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 20: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
15: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 15: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
11: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 6: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 27: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 22: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 
28: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 31: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 
14: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 2: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 20: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 10: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 
18: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 25: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 30: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 17: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
7: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 12: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 29: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 23: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 26: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 3: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 14: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 1: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 19: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
2: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 27: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 24: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 4: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 8: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 16: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 18: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 25: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 30: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
9: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 7: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 12: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 29: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 23: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 26: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 31: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 14: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 22: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 10: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
4: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 8: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 16: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 9: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 21: [2022-11-24 17:10:04,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 13: [2022-11-24 17:10:04,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 24: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 17: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 21: [2022-11-24 17:10:04,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: successfully saved checkpoint at iteration 37000 to checkpoints_83m
31: time (ms) | save-checkpoint: 683.16
31: iteration 37010/ 37905 | consumed samples: 9474560 | consumed tokens: 19403898880 | elapsed time per iteration (s): 0.32 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.908644E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 812.330 | TFLOPs: 5.17 |
31: iteration 37020/ 37905 | consumed samples: 9477120 | consumed tokens: 19409141760 | elapsed time per iteration (s): 0.25 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.861787E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1018.631 | TFLOPs: 6.49 |
31: iteration 37030/ 37905 | consumed samples: 9479680 | consumed tokens: 19414384640 | elapsed time per iteration (s): 0.23 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.901721E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1098.796 | TFLOPs: 7.00 |
31: iteration 37040/ 37905 | consumed samples: 9482240 | consumed tokens: 19419627520 | elapsed time per iteration (s): 0.24 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.881049E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1064.154 | TFLOPs: 6.78 |
31: iteration 37050/ 37905 | consumed samples: 9484800 | consumed tokens: 19424870400 | elapsed time per iteration (s): 0.25 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.905183E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1034.975 | TFLOPs: 6.59 |
31: iteration 37060/ 37905 | consumed samples: 9487360 | consumed tokens: 19430113280 | elapsed time per iteration (s): 0.26 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.882003E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1000.768 | TFLOPs: 6.37 |
31: iteration 37070/ 37905 | consumed samples: 9489920 | consumed tokens: 19435356160 | elapsed time per iteration (s): 0.24 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.877896E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1065.754 | TFLOPs: 6.79 |
31: iteration 37080/ 37905 | consumed samples: 9492480 | consumed tokens: 19440599040 | elapsed time per iteration (s): 0.27 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.879283E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 950.016 | TFLOPs: 6.05 |
31: iteration 37090/ 37905 | consumed samples: 9495040 | consumed tokens: 19445841920 | elapsed time per iteration (s): 0.23 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.907372E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1120.462 | TFLOPs: 7.14 |
31: iteration 37100/ 37905 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.20 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.876361E+00 | grad norm: 0.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1266.688 | TFLOPs: 8.07 |
31: iteration 37110/ 37905 | consumed samples: 9500160 | consumed tokens: 19456327680 | elapsed time per iteration (s): 0.23 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.911300E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1118.006 | TFLOPs: 7.12 |
31: iteration 37120/ 37905 | consumed samples: 9502720 | consumed tokens: 19461570560 | elapsed time per iteration (s): 0.21 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.858933E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1196.532 | TFLOPs: 7.62 |
31: iteration 37130/ 37905 | consumed samples: 9505280 | consumed tokens: 19466813440 | elapsed time per iteration (s): 0.26 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.936755E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 995.961 | TFLOPs: 6.34 |
31: iteration 37140/ 37905 | consumed samples: 9507840 | consumed tokens: 19472056320 | elapsed time per iteration (s): 0.26 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.879073E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 979.663 | TFLOPs: 6.24 |
31: iteration 37150/ 37905 | consumed samples: 9510400 | consumed tokens: 19477299200 | elapsed time per iteration (s): 0.23 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.890731E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1112.519 | TFLOPs: 7.09 |
31: iteration 37160/ 37905 | consumed samples: 9512960 | consumed tokens: 19482542080 | elapsed time per iteration (s): 0.22 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.907544E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1161.499 | TFLOPs: 7.40 |
31: iteration 37170/ 37905 | consumed samples: 9515520 | consumed tokens: 19487784960 | elapsed time per iteration (s): 0.22 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.862463E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 1159.759 | TFLOPs: 7.39 | 31: iteration 37180/ 37905 | consumed samples: 9518080 | consumed tokens: 19493027840 | elapsed time per iteration (s): 0.25 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.889137E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1020.889 | TFLOPs: 6.50 | 31: iteration 37190/ 37905 | consumed samples: 9520640 | consumed tokens: 19498270720 | elapsed time per iteration (s): 0.27 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.883177E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 948.178 | TFLOPs: 6.04 | 31: iteration 37200/ 37905 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.22 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.868573E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1169.778 | TFLOPs: 7.45 | 31: iteration 37210/ 37905 | consumed samples: 9525760 | consumed tokens: 19508756480 | elapsed time per iteration (s): 0.25 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.884327E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1021.714 | TFLOPs: 6.51 | 31: iteration 37220/ 37905 | consumed samples: 9528320 | consumed tokens: 19513999360 | elapsed time per iteration (s): 0.27 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.905689E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 961.549 | TFLOPs: 6.12 | 31: iteration 37230/ 37905 | consumed samples: 9530880 | consumed tokens: 19519242240 | elapsed time per iteration (s): 0.25 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.909416E+00 | grad 
norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1037.292 | TFLOPs: 6.61 | 31: iteration 37240/ 37905 | consumed samples: 9533440 | consumed tokens: 19524485120 | elapsed time per iteration (s): 0.24 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.858282E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1073.459 | TFLOPs: 6.84 | 31: iteration 37250/ 37905 | consumed samples: 9536000 | consumed tokens: 19529728000 | elapsed time per iteration (s): 0.26 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.894456E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 990.320 | TFLOPs: 6.31 | 31: iteration 37260/ 37905 | consumed samples: 9538560 | consumed tokens: 19534970880 | elapsed time per iteration (s): 0.23 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.870854E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1089.655 | TFLOPs: 6.94 | 31: iteration 37270/ 37905 | consumed samples: 9541120 | consumed tokens: 19540213760 | elapsed time per iteration (s): 0.26 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.919415E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 995.596 | TFLOPs: 6.34 | 31: iteration 37280/ 37905 | consumed samples: 9543680 | consumed tokens: 19545456640 | elapsed time per iteration (s): 0.24 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.899510E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1060.928 | TFLOPs: 6.76 | 31: iteration 37290/ 37905 | consumed samples: 9546240 | consumed tokens: 19550699520 | elapsed time per iteration 
(s): 0.27 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.893467E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 945.100 | TFLOPs: 6.02 | 31: iteration 37300/ 37905 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.27 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.885123E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 962.175 | TFLOPs: 6.13 | 31: iteration 37310/ 37905 | consumed samples: 9551360 | consumed tokens: 19561185280 | elapsed time per iteration (s): 0.25 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.891409E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1029.495 | TFLOPs: 6.56 | 31: iteration 37320/ 37905 | consumed samples: 9553920 | consumed tokens: 19566428160 | elapsed time per iteration (s): 0.25 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.887790E+00 | grad norm: 0.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1021.657 | TFLOPs: 6.51 | 31: iteration 37330/ 37905 | consumed samples: 9556480 | consumed tokens: 19571671040 | elapsed time per iteration (s): 0.25 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.884281E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1010.417 | TFLOPs: 6.44 | 31: iteration 37340/ 37905 | consumed samples: 9559040 | consumed tokens: 19576913920 | elapsed time per iteration (s): 0.25 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.872714E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1028.063 | TFLOPs: 6.55 | 31: iteration 37350/ 
37905 | consumed samples: 9561600 | consumed tokens: 19582156800 | elapsed time per iteration (s): 0.24 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.887691E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1075.468 | TFLOPs: 6.85 | 31: iteration 37360/ 37905 | consumed samples: 9564160 | consumed tokens: 19587399680 | elapsed time per iteration (s): 0.27 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.894159E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 960.245 | TFLOPs: 6.12 | 31: iteration 37370/ 37905 | consumed samples: 9566720 | consumed tokens: 19592642560 | elapsed time per iteration (s): 0.24 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.870950E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1074.295 | TFLOPs: 6.84 | 31: iteration 37380/ 37905 | consumed samples: 9569280 | consumed tokens: 19597885440 | elapsed time per iteration (s): 0.26 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.925849E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 991.666 | TFLOPs: 6.32 | 31: iteration 37390/ 37905 | consumed samples: 9571840 | consumed tokens: 19603128320 | elapsed time per iteration (s): 0.23 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.874467E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1110.369 | TFLOPs: 7.07 | 31: iteration 37400/ 37905 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.23 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.875175E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 1094.027 | TFLOPs: 6.97 | 31: iteration 37410/ 37905 | consumed samples: 9576960 | consumed tokens: 19613614080 | elapsed time per iteration (s): 0.26 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.842850E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 996.729 | TFLOPs: 6.35 | 31: iteration 37420/ 37905 | consumed samples: 9579520 | consumed tokens: 19618856960 | elapsed time per iteration (s): 0.27 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.864546E+00 | grad norm: 0.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 961.306 | TFLOPs: 6.12 | 31: iteration 37430/ 37905 | consumed samples: 9582080 | consumed tokens: 19624099840 | elapsed time per iteration (s): 0.27 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.881429E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 960.545 | TFLOPs: 6.12 | 31: iteration 37440/ 37905 | consumed samples: 9584640 | consumed tokens: 19629342720 | elapsed time per iteration (s): 0.25 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.867531E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1021.570 | TFLOPs: 6.51 | 31: iteration 37450/ 37905 | consumed samples: 9587200 | consumed tokens: 19634585600 | elapsed time per iteration (s): 0.24 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.899371E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1068.492 | TFLOPs: 6.81 | 31: iteration 37460/ 37905 | consumed samples: 9589760 | consumed tokens: 19639828480 | elapsed time per iteration (s): 0.21 | learning rate: 2.006E-05 | global batch size: 256 | lm 
loss: 2.900565E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1231.631 | TFLOPs: 7.84 | 31: iteration 37470/ 37905 | consumed samples: 9592320 | consumed tokens: 19645071360 | elapsed time per iteration (s): 0.24 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.910359E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1065.133 | TFLOPs: 6.78 | 31: iteration 37480/ 37905 | consumed samples: 9594880 | consumed tokens: 19650314240 | elapsed time per iteration (s): 0.24 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.884232E+00 | grad norm: 0.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1086.515 | TFLOPs: 6.92 | 31: iteration 37490/ 37905 | consumed samples: 9597440 | consumed tokens: 19655557120 | elapsed time per iteration (s): 0.24 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.896680E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1048.586 | TFLOPs: 6.68 | 31: iteration 37500/ 37905 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.25 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.906549E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1024.666 | TFLOPs: 6.53 | 31: iteration 37510/ 37905 | consumed samples: 9602560 | consumed tokens: 19666042880 | elapsed time per iteration (s): 0.28 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.897319E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 928.297 | TFLOPs: 5.91 | 31: iteration 37520/ 37905 | consumed samples: 9605120 | consumed tokens: 19671285760 | 
elapsed time per iteration (s): 0.25 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.854589E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1018.339 | TFLOPs: 6.49 | 31: iteration 37530/ 37905 | consumed samples: 9607680 | consumed tokens: 19676528640 | elapsed time per iteration (s): 0.24 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.887199E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1079.021 | TFLOPs: 6.87 | 31: iteration 37540/ 37905 | consumed samples: 9610240 | consumed tokens: 19681771520 | elapsed time per iteration (s): 0.22 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.810824E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1158.875 | TFLOPs: 7.38 | 31: iteration 37550/ 37905 | consumed samples: 9612800 | consumed tokens: 19687014400 | elapsed time per iteration (s): 0.25 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.869536E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1010.034 | TFLOPs: 6.43 | 31: iteration 37560/ 37905 | consumed samples: 9615360 | consumed tokens: 19692257280 | elapsed time per iteration (s): 0.24 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.840566E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1059.379 | TFLOPs: 6.75 | 31: iteration 37570/ 37905 | consumed samples: 9617920 | consumed tokens: 19697500160 | elapsed time per iteration (s): 0.25 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.887637E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1018.624 | TFLOPs: 
6.49 | 31: iteration 37580/ 37905 | consumed samples: 9620480 | consumed tokens: 19702743040 | elapsed time per iteration (s): 0.24 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.924227E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1058.951 | TFLOPs: 6.74 | 31: iteration 37590/ 37905 | consumed samples: 9623040 | consumed tokens: 19707985920 | elapsed time per iteration (s): 0.23 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.864487E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1121.640 | TFLOPs: 7.14 | 31: iteration 37600/ 37905 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.23 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.909999E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1123.757 | TFLOPs: 7.16 | 31: iteration 37610/ 37905 | consumed samples: 9628160 | consumed tokens: 19718471680 | elapsed time per iteration (s): 0.27 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.895321E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 954.399 | TFLOPs: 6.08 | 31: iteration 37620/ 37905 | consumed samples: 9630720 | consumed tokens: 19723714560 | elapsed time per iteration (s): 0.24 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.915758E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1055.012 | TFLOPs: 6.72 | 31: iteration 37630/ 37905 | consumed samples: 9633280 | consumed tokens: 19728957440 | elapsed time per iteration (s): 0.23 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.868991E+00 | grad norm: 0.200 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1122.157 | TFLOPs: 7.15 | 31: iteration 37640/ 37905 | consumed samples: 9635840 | consumed tokens: 19734200320 | elapsed time per iteration (s): 0.24 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.876520E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1068.526 | TFLOPs: 6.81 | 31: iteration 37650/ 37905 | consumed samples: 9638400 | consumed tokens: 19739443200 | elapsed time per iteration (s): 0.23 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.884371E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1121.396 | TFLOPs: 7.14 | 31: iteration 37660/ 37905 | consumed samples: 9640960 | consumed tokens: 19744686080 | elapsed time per iteration (s): 0.23 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.869857E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1089.894 | TFLOPs: 6.94 | 31: iteration 37670/ 37905 | consumed samples: 9643520 | consumed tokens: 19749928960 | elapsed time per iteration (s): 0.23 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.855338E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1132.925 | TFLOPs: 7.22 | 31: iteration 37680/ 37905 | consumed samples: 9646080 | consumed tokens: 19755171840 | elapsed time per iteration (s): 0.26 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.880435E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 993.472 | TFLOPs: 6.33 | 31: iteration 37690/ 37905 | consumed samples: 9648640 | consumed tokens: 19760414720 | elapsed time per iteration (s): 0.23 | learning rate: 2.001E-05 | 
global batch size: 256 | lm loss: 2.846978E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1117.426 | TFLOPs: 7.12 | 31: iteration 37700/ 37905 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.25 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.925961E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1025.670 | TFLOPs: 6.53 | 31: iteration 37710/ 37905 | consumed samples: 9653760 | consumed tokens: 19770900480 | elapsed time per iteration (s): 0.27 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.931453E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 960.190 | TFLOPs: 6.12 | 31: iteration 37720/ 37905 | consumed samples: 9656320 | consumed tokens: 19776143360 | elapsed time per iteration (s): 0.24 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.917255E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1084.486 | TFLOPs: 6.91 | 31: iteration 37730/ 37905 | consumed samples: 9658880 | consumed tokens: 19781386240 | elapsed time per iteration (s): 0.25 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.889783E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1029.260 | TFLOPs: 6.56 | 31: iteration 37740/ 37905 | consumed samples: 9661440 | consumed tokens: 19786629120 | elapsed time per iteration (s): 0.23 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.895304E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1121.248 | TFLOPs: 7.14 | 31: iteration 37750/ 37905 | consumed samples: 9664000 | 
consumed tokens: 19791872000 | elapsed time per iteration (s): 0.26 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.886700E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 986.213 | TFLOPs: 6.28 | 31: iteration 37760/ 37905 | consumed samples: 9666560 | consumed tokens: 19797114880 | elapsed time per iteration (s): 0.26 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.879058E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 991.591 | TFLOPs: 6.32 | 31: iteration 37770/ 37905 | consumed samples: 9669120 | consumed tokens: 19802357760 | elapsed time per iteration (s): 0.26 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.881442E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 982.331 | TFLOPs: 6.26 | 31: iteration 37780/ 37905 | consumed samples: 9671680 | consumed tokens: 19807600640 | elapsed time per iteration (s): 0.22 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.891814E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1139.352 | TFLOPs: 7.26 | 31: iteration 37790/ 37905 | consumed samples: 9674240 | consumed tokens: 19812843520 | elapsed time per iteration (s): 0.23 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.873662E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1095.187 | TFLOPs: 6.98 | 31: iteration 37800/ 37905 | consumed samples: 9676800 | consumed tokens: 19818086400 | elapsed time per iteration (s): 0.25 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.856424E+00 | grad norm: 0.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1033.149 | TFLOPs: 6.58 | 31: iteration 37810/ 37905 | consumed samples: 9679360 | consumed tokens: 19823329280 | elapsed time per iteration (s): 0.23 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.843416E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1096.965 | TFLOPs: 6.99 | 31: iteration 37820/ 37905 | consumed samples: 9681920 | consumed tokens: 19828572160 | elapsed time per iteration (s): 0.27 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.873091E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 944.774 | TFLOPs: 6.02 | 31: iteration 37830/ 37905 | consumed samples: 9684480 | consumed tokens: 19833815040 | elapsed time per iteration (s): 0.25 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.917586E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1008.794 | TFLOPs: 6.42 | 31: iteration 37840/ 37905 | consumed samples: 9687040 | consumed tokens: 19839057920 | elapsed time per iteration (s): 0.25 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.860797E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1012.706 | TFLOPs: 6.45 | 31: iteration 37850/ 37905 | consumed samples: 9689600 | consumed tokens: 19844300800 | elapsed time per iteration (s): 0.26 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.887733E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 992.282 | TFLOPs: 6.32 | 31: iteration 37860/ 37905 | consumed samples: 9692160 | consumed tokens: 19849543680 | elapsed time per iteration (s): 0.23 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.911167E+00 | grad norm: 0.205 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1121.920 | TFLOPs: 7.15 | 31: iteration 37870/ 37905 | consumed samples: 9694720 | consumed tokens: 19854786560 | elapsed time per iteration (s): 0.24 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.897402E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1050.629 | TFLOPs: 6.69 | 31: iteration 37880/ 37905 | consumed samples: 9697280 | consumed tokens: 19860029440 | elapsed time per iteration (s): 0.25 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.891303E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1034.542 | TFLOPs: 6.59 | 31: iteration 37890/ 37905 | consumed samples: 9699840 | consumed tokens: 19865272320 | elapsed time per iteration (s): 0.24 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.901179E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1067.682 | TFLOPs: 6.80 | 31: iteration 37900/ 37905 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.23 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.862957E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1112.287 | TFLOPs: 7.08 | 0: [after training is done] datetime: 2022-11-24 17:13:45 0: saving checkpoint at iteration 37905 to checkpoints_83m 31: ------------------------------------------------------------------------------------------------------------ 31: valid loss at the end of training for val data | lm loss value: 2.851934E+00 | lm loss PPL: 1.732125E+01 | 31: ------------------------------------------------------------------------------------------------------------ 0: [2022-11-24 
17:13:45,254] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step37905 is begin to save! 0: [2022-11-24 17:13:45,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_01-model_00-model_states.pt... 0: [2022-11-24 17:13:45,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_01-model_00-model_states.pt. 0: [2022-11-24 17:13:45,338] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_03-model_00-model_states.pt... 0: [2022-11-24 17:13:45,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_03-model_00-model_states.pt. 0: [2022-11-24 17:13:45,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_04-model_00-model_states.pt... 0: [2022-11-24 17:13:45,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_04-model_00-model_states.pt. 0: [2022-11-24 17:13:45,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_05-model_00-model_states.pt... 0: [2022-11-24 17:13:45,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_05-model_00-model_states.pt. 0: [2022-11-24 17:13:45,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_06-model_00-model_states.pt... 0: [2022-11-24 17:13:45,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_06-model_00-model_states.pt. 0: [2022-11-24 17:13:45,384] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_07-model_00-model_states.pt... 0: [2022-11-24 17:13:45,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_07-model_00-model_states.pt. 
0: [2022-11-24 17:13:45,395] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_08-model_00-model_states.pt...
0: [2022-11-24 17:13:45,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_08-model_00-model_states.pt.
0: [2022-11-24 17:13:45,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_09-model_00-model_states.pt...
0: [2022-11-24 17:13:45,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_09-model_00-model_states.pt.
0: [2022-11-24 17:13:45,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_10-model_00-model_states.pt...
0: [2022-11-24 17:13:45,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_10-model_00-model_states.pt.
0: [2022-11-24 17:13:45,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_11-model_00-model_states.pt...
0: [2022-11-24 17:13:45,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_11-model_00-model_states.pt.
0: [2022-11-24 17:13:45,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_12-model_00-model_states.pt...
0: [2022-11-24 17:13:45,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_12-model_00-model_states.pt.
0: [2022-11-24 17:13:45,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/layer_14-model_00-model_states.pt...
0: [2022-11-24 17:13:45,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/layer_14-model_00-model_states.pt.
0: [2022-11-24 17:13:45,452] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_83m/global_step37905/mp_rank_00_model_states.pt
0: [2022-11-24 17:13:45,452] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/mp_rank_00_model_states.pt...
0: [2022-11-24 17:13:45,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/mp_rank_00_model_states.pt.
0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt...
24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt...
7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 
11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 
26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 
15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 
22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 
10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 
16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 
6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 31: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 
14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 
19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 
27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 
8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 16: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 
18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 
30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 
17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 
21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 11: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 26: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 
3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 15: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 20: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 
10: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 4: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 
28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 13: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 5: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 6: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 19: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 8: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 
9: [2022-11-24 17:13:45,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 0: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 17: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 28: [2022-11-24 17:13:45,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_83m/global_step37905/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 2: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
15: [2022-11-24 17:13:45,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 9: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
29: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
0: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 4: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 8: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
14: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
19: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 1: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
3: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 
2: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:13:45,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
9: [2022-11-24 17:13:45,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
4: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
7: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:13:45,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
21: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
8: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:13:45,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
19: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
1: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
0: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:13:45,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
24: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
20: [2022-11-24 17:13:45,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 4: [2022-11-24 17:13:45,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 29: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 
10: [2022-11-24 17:13:45,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
26: [2022-11-24 17:13:45,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
23: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 23: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 8: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
25: [2022-11-24 17:13:45,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:13:45,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
1: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:13:45,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:13:45,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 
24: [2022-11-24 17:13:45,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 9: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:13:45,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
4: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 
29: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:13:45,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 
16: [2022-11-24 17:13:45,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 
21: [2022-11-24 17:13:45,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 8: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 
25: [2022-11-24 17:13:45,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
31: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 30: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
1: [2022-11-24 17:13:45,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:13:45,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 
15: [2022-11-24 17:13:45,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 9: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:13:45,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 
17: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 4: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 29: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
29: [2022-11-24 17:13:45,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 
10: [2022-11-24 17:13:45,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
8: [2022-11-24 17:13:45,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
25: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-24 17:13:45,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:13:45,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:13:45,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 
22: [2022-11-24 17:13:45,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 1: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
0: [2022-11-24 17:13:45,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2022-11-24 17:13:45,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 9: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 
27: [2022-11-24 17:13:45,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 4: [2022-11-24 17:13:45,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 
20: [2022-11-24 17:13:45,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 29: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 
14: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 8: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 
8: [2022-11-24 17:13:45,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-24 17:13:45,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:13:45,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 
30: [2022-11-24 17:13:45,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 
12: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 1: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 
15: [2022-11-24 17:13:45,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2022-11-24 17:13:45,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 9: [2022-11-24 17:13:45,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 
11: [2022-11-24 17:13:45,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2022-11-24 17:13:45,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 4: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 20: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 7: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-24 17:13:45,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 29: [2022-11-24 17:13:45,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 14: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 
14: [2022-11-24 17:13:45,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 8: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 
8: [2022-11-24 17:13:45,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 25: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 
25: [2022-11-24 17:13:45,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 19: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2022-11-24 17:13:45,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 5: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 1: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 12: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 1: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 1: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 25: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 14: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 
23: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 19: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 2: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 16: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 9: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 13: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
12: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 29: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 11: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 23: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 31: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 14: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 19: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 2: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 
25: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 9: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 28: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 0: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 12: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 29: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 11: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 23: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
14: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 1: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 10: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 4: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 8: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 16: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 25: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 
9: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 13: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 0: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 12: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 29: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 11: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 26: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 
1: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 15: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 22: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 10: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 4: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 8: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 16: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
30: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 17: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 28: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 13: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 0: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 23: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 26: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 1: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 15: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 27: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 24: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 10: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
4: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 18: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 30: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 17: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 28: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 21: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 26: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
27: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 20: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 22: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 24: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 18: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 30: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 6: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 
27: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 20: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 22: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 24: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 18: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 30: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 21: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 6: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 6: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 3: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 31: [2022-11-24 17:13:45,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 27: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now! 
22: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
24: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt
30: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
21: [2022-11-24 17:13:45,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt
3: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
31: [2022-11-24 17:13:45,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
24: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
21: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
3: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
24: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
21: [2022-11-24 17:13:45,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
5: [2022-11-24 17:13:45,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2022-11-24 17:13:45,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_83m/global_step37905/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2022-11-24 17:13:45,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37905 is ready now!
0: successfully saved checkpoint at iteration 37905 to checkpoints_83m
31: ------------------------------------------------------------------------------------------------------------
31: test loss at the end of training for test data | lm loss value: 2.767530E+00 | lm loss PPL: 1.591926E+01 |
31: ------------------------------------------------------------------------------------------------------------
28: END 2058110: Thu Nov 24 17:13:52 EET 2022
4: END 2058110: Thu Nov 24 17:13:52 EET 2022
20: END 2058110: Thu Nov 24 17:13:52 EET 2022
17: END 2058110: Thu Nov 24 17:13:52 EET 2022
7: END 2058110: Thu Nov 24 17:13:52 EET 2022
16: END 2058110: Thu Nov 24 17:13:52 EET 2022
12: END 2058110: Thu Nov 24 17:13:52 EET 2022
19: END 2058110: Thu Nov 24 17:13:52 EET 2022
25: END 2058110: Thu Nov 24 17:13:52 EET 2022
26: END 2058110: Thu Nov 24 17:13:52 EET 2022
15: END 2058110: Thu Nov 24 17:13:52 EET 2022
14: END 2058110: Thu Nov 24 17:13:52 EET 2022
27: END 2058110: Thu Nov 24 17:13:52 EET 2022
24: END 2058110: Thu Nov 24 17:13:52 EET 2022
5: END 2058110: Thu Nov 24 17:13:52 EET 2022
9: END 2058110: Thu Nov 24 17:13:52 EET 2022
18: END 2058110: Thu Nov 24 17:13:52 EET 2022
6: END 2058110: Thu Nov 24 17:13:52 EET 2022
11: END 2058110: Thu Nov 24 17:13:52 EET 2022
22: END 2058110: Thu Nov 24 17:13:52 EET 2022
29: END 2058110: Thu Nov 24 17:13:52 EET 2022
0: END 2058110: Thu Nov 24 17:13:52 EET 2022
31: END 2058110: Thu Nov 24 17:13:52 EET 2022
1: END 2058110: Thu Nov 24 17:13:52 EET 2022
2: END 2058110: Thu Nov 24 17:13:52 EET 2022
23: END 2058110: Thu Nov 24 17:13:52 EET 2022
10: END 2058110: Thu Nov 24 17:13:52 EET 2022
8: END 2058110: Thu Nov 24 17:13:52 EET 2022
3: END 2058110: Thu Nov 24 17:13:52 EET 2022
13: END 2058110: Thu Nov 24 17:13:52 EET 2022
30: END 2058110: Thu Nov 24 17:13:52 EET 2022
21: END 2058110: Thu Nov 24 17:13:52 EET 2022