[2021-09-01 15:50:18,187] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-09-01 15:50:18,224] [INFO] [runner.py:360:main] cmd = /home/patrick/anaconda3/envs/hugging_face/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 ./run_pretrain_no_trainer.py --output_dir=./test --max_train_steps=200000 --num_warmup_steps=1000 --gradient_accumulation_steps=4 --learning_rate=0.0005 --weight_decay=0.01 --max_duration_in_seconds=8.0 --model_name_or_path=./ --dataset_name=patrickvonplaten/librispeech_local --manual_data_dir=/home/patrick/wav2vec2_reproduce --dataset_config_name=clean --logging_steps=5 --per_device_train_batch_size=16 --per_device_eval_batch_size=16
[2021-09-01 15:50:18,601] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-09-01 15:50:18,601] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-09-01 15:50:18,601] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(, {'localhost': [0, 1]})
[2021-09-01 15:50:18,601] [INFO] [launch.py:102:main] dist_world_size=2
[2021-09-01 15:50:18,601] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2021-09-01 15:54:19,112] [INFO] [utils.py:11:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
[2021-09-01 15:54:24,481] [INFO] [utils.py:11:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
[2021-09-01 15:54:24,527] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-01 15:54:24,527] [INFO] [engine.py:702:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-01 15:54:24,527] [INFO] [engine.py:707:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-01 15:54:24,527] [INFO] [engine.py:716:_configure_optimizer] DeepSpeed Basic Optimizer = AdamW
[2021-09-01 15:54:24,527] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2021-09-01 15:54:24,527] [WARNING] [engine.py:726:_configure_optimizer] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2021-09-01 15:54:24,527] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer
[2021-09-01 15:54:24,527] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-01 15:54:24,527] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-01 15:54:24,527] [INFO] [stage2.py:108:__init__] CPU Offload: True
[2021-09-01 15:54:24,527] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/patrick/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.5212671756744385 seconds
Time to load utils op: 0.5031814575195312 seconds
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00027632713317871094 seconds
[2021-09-01 15:54:27,578] [INFO] [stage2.py:416:__init__] optimizer state initialized
[2021-09-01 15:54:27,578] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2021-09-01 15:54:27,578] [INFO] [engine.py:519:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-01 15:54:27,578] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2021-09-01 15:54:27,579] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0005, 0.0005], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-01 15:54:27,579] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... None
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] gradient_accumulation_steps .. 4
[2021-09-01 15:54:27,579] [INFO] [config.py:904:print] gradient_clipping ............ 0.0
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4294967296
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] steps_per_print .............. inf
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] train_batch_size ............. 128
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 16
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] world_size ................... 2
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] zero_allow_untested_optimizer True
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] zero_config .................. { "stage": 2, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": { "device": "cpu", "nvme_path": null, "buffer_count": 4, "pin_memory": false, "pipeline_read": false, "pipeline_write": false, "fast_init": false, "pipeline": false }, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-01 15:54:27,580] [INFO] [config.py:904:print] zero_optimization_stage ...... 2
[2021-09-01 15:54:27,580] [INFO] [config.py:906:print] json = { "train_batch_size": 128, "gradient_accumulation_steps": 4, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu" } }, "steps_per_print": inf, "zero_allow_untested_optimizer": true, "fp16": { "enabled": true } }
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00030040740966796875 seconds
| loss: 1.17578| constrast_loss: 4.61181| div_loss: 0.91299| %_mask_idx: 0.35103| ppl: 55.68639| %_neg_is_pos: 0.03191| lr: 0.0| temp: 1.99999
| loss: 1.17585| constrast_loss: 4.61203| div_loss: 0.91357| %_mask_idx: 0.36529| ppl: 55.31592| %_neg_is_pos: 0.02729| lr: 0.0| temp: 1.99999
| loss: 1.1757| constrast_loss: 4.61192| div_loss: 0.90877| %_mask_idx: 0.38174| ppl: 58.38503| %_neg_is_pos: 0.02501| lr: 0.0| temp: 1.99998
| loss: 1.17466| constrast_loss: 4.60775| div_loss: 0.90909| %_mask_idx: 0.33615| ppl: 58.18555| %_neg_is_pos: 0.0399| lr: 0.0| temp: 1.99998
| loss: 1.17426| constrast_loss: 4.60603| div_loss: 0.91005| %_mask_idx: 0.39474| ppl: 57.56554| %_neg_is_pos: 0.02635| lr: 0.0| temp: 1.99997
| loss: 1.17481| constrast_loss: 4.60787| div_loss: 0.91358| %_mask_idx: 0.36153| ppl: 55.3088| %_neg_is_pos: 0.03798| lr: 0.0| temp: 1.99997
| loss: 1.17484| constrast_loss: 4.60806| div_loss: 0.91288| %_mask_idx: 0.40539| ppl: 55.7592| %_neg_is_pos: 0.03332| lr: 0.0| temp: 1.99996
| loss: 1.17531| constrast_loss: 4.60991| div_loss: 0.91343| %_mask_idx: 0.39568| ppl: 55.40624| %_neg_is_pos: 0.02894| lr: 0.0| temp: 1.99996
| loss: 1.17766| constrast_loss: 4.61987| div_loss: 0.90764| %_mask_idx: 0.32957| ppl: 59.10902| %_neg_is_pos: 0.0321| lr: 0.0| temp: 1.99994
| loss: 1.17636| constrast_loss: 4.61495| div_loss: 0.90487| %_mask_idx: 0.388| ppl: 60.8806| %_neg_is_pos: 0.02328| lr: 0.0| temp: 1.99994
| loss: 1.17486|
constrast_loss: 4.60787| div_loss: 0.91562| %_mask_idx: 0.40883| ppl: 54.00384| %_neg_is_pos: 0.02847| lr: 0.0| temp: 1.99993 | loss: 1.17615| constrast_loss: 4.61366| div_loss: 0.90941| %_mask_idx: 0.36059| ppl: 57.97858| %_neg_is_pos: 0.02471| lr: 0.0| temp: 1.99993 | loss: 1.17508| constrast_loss: 4.60879| div_loss: 0.91549| %_mask_idx: 0.36858| ppl: 54.08878| %_neg_is_pos: 0.03435| lr: 0.0| temp: 1.99992 | loss: 1.1775| constrast_loss: 4.61901| div_loss: 0.90973| %_mask_idx: 0.39756| ppl: 57.77439| %_neg_is_pos: 0.02542| lr: 0.0| temp: 1.99992 | loss: 1.17633| constrast_loss: 4.6139| div_loss: 0.91408| %_mask_idx: 0.41667| ppl: 54.98813| %_neg_is_pos: 0.02686| lr: 0.0| temp: 1.99991 | loss: 1.17672| constrast_loss: 4.61585| div_loss: 0.91019| %_mask_idx: 0.4104| ppl: 57.48013| %_neg_is_pos: 0.02092| lr: 0.0| temp: 1.99991 | loss: 1.17664| constrast_loss: 4.6158| div_loss: 0.90751| %_mask_idx: 0.40398| ppl: 59.19445| %_neg_is_pos: 0.01925| lr: 1e-05| temp: 1.99989 | loss: 1.17663| constrast_loss: 4.61514| div_loss: 0.91396| %_mask_idx: 0.36012| ppl: 55.06248| %_neg_is_pos: 0.03508| lr: 1e-05| temp: 1.99989 | loss: 1.17539| constrast_loss: 4.61053| div_loss: 0.91037| %_mask_idx: 0.39004| ppl: 57.36224| %_neg_is_pos: 0.03362| lr: 1e-05| temp: 1.99988 | loss: 1.17645| constrast_loss: 4.615| div_loss: 0.90792| %_mask_idx: 0.40398| ppl: 58.93089| %_neg_is_pos: 0.01926| lr: 1e-05| temp: 1.99988 | loss: 1.17635| constrast_loss: 4.61416| div_loss: 0.9123| %_mask_idx: 0.37954| ppl: 56.13014| %_neg_is_pos: 0.02705| lr: 1e-05| temp: 1.99987 | loss: 1.17637| constrast_loss: 4.61443| div_loss: 0.9104| %_mask_idx: 0.35495| ppl: 57.34497| %_neg_is_pos: 0.03183| lr: 1e-05| temp: 1.99987 | loss: 1.17791| constrast_loss: 4.62037| div_loss: 0.9128| %_mask_idx: 0.36294| ppl: 55.81025| %_neg_is_pos: 0.02547| lr: 1e-05| temp: 1.99986 | loss: 1.17534| constrast_loss: 4.61014| div_loss: 0.9122| %_mask_idx: 0.38142| ppl: 56.19057| %_neg_is_pos: 0.03056| lr: 1e-05| temp: 1.99986 | loss: 
1.17389| constrast_loss: 4.60401| div_loss: 0.91553| %_mask_idx: 0.3916| ppl: 54.06084| %_neg_is_pos: 0.04584| lr: 1e-05| temp: 1.99984 | loss: 1.1761| constrast_loss: 4.61303| div_loss: 0.91373| %_mask_idx: 0.41761| ppl: 55.21391| %_neg_is_pos: 0.02174| lr: 1e-05| temp: 1.99984 | loss: 1.17646| constrast_loss: 4.61473| div_loss: 0.9113| %_mask_idx: 0.35558| ppl: 56.76888| %_neg_is_pos: 0.02469| lr: 1e-05| temp: 1.99983 | loss: 1.17531| constrast_loss: 4.60975| div_loss: 0.91496| %_mask_idx: 0.41134| ppl: 54.4265| %_neg_is_pos: 0.02867| lr: 1e-05| temp: 1.99983 | loss: 1.17594| constrast_loss: 4.61215| div_loss: 0.91616| %_mask_idx: 0.38972| ppl: 53.65632| %_neg_is_pos: 0.03745| lr: 1e-05| temp: 1.99981 | loss: 1.17561| constrast_loss: 4.6114| div_loss: 0.91052| %_mask_idx: 0.33286| ppl: 57.26833| %_neg_is_pos: 0.03816| lr: 1e-05| temp: 1.99981 | loss: 1.17571| constrast_loss: 4.6121| div_loss: 0.90762| %_mask_idx: 0.32315| ppl: 59.12518| %_neg_is_pos: 0.02476| lr: 1e-05| temp: 1.9998 | loss: 1.17729| constrast_loss: 4.61784| div_loss: 0.91299| %_mask_idx: 0.39317| ppl: 55.68805| %_neg_is_pos: 0.02009| lr: 1e-05| temp: 1.9998 | loss: 1.1745| constrast_loss: 4.60691| div_loss: 0.91098| %_mask_idx: 0.35338| ppl: 56.97355| %_neg_is_pos: 0.03922| lr: 1e-05| temp: 1.99979 | loss: 1.17231| constrast_loss: 4.59778| div_loss: 0.91446| %_mask_idx: 0.3703| ppl: 54.74596| %_neg_is_pos: 0.06405| lr: 1e-05| temp: 1.99979 | loss: 1.1774| constrast_loss: 4.61886| div_loss: 0.90754| %_mask_idx: 0.39066| ppl: 59.17693| %_neg_is_pos: 0.02516| lr: 1e-05| temp: 1.99978 | loss: 1.17648| constrast_loss: 4.61488| div_loss: 0.91054| %_mask_idx: 0.41447| ppl: 57.25401| %_neg_is_pos: 0.01643| lr: 1e-05| temp: 1.99978 | loss: 1.17584| constrast_loss: 4.61212| div_loss: 0.91245| %_mask_idx: 0.41823| ppl: 56.03413| %_neg_is_pos: 0.02195| lr: 1e-05| temp: 1.99976 | loss: 1.17536| constrast_loss: 4.60945| div_loss: 0.91971| %_mask_idx: 0.33835| ppl: 51.38247| %_neg_is_pos: 0.03413| lr: 1e-05| 
temp: 1.99976 | loss: 1.17518| constrast_loss: 4.60979| div_loss: 0.90922| %_mask_idx: 0.33459| ppl: 58.10165| %_neg_is_pos: 0.04076| lr: 1e-05| temp: 1.99975 | loss: 1.1754| constrast_loss: 4.61052| div_loss: 0.9107| %_mask_idx: 0.35808| ppl: 57.15482| %_neg_is_pos: 0.02477| lr: 1e-05| temp: 1.99975 | loss: 1.17417| constrast_loss: 4.60513| div_loss: 0.91548| %_mask_idx: 0.36936| ppl: 54.09599| %_neg_is_pos: 0.03906| lr: 1e-05| temp: 1.99974 | loss: 1.17709| constrast_loss: 4.61781| div_loss: 0.90571| %_mask_idx: 0.45724| ppl: 60.34857| %_neg_is_pos: 0.01431| lr: 1e-05| temp: 1.99974 | loss: 1.17671| constrast_loss: 4.61604| div_loss: 0.90807| %_mask_idx: 0.43421| ppl: 58.83605| %_neg_is_pos: 0.02516| lr: 1e-05| temp: 1.99973 | loss: 1.17682| constrast_loss: 4.61601| div_loss: 0.91263| %_mask_idx: 0.39677| ppl: 55.91811| %_neg_is_pos: 0.03724| lr: 1e-05| temp: 1.99973 | loss: 1.17596| constrast_loss: 4.61228| div_loss: 0.91577| %_mask_idx: 0.39176| ppl: 53.90988| %_neg_is_pos: 0.02683| lr: 1e-05| temp: 1.99971 | loss: 1.17684| constrast_loss: 4.61634| div_loss: 0.91005| %_mask_idx: 0.42481| ppl: 57.56659| %_neg_is_pos: 0.02022| lr: 1e-05| temp: 1.99971 | loss: 1.17827| constrast_loss: 4.62213| div_loss: 0.90931| %_mask_idx: 0.33537| ppl: 58.03905| %_neg_is_pos: 0.02762| lr: 1e-05| temp: 1.9997 | loss: 1.17481| constrast_loss: 4.60873| div_loss: 0.90528| %_mask_idx: 0.43452| ppl: 60.62358| %_neg_is_pos: 0.03148| lr: 1e-05| temp: 1.9997 | loss: 1.17591| constrast_loss: 4.61224| div_loss: 0.91398| %_mask_idx: 0.39348| ppl: 55.05308| %_neg_is_pos: 0.03635| lr: 2e-05| temp: 1.99969 | loss: 1.17552| constrast_loss: 4.61128| div_loss: 0.90798| %_mask_idx: 0.35934| ppl: 58.89192| %_neg_is_pos: 0.02223| lr: 2e-05| temp: 1.99969 | loss: 1.17674| constrast_loss: 4.61648| div_loss: 0.90474| %_mask_idx: 0.39223| ppl: 60.96569| %_neg_is_pos: 0.03709| lr: 2e-05| temp: 1.99968 | loss: 1.17645| constrast_loss: 4.61495| div_loss: 0.9084| %_mask_idx: 0.35808| ppl: 58.62615| 
%_neg_is_pos: 0.04108| lr: 2e-05| temp: 1.99968 | loss: 1.17709| constrast_loss: 4.61749| div_loss: 0.90863| %_mask_idx: 0.39239| ppl: 58.47826| %_neg_is_pos: 0.02588| lr: 2e-05| temp: 1.99966 | loss: 1.17759| constrast_loss: 4.6195| div_loss: 0.90873| %_mask_idx: 0.4198| ppl: 58.41544| %_neg_is_pos: 0.01396| lr: 2e-05| temp: 1.99966 | loss: 1.17831| constrast_loss: 4.62245| div_loss: 0.90768| %_mask_idx: 0.3786| ppl: 59.08169| %_neg_is_pos: 0.01795| lr: 2e-05| temp: 1.99965 | loss: 1.17585| constrast_loss: 4.61192| div_loss: 0.91494| %_mask_idx: 0.35714| ppl: 54.43921| %_neg_is_pos: 0.02883| lr: 2e-05| temp: 1.99965 | loss: 1.1787| constrast_loss: 4.62409| div_loss: 0.90714| %_mask_idx: 0.43781| ppl: 59.42849| %_neg_is_pos: 0.01159| lr: 2e-05| temp: 1.99963 | loss: 1.17663| constrast_loss: 4.61574| div_loss: 0.90776| %_mask_idx: 0.3656| ppl: 59.03217| %_neg_is_pos: 0.02383| lr: 2e-05| temp: 1.99963 | loss: 1.17672| constrast_loss: 4.61542| div_loss: 0.91465| %_mask_idx: 0.41729| ppl: 54.62701| %_neg_is_pos: 0.01836| lr: 2e-05| temp: 1.99962 | loss: 1.17758| constrast_loss: 4.61983| div_loss: 0.90477| %_mask_idx: 0.36779| ppl: 60.94737| %_neg_is_pos: 0.01407| lr: 2e-05| temp: 1.99962 | loss: 1.17871| constrast_loss: 4.62396| div_loss: 0.90858| %_mask_idx: 0.4104| ppl: 58.50957| %_neg_is_pos: 0.01304| lr: 2e-05| temp: 1.99961 | loss: 1.1766| constrast_loss: 4.61521| div_loss: 0.9121| %_mask_idx: 0.39615| ppl: 56.25848| %_neg_is_pos: 0.0303| lr: 2e-05| temp: 1.99961 | loss: 1.17659| constrast_loss: 4.61589| div_loss: 0.90459| %_mask_idx: 0.39552| ppl: 61.06527| %_neg_is_pos: 0.01867| lr: 2e-05| temp: 1.9996 | loss: 1.176| constrast_loss: 4.61301| div_loss: 0.90998| %_mask_idx: 0.34336| ppl: 57.61039| %_neg_is_pos: 0.01924| lr: 2e-05| temp: 1.9996 | loss: 1.17487| constrast_loss: 4.60814| div_loss: 0.9134| %_mask_idx: 0.36357| ppl: 55.42474| %_neg_is_pos: 0.03424| lr: 2e-05| temp: 1.99958 | loss: 1.17666| constrast_loss: 4.61594| div_loss: 0.90711| %_mask_idx: 
0.35276| ppl: 59.45123| %_neg_is_pos: 0.03223| lr: 2e-05| temp: 1.99958 | loss: 1.17531| constrast_loss: 4.60991| div_loss: 0.91341| %_mask_idx: 0.35855| ppl: 55.42039| %_neg_is_pos: 0.03317| lr: 2e-05| temp: 1.99957 | loss: 1.17523| constrast_loss: 4.60951| div_loss: 0.91404| %_mask_idx: 0.32033| ppl: 55.01416| %_neg_is_pos: 0.03723| lr: 2e-05| temp: 1.99957 | loss: 1.17773| constrast_loss: 4.62001| div_loss: 0.90911| %_mask_idx: 0.35041| ppl: 58.17036| %_neg_is_pos: 0.02471| lr: 2e-05| temp: 1.99956 | loss: 1.17583| constrast_loss: 4.61195| div_loss: 0.91383| %_mask_idx: 0.36059| ppl: 55.1492| %_neg_is_pos: 0.02321| lr: 2e-05| temp: 1.99956 | loss: 1.17744| constrast_loss: 4.61929| div_loss: 0.90474| %_mask_idx: 0.4057| ppl: 60.96789| %_neg_is_pos: 0.02034| lr: 2e-05| temp: 1.99955 | loss: 1.1772| constrast_loss: 4.618| div_loss: 0.90793| %_mask_idx: 0.40883| ppl: 58.92258| %_neg_is_pos: 0.01634| lr: 2e-05| temp: 1.99955 | loss: 1.17358| constrast_loss: 4.60301| div_loss: 0.91317| %_mask_idx: 0.37657| ppl: 55.57052| %_neg_is_pos: 0.03821| lr: 2e-05| temp: 1.99953 | loss: 1.17621| constrast_loss: 4.61402| div_loss: 0.90819| %_mask_idx: 0.37829| ppl: 58.75539| %_neg_is_pos: 0.03057| lr: 2e-05| temp: 1.99953 | loss: 1.17641| constrast_loss: 4.61436| div_loss: 0.91297| %_mask_idx: 0.41651| ppl: 55.70022| %_neg_is_pos: 0.02415| lr: 2e-05| temp: 1.99952 | loss: 1.17711| constrast_loss: 4.6178| div_loss: 0.90659| %_mask_idx: 0.38894| ppl: 59.78046| %_neg_is_pos: 0.02694| lr: 2e-05| temp: 1.99952 | loss: 1.17686| constrast_loss: 4.61637| div_loss: 0.91068| %_mask_idx: 0.36889| ppl: 57.16692| %_neg_is_pos: 0.0305| lr: 2e-05| temp: 1.99951 | loss: 1.1757| constrast_loss: 4.61137| div_loss: 0.91414| %_mask_idx: 0.37672| ppl: 54.95292| %_neg_is_pos: 0.03055| lr: 2e-05| temp: 1.99951 | loss: 1.17815| constrast_loss: 4.62198| div_loss: 0.90613| %_mask_idx: 0.3432| ppl: 60.07767| %_neg_is_pos: 0.02267| lr: 2e-05| temp: 1.9995 | loss: 1.17538| constrast_loss: 4.61016| div_loss: 
0.91346| %_mask_idx: 0.40194| ppl: 55.38877| %_neg_is_pos: 0.02368| lr: 2e-05| temp: 1.9995
| loss: 1.17676| constrast_loss: 4.61644| div_loss: 0.90606| %_mask_idx: 0.3891| ppl: 60.12093| %_neg_is_pos: 0.02559| lr: 3e-05| temp: 1.99948
| loss: 1.17493| constrast_loss: 4.60784| div_loss: 0.91871| %_mask_idx: 0.39709| ppl: 52.02501| %_neg_is_pos: 0.03952| lr: 3e-05| temp: 1.99948
| loss: 1.174| constrast_loss: 4.60451| div_loss: 0.91508| %_mask_idx: 0.38581| ppl: 54.3511| %_neg_is_pos: 0.04264| lr: 3e-05| temp: 1.99947
| loss: 1.17468| constrast_loss: 4.60731| div_loss: 0.9142| %_mask_idx: 0.3797| ppl: 54.90921| %_neg_is_pos: 0.0333| lr: 3e-05| temp: 1.99947
[2021-09-01 16:01:08,279] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
[2021-09-01 16:01:08,279] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
| loss: 1.17619| constrast_loss: 4.61371| div_loss: 0.91042| %_mask_idx: 0.36983| ppl: 57.33159| %_neg_is_pos: 0.03046| lr: 3e-05| temp: 1.99945
| loss: 1.17807| constrast_loss: 4.62157| div_loss: 0.90698| %_mask_idx: 0.36967| ppl: 59.52995| %_neg_is_pos: 0.01991| lr: 3e-05| temp: 1.99945
[2021-09-01 16:01:16,061] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
[2021-09-01 16:01:16,061] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
| loss: 1.1771| constrast_loss: 4.6175| div_loss: 0.90901| %_mask_idx: 0.40852| ppl: 58.23128| %_neg_is_pos: 0.02264| lr: 3e-05| temp: 1.99944
| loss: 1.17491| constrast_loss: 4.60927| div_loss: 0.90369| %_mask_idx: 0.39082| ppl: 61.63878| %_neg_is_pos: 0.03104| lr: 3e-05| temp: 1.99944
[2021-09-01 16:01:23,827] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
[2021-09-01 16:01:23,827] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
| loss: 1.17598| constrast_loss: 4.61316| div_loss: 0.90747| %_mask_idx: 0.37563| ppl: 59.22166| %_neg_is_pos: 0.03294| lr: 3e-05| temp: 1.99943
| loss: 1.17796| constrast_loss: 4.6211| div_loss: 0.9074| %_mask_idx: 0.36294| ppl: 59.26299| %_neg_is_pos: 0.01663| lr: 3e-05| temp: 1.99943
[2021-09-01 16:01:32,004] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
[2021-09-01 16:01:32,004] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
| loss: 1.17712| constrast_loss: 4.61716| div_loss: 0.91315| %_mask_idx: 0.39615| ppl: 55.58507| %_neg_is_pos: 0.0245| lr: 3e-05| temp: 1.99942
| loss: 1.17512| constrast_loss: 4.60906| div_loss: 0.9142| %_mask_idx: 0.3349| ppl: 54.91278| %_neg_is_pos: 0.0385| lr: 3e-05| temp: 1.99942
[2021-09-01 16:01:40,290] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
[2021-09-01 16:01:40,290] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
[2021-09-01 16:01:48,085] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
[2021-09-01 16:01:48,085] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
| loss: 1.17546| constrast_loss: 4.61085| div_loss: 0.90987| %_mask_idx: 0.35103| ppl: 57.68369| %_neg_is_pos: 0.03979| lr: 3e-05| temp: 1.9994
| loss: 1.17601| constrast_loss: 4.61289| div_loss: 0.91158| %_mask_idx: 0.33678| ppl: 56.59046| %_neg_is_pos: 0.05197| lr: 3e-05| temp: 1.9994
[2021-09-01 16:01:55,854] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
[2021-09-01 16:01:55,854] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
| loss: 1.17625| constrast_loss: 4.61383| div_loss: 0.91167| %_mask_idx: 0.44627| ppl: 56.53142| %_neg_is_pos: 0.01432| lr: 3e-05| temp: 1.99939
| loss: 1.17493| constrast_loss: 4.60822| div_loss: 0.91509| %_mask_idx: 0.37453| ppl: 54.33966| %_neg_is_pos: 0.02521| lr: 3e-05| temp: 1.99939
[2021-09-01 16:02:03,868] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
[2021-09-01 16:02:03,868] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
| loss: 1.17352| constrast_loss: 4.60288| div_loss: 0.91193| %_mask_idx: 0.34602| ppl: 56.36583| %_neg_is_pos: 0.05294| lr: 3e-05| temp: 1.99938
| loss: 1.17298| constrast_loss: 4.60007| div_loss: 0.91835| %_mask_idx: 0.30749| ppl: 52.25537| %_neg_is_pos: 0.05545| lr: 3e-05| temp: 1.99938
| loss: 1.17054| constrast_loss: 4.59105| div_loss: 0.91106| %_mask_idx: 0.34978| ppl: 56.9186| %_neg_is_pos: 0.0494| lr: 3e-05| temp: 1.99937
| loss: 1.17285| constrast_loss: 4.60037| div_loss: 0.91022| %_mask_idx: 0.3631| ppl: 57.45747| %_neg_is_pos: 0.05246| lr: 3e-05| temp: 1.99937
| loss: 1.17923| constrast_loss: 4.62713| div_loss: 0.89785| %_mask_idx: 0.35056| ppl: 65.37701| %_neg_is_pos: 0.01871| lr: 3e-05| temp: 1.99935
| loss: 1.1768| constrast_loss: 4.61798| div_loss: 0.89222| %_mask_idx: 0.41228| ppl: 68.97978| %_neg_is_pos: 0.013| lr: 3e-05| temp: 1.99935
| loss: 1.16882| constrast_loss: 4.58529| div_loss: 0.89995| %_mask_idx: 0.41839| ppl: 64.03309| %_neg_is_pos: 0.06147| lr: 3e-05| temp: 1.99934
| loss: 1.15621| constrast_loss: 4.53319| div_loss: 0.91642| %_mask_idx: 0.39082| ppl: 53.49074| %_neg_is_pos: 0.09418| lr: 3e-05| temp: 1.99934
| loss: 1.16647| constrast_loss: 4.5752| div_loss: 0.90664| %_mask_idx: 0.35887| ppl: 59.75007| %_neg_is_pos: 0.10363| lr: 3e-05| temp: 1.99933
| loss: 1.17336| constrast_loss: 4.60494| div_loss: 0.88506| %_mask_idx: 0.40711| ppl: 73.56082| %_neg_is_pos: 0.05367| lr: 3e-05| temp: 1.99933
| loss: 1.16897| constrast_loss: 4.58804| div_loss: 0.87827| %_mask_idx: 0.41729| ppl: 77.90423| %_neg_is_pos: 0.05893| lr: 3e-05| temp: 1.99932
| loss: 1.16785| constrast_loss: 4.58228| div_loss: 0.8913| %_mask_idx: 0.35511| ppl: 69.56693| %_neg_is_pos: 0.07738| lr: 3e-05| temp: 1.99932
[2021-09-01 16:03:06,071] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
[2021-09-01 16:03:06,071] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
| loss: 1.17069| constrast_loss: 4.59401| div_loss: 0.88725| %_mask_idx: 0.40664| ppl: 72.15707| %_neg_is_pos: 0.04776| lr: 4e-05| temp: 1.9993
| loss: 1.16761| constrast_loss: 4.581| div_loss: 0.89459| %_mask_idx: 0.38737| ppl: 67.4599| %_neg_is_pos: 0.04867| lr: 4e-05| temp: 1.9993
| loss: 1.17269| constrast_loss: 4.60381| div_loss: 0.8695| %_mask_idx: 0.39364| ppl: 83.51937| %_neg_is_pos: 0.03366| lr: 4e-05| temp: 1.99929
| loss: 1.17277| constrast_loss: 4.60475| div_loss: 0.86318| %_mask_idx: 0.4339| ppl: 87.56596| %_neg_is_pos: 0.02893| lr: 4e-05| temp: 1.99929
| loss: 1.17189| constrast_loss: 4.60032| div_loss: 0.87247| %_mask_idx: 0.35714| ppl: 81.6179| %_neg_is_pos: 0.04137| lr: 4e-05| temp: 1.99927
| loss: 1.17364| constrast_loss: 4.60974| div_loss: 0.84831| %_mask_idx: 0.41056| ppl: 97.08211| %_neg_is_pos: 0.01648| lr: 4e-05| temp: 1.99927
| loss: 1.1724| constrast_loss: 4.60287| div_loss: 0.86711| %_mask_idx: 0.42011| ppl: 85.05111| %_neg_is_pos: 0.02449| lr: 4e-05| temp: 1.99926
| loss: 1.17297| constrast_loss: 4.60716| div_loss: 0.84713| %_mask_idx: 0.39959| ppl: 97.83882| %_neg_is_pos: 0.0305| lr: 4e-05| temp: 1.99926
| loss: 1.17027| constrast_loss: 4.59425| div_loss: 0.86823| %_mask_idx: 0.38957| ppl: 84.3298| %_neg_is_pos: 0.04523| lr: 4e-05| temp: 1.99925
| loss: 1.17188| constrast_loss: 4.60146| div_loss: 0.8604| %_mask_idx: 0.40226| ppl: 89.3433| %_neg_is_pos: 0.03977| lr: 4e-05| temp: 1.99925
| loss: 1.17026| constrast_loss: 4.59425| div_loss: 0.86788| %_mask_idx: 0.44048| ppl: 84.55753| %_neg_is_pos: 0.03277| lr: 4e-05| temp: 1.99924
| loss: 1.17252| constrast_loss: 4.60361| div_loss: 0.86467| %_mask_idx: 0.39646| ppl: 86.61179| %_neg_is_pos: 0.02758| lr: 4e-05| temp: 1.99924
| loss: 1.17202| constrast_loss:
4.60218| div_loss: 0.8589| %_mask_idx: 0.37594| ppl: 90.30325| %_neg_is_pos: 0.03447| lr: 4e-05| temp: 1.99922 | loss: 1.17013| constrast_loss: 4.59372| div_loss: 0.86811| %_mask_idx: 0.34054| ppl: 84.40642| %_neg_is_pos: 0.04354| lr: 4e-05| temp: 1.99922 | loss: 1.17375| constrast_loss: 4.60932| div_loss: 0.85668| %_mask_idx: 0.4245| ppl: 91.72482| %_neg_is_pos: 0.01446| lr: 4e-05| temp: 1.99921 | loss: 1.17237| constrast_loss: 4.60311| div_loss: 0.86378| %_mask_idx: 0.41917| ppl: 87.18323| %_neg_is_pos: 0.02936| lr: 4e-05| temp: 1.99921 | loss: 1.17118| constrast_loss: 4.59778| div_loss: 0.86932| %_mask_idx: 0.38675| ppl: 83.63232| %_neg_is_pos: 0.04072| lr: 4e-05| temp: 1.9992 | loss: 1.17354| constrast_loss: 4.60967| div_loss: 0.84474| %_mask_idx: 0.40461| ppl: 99.36648| %_neg_is_pos: 0.02485| lr: 4e-05| temp: 1.9992 | loss: 1.17103| constrast_loss: 4.59815| div_loss: 0.85974| %_mask_idx: 0.38518| ppl: 89.76608| %_neg_is_pos: 0.04577| lr: 4e-05| temp: 1.99919 | loss: 1.17199| constrast_loss: 4.60218| div_loss: 0.8577| %_mask_idx: 0.37578| ppl: 91.07265| %_neg_is_pos: 0.0487| lr: 4e-05| temp: 1.99919 | loss: 1.17115| constrast_loss: 4.59815| div_loss: 0.8647| %_mask_idx: 0.35511| ppl: 86.59425| %_neg_is_pos: 0.04846| lr: 4e-05| temp: 1.99917 | loss: 1.17074| constrast_loss: 4.5959| div_loss: 0.87055| %_mask_idx: 0.33631| ppl: 82.85058| %_neg_is_pos: 0.04417| lr: 4e-05| temp: 1.99917 | loss: 1.17083| constrast_loss: 4.59712| div_loss: 0.86197| %_mask_idx: 0.35182| ppl: 88.3388| %_neg_is_pos: 0.05113| lr: 4e-05| temp: 1.99916 | loss: 1.17318| constrast_loss: 4.6074| div_loss: 0.85298| %_mask_idx: 0.414| ppl: 94.09529| %_neg_is_pos: 0.02183| lr: 4e-05| temp: 1.99916 | loss: 1.17165| constrast_loss: 4.60089| div_loss: 0.85707| %_mask_idx: 0.41009| ppl: 91.47368| %_neg_is_pos: 0.03622| lr: 4e-05| temp: 1.99915 | loss: 1.17044| constrast_loss: 4.59473| div_loss: 0.87053| %_mask_idx: 0.37014| ppl: 82.86288| %_neg_is_pos: 0.03549| lr: 4e-05| temp: 1.99915 | loss: 
1.17349| constrast_loss: 4.6089| div_loss: 0.85058| %_mask_idx: 0.37375| ppl: 95.63095| %_neg_is_pos: 0.0293| lr: 4e-05| temp: 1.99914 | loss: 1.17226| constrast_loss: 4.60232| div_loss: 0.86717| %_mask_idx: 0.40085| ppl: 85.01165| %_neg_is_pos: 0.03096| lr: 4e-05| temp: 1.99914 | loss: 1.1743| constrast_loss: 4.61167| div_loss: 0.85528| %_mask_idx: 0.41385| ppl: 92.62087| %_neg_is_pos: 0.01031| lr: 4e-05| temp: 1.99912 | loss: 1.17313| constrast_loss: 4.60686| div_loss: 0.85675| %_mask_idx: 0.37798| ppl: 91.67754| %_neg_is_pos: 0.03251| lr: 4e-05| temp: 1.99912 | loss: 1.17217| constrast_loss: 4.60259| div_loss: 0.86108| %_mask_idx: 0.39458| ppl: 88.91098| %_neg_is_pos: 0.03072| lr: 4e-05| temp: 1.99911 | loss: 1.16991| constrast_loss: 4.59261| div_loss: 0.87033| %_mask_idx: 0.35197| ppl: 82.98979| %_neg_is_pos: 0.05481| lr: 4e-05| temp: 1.99911 | loss: 1.17195| constrast_loss: 4.60146| div_loss: 0.86357| %_mask_idx: 0.40132| ppl: 87.31404| %_neg_is_pos: 0.03095| lr: 5e-05| temp: 1.99909 | loss: 1.1727| constrast_loss: 4.60529| div_loss: 0.8549| %_mask_idx: 0.37249| ppl: 92.86414| %_neg_is_pos: 0.02649| lr: 5e-05| temp: 1.99909 | loss: 1.17231| constrast_loss: 4.60278| div_loss: 0.86464| %_mask_idx: 0.34727| ppl: 86.63258| %_neg_is_pos: 0.03546| lr: 5e-05| temp: 1.99908 | loss: 1.17314| constrast_loss: 4.60668| div_loss: 0.85883| %_mask_idx: 0.41855| ppl: 90.3466| %_neg_is_pos: 0.01473| lr: 5e-05| temp: 1.99908 | loss: 1.17312| constrast_loss: 4.6069| div_loss: 0.85601| %_mask_idx: 0.35636| ppl: 92.15088| %_neg_is_pos: 0.01781| lr: 5e-05| temp: 1.99907 | loss: 1.17191| constrast_loss: 4.60172| div_loss: 0.85906| %_mask_idx: 0.36137| ppl: 90.20335| %_neg_is_pos: 0.03706| lr: 5e-05| temp: 1.99907 | loss: 1.17361| constrast_loss: 4.6079| div_loss: 0.86552| %_mask_idx: 0.37328| ppl: 86.06757| %_neg_is_pos: 0.03302| lr: 5e-05| temp: 1.99906 | loss: 1.17083| constrast_loss: 4.59658| div_loss: 0.8673| %_mask_idx: 0.38017| ppl: 84.92503| %_neg_is_pos: 0.05077| lr: 5e-05| 
temp: 1.99906 | loss: 1.1728| constrast_loss: 4.60561| div_loss: 0.85595| %_mask_idx: 0.37986| ppl: 92.19234| %_neg_is_pos: 0.02976| lr: 5e-05| temp: 1.99904 | loss: 1.17128| constrast_loss: 4.59938| div_loss: 0.85731| %_mask_idx: 0.40085| ppl: 91.31873| %_neg_is_pos: 0.03882| lr: 5e-05| temp: 1.99904 | loss: 1.17233| constrast_loss: 4.60394| div_loss: 0.85397| %_mask_idx: 0.39364| ppl: 93.4616| %_neg_is_pos: 0.03458| lr: 5e-05| temp: 1.99903 | loss: 1.17188| constrast_loss: 4.60132| div_loss: 0.86212| %_mask_idx: 0.41526| ppl: 88.24014| %_neg_is_pos: 0.02446| lr: 5e-05| temp: 1.99903 | loss: 1.17354| constrast_loss: 4.60784| div_loss: 0.86324| %_mask_idx: 0.40398| ppl: 87.52479| %_neg_is_pos: 0.01944| lr: 5e-05| temp: 1.99902 | loss: 1.17155| constrast_loss: 4.59984| div_loss: 0.86364| %_mask_idx: 0.37657| ppl: 87.26924| %_neg_is_pos: 0.0422| lr: 5e-05| temp: 1.99902 | loss: 1.16906| constrast_loss: 4.58939| div_loss: 0.86847| %_mask_idx: 0.3739| ppl: 84.17699| %_neg_is_pos: 0.0636| lr: 5e-05| temp: 1.99901 | loss: 1.17121| constrast_loss: 4.59764| div_loss: 0.87199| %_mask_idx: 0.38534| ppl: 81.92712| %_neg_is_pos: 0.045| lr: 5e-05| temp: 1.99901 | loss: 1.169| constrast_loss: 4.5894| div_loss: 0.86595| %_mask_idx: 0.35041| ppl: 85.79281| %_neg_is_pos: 0.05418| lr: 5e-05| temp: 1.99899 | loss: 1.17137| constrast_loss: 4.59963| div_loss: 0.85834| %_mask_idx: 0.38549| ppl: 90.65922| %_neg_is_pos: 0.0459| lr: 5e-05| temp: 1.99899 | loss: 1.17295| constrast_loss: 4.60622| div_loss: 0.85579| %_mask_idx: 0.43202| ppl: 92.29183| %_neg_is_pos: 0.02578| lr: 5e-05| temp: 1.99898 | loss: 1.17196| constrast_loss: 4.60204| div_loss: 0.85822| %_mask_idx: 0.39881| ppl: 90.74142| %_neg_is_pos: 0.03232| lr: 5e-05| temp: 1.99898 | loss: 1.1729| constrast_loss: 4.60589| div_loss: 0.85713| %_mask_idx: 0.40492| ppl: 91.43864| %_neg_is_pos: 0.02994| lr: 5e-05| temp: 1.99897 | loss: 1.17245| constrast_loss: 4.60369| div_loss: 0.86106| %_mask_idx: 0.39677| ppl: 88.92412| %_neg_is_pos: 
0.02976| lr: 5e-05| temp: 1.99897 | loss: 1.17194| constrast_loss: 4.60141| div_loss: 0.86348| %_mask_idx: 0.401| ppl: 87.37152| %_neg_is_pos: 0.03509| lr: 5e-05| temp: 1.99896 | loss: 1.17234| constrast_loss: 4.60374| div_loss: 0.85615| %_mask_idx: 0.38675| ppl: 92.06203| %_neg_is_pos: 0.03902| lr: 5e-05| temp: 1.99896 | loss: 1.17346| constrast_loss: 4.60827| div_loss: 0.85568| %_mask_idx: 0.36779| ppl: 92.36694| %_neg_is_pos: 0.02976| lr: 5e-05| temp: 1.99894 | loss: 1.17181| constrast_loss: 4.60117| div_loss: 0.86083| %_mask_idx: 0.38471| ppl: 89.07178| %_neg_is_pos: 0.03421| lr: 5e-05| temp: 1.99894 | loss: 1.17003| constrast_loss: 4.59352| div_loss: 0.86617| %_mask_idx: 0.40335| ppl: 85.64892| %_neg_is_pos: 0.04597| lr: 5e-05| temp: 1.99893 | loss: 1.17112| constrast_loss: 4.59781| div_loss: 0.86687| %_mask_idx: 0.39286| ppl: 85.20495| %_neg_is_pos: 0.03716| lr: 5e-05| temp: 1.99893 | loss: 1.17083| constrast_loss: 4.59714| div_loss: 0.86185| %_mask_idx: 0.40132| ppl: 88.41391| %_neg_is_pos: 0.03525| lr: 5e-05| temp: 1.99891 | loss: 1.17073| constrast_loss: 4.59514| div_loss: 0.87782| %_mask_idx: 0.36638| ppl: 78.19512| %_neg_is_pos: 0.03931| lr: 5e-05| temp: 1.99891 | loss: 1.16995| constrast_loss: 4.5922| div_loss: 0.87588| %_mask_idx: 0.38456| ppl: 79.43959| %_neg_is_pos: 0.04861| lr: 6e-05| temp: 1.9989 | loss: 1.171| constrast_loss: 4.59807| div_loss: 0.8593| %_mask_idx: 0.32832| ppl: 90.04594| %_neg_is_pos: 0.04604| lr: 6e-05| temp: 1.9989 | loss: 1.17205| constrast_loss: 4.6015| div_loss: 0.86712| %_mask_idx: 0.40695| ppl: 85.04396| %_neg_is_pos: 0.02714| lr: 6e-05| temp: 1.99889 | loss: 1.17346| constrast_loss: 4.60817| div_loss: 0.85666| %_mask_idx: 0.37688| ppl: 91.73979| %_neg_is_pos: 0.02763| lr: 6e-05| temp: 1.99889 | loss: 1.1732| constrast_loss: 4.60801| div_loss: 0.84802| %_mask_idx: 0.42544| ppl: 97.26487| %_neg_is_pos: 0.02105| lr: 6e-05| temp: 1.99888 | loss: 1.17216| constrast_loss: 4.60267| div_loss: 0.85969| %_mask_idx: 0.3692| ppl: 
89.79901| %_neg_is_pos: 0.03076| lr: 6e-05| temp: 1.99888 | loss: 1.17108| constrast_loss: 4.59783| div_loss: 0.86476| %_mask_idx: 0.43092| ppl: 86.55247| %_neg_is_pos: 0.02733| lr: 6e-05| temp: 1.99886 | loss: 1.17156| constrast_loss: 4.599| div_loss: 0.87243| %_mask_idx: 0.40617| ppl: 81.64401| %_neg_is_pos: 0.03771| lr: 6e-05| temp: 1.99886 | loss: 1.17428| constrast_loss: 4.61195| div_loss: 0.85155| %_mask_idx: 0.38315| ppl: 95.00526| %_neg_is_pos: 0.02189| lr: 6e-05| temp: 1.99885 | loss: 1.16954| constrast_loss: 4.59142| div_loss: 0.86755| %_mask_idx: 0.38643| ppl: 84.76587| %_neg_is_pos: 0.05252| lr: 6e-05| temp: 1.99885 | loss: 1.17118| constrast_loss: 4.59826| div_loss: 0.86442| %_mask_idx: 0.38095| ppl: 86.77023| %_neg_is_pos: 0.04269| lr: 6e-05| temp: 1.99884 | loss: 1.17244| constrast_loss: 4.60288| div_loss: 0.86891| %_mask_idx: 0.34398| ppl: 83.89486| %_neg_is_pos: 0.0343| lr: 6e-05| temp: 1.99884 | loss: 1.17176| constrast_loss: 4.60028| div_loss: 0.86755| %_mask_idx: 0.44659| ppl: 84.76722| %_neg_is_pos: 0.02143| lr: 6e-05| temp: 1.99883 | loss: 1.17075| constrast_loss: 4.59602| div_loss: 0.86982| %_mask_idx: 0.36607| ppl: 83.31275| %_neg_is_pos: 0.04409| lr: 6e-05| temp: 1.99883 | loss: 1.17206| constrast_loss: 4.60166| div_loss: 0.86587| %_mask_idx: 0.39881| ppl: 85.84221| %_neg_is_pos: 0.02977| lr: 6e-05| temp: 1.99881 | loss: 1.17159| constrast_loss: 4.59988| div_loss: 0.86482| %_mask_idx: 0.41714| ppl: 86.51615| %_neg_is_pos: 0.03568| lr: 6e-05| temp: 1.99881 | loss: 1.17265| constrast_loss: 4.60517| div_loss: 0.85422| %_mask_idx: 0.43546| ppl: 93.30028| %_neg_is_pos: 0.02572| lr: 6e-05| temp: 1.9988 | loss: 1.17255| constrast_loss: 4.60385| div_loss: 0.86355| %_mask_idx: 0.42575| ppl: 87.33006| %_neg_is_pos: 0.02832| lr: 6e-05| temp: 1.9988 | loss: 1.17096| constrast_loss: 4.59685| div_loss: 0.86979| %_mask_idx: 0.40429| ppl: 83.33246| %_neg_is_pos: 0.03816| lr: 6e-05| temp: 1.99879 | loss: 1.17123| constrast_loss: 4.59782| div_loss: 0.87082| 
%_mask_idx: 0.37312| ppl: 82.67305| %_neg_is_pos: 0.03854| lr: 6e-05| temp: 1.99879 | loss: 1.17152| constrast_loss: 4.60013| div_loss: 0.85965| %_mask_idx: 0.36858| ppl: 89.82468| %_neg_is_pos: 0.04745| lr: 6e-05| temp: 1.99878 | loss: 1.17357| constrast_loss: 4.60736| div_loss: 0.86927| %_mask_idx: 0.34367| ppl: 83.66785| %_neg_is_pos: 0.03236| lr: 6e-05| temp: 1.99878 | loss: 1.17046| constrast_loss: 4.59524| div_loss: 0.86619| %_mask_idx: 0.33177| ppl: 85.6362| %_neg_is_pos: 0.05349| lr: 6e-05| temp: 1.99876 | loss: 1.17323| constrast_loss: 4.606| div_loss: 0.86931| %_mask_idx: 0.34712| ppl: 83.63935| %_neg_is_pos: 0.02983| lr: 6e-05| temp: 1.99876 | loss: 1.17034| constrast_loss: 4.59464| div_loss: 0.86714| %_mask_idx: 0.37328| ppl: 85.02991| %_neg_is_pos: 0.04815| lr: 6e-05| temp: 1.99875 | loss: 1.17177| constrast_loss: 4.6001| div_loss: 0.8697| %_mask_idx: 0.34336| ppl: 83.39519| %_neg_is_pos: 0.04765| lr: 6e-05| temp: 1.99875 [2021-09-01 16:10:15,247] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0 [2021-09-01 16:10:15,247] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 8388608.0, reducing to 4194304.0 | loss: 1.17224| constrast_loss: 4.60241| div_loss: 0.8656| %_mask_idx: 0.35949| ppl: 86.01823| %_neg_is_pos: 0.0406| lr: 6e-05| temp: 1.99873 | loss: 1.1691| constrast_loss: 4.58934| div_loss: 0.87055| %_mask_idx: 0.42716| ppl: 82.84634| %_neg_is_pos: 0.04416| lr: 6e-05| temp: 1.99873 | loss: 1.17312| constrast_loss: 4.60768| div_loss: 0.84797| %_mask_idx: 0.41996| ppl: 97.29803| %_neg_is_pos: 0.02313| lr: 6e-05| temp: 1.99872 | loss: 1.17367| constrast_loss: 4.61019| div_loss: 0.84488| %_mask_idx: 0.42732| ppl: 99.2796| %_neg_is_pos: 0.0207| lr: 6e-05| temp: 1.99872 | loss: 1.17316| constrast_loss: 4.60892| div_loss: 0.83719| %_mask_idx: 0.39004| ppl: 104.19592| %_neg_is_pos: 0.0132| lr: 6e-05| temp: 1.99871 | loss: 1.17164| constrast_loss: 4.60272| div_loss: 0.83835| %_mask_idx: 0.3515| ppl: 103.45877| %_neg_is_pos: 0.02873| lr: 6e-05| temp: 1.99871 | loss: 1.17176| constrast_loss: 4.60388| div_loss: 0.83144| %_mask_idx: 0.34273| ppl: 107.87695| %_neg_is_pos: 0.02115| lr: 7e-05| temp: 1.9987 | loss: 1.17268| constrast_loss: 4.60736| div_loss: 0.83355| %_mask_idx: 0.39865| ppl: 106.52897| %_neg_is_pos: 0.01313| lr: 7e-05| temp: 1.9987 | loss: 1.17311| constrast_loss: 4.61078| div_loss: 0.81645| %_mask_idx: 0.36372| ppl: 117.47179| %_neg_is_pos: 0.02498| lr: 7e-05| temp: 1.99868 | loss: 1.17243| constrast_loss: 4.60712| div_loss: 0.82608| %_mask_idx: 0.41259| ppl: 111.31165| %_neg_is_pos: 0.02234| lr: 7e-05| temp: 1.99868 | loss: 1.16927| constrast_loss: 4.5953| div_loss: 0.81772| %_mask_idx: 0.35479| ppl: 116.65913| %_neg_is_pos: 0.04529| lr: 7e-05| temp: 1.99867 | loss: 1.16974| constrast_loss: 4.59584| div_loss: 0.83131| %_mask_idx: 0.44799| ppl: 107.95876| %_neg_is_pos: 0.0287| lr: 7e-05| temp: 1.99867 | loss: 1.17233| constrast_loss: 4.60781| div_loss: 0.81495| %_mask_idx: 0.40962| ppl: 118.42995| %_neg_is_pos: 0.01556| lr: 7e-05| temp: 1.99866 | loss: 1.17158| constrast_loss: 4.60469| div_loss: 0.81623| 
%_mask_idx: 0.38127| ppl: 117.614| %_neg_is_pos: 0.01703| lr: 7e-05| temp: 1.99866 | loss: 1.16957| constrast_loss: 4.59739| div_loss: 0.80891| %_mask_idx: 0.40147| ppl: 122.29533| %_neg_is_pos: 0.03357| lr: 7e-05| temp: 1.99865 | loss: 1.1702| constrast_loss: 4.60081| div_loss: 0.79996| %_mask_idx: 0.41698| ppl: 128.02766| %_neg_is_pos: 0.01597| lr: 7e-05| temp: 1.99865 | loss: 1.1663| constrast_loss: 4.58594| div_loss: 0.79249| %_mask_idx: 0.32566| ppl: 132.80457| %_neg_is_pos: 0.04568| lr: 7e-05| temp: 1.99863 | loss: 1.17007| constrast_loss: 4.60131| div_loss: 0.78986| %_mask_idx: 0.34367| ppl: 134.48657| %_neg_is_pos: 0.02871| lr: 7e-05| temp: 1.99863 | loss: 1.16944| constrast_loss: 4.60074| div_loss: 0.77032| %_mask_idx: 0.34947| ppl: 146.99725| %_neg_is_pos: 0.03061| lr: 7e-05| temp: 1.99862 | loss: 1.16876| constrast_loss: 4.59858| div_loss: 0.76474| %_mask_idx: 0.47556| ppl: 150.56496| %_neg_is_pos: 0.01567| lr: 7e-05| temp: 1.99862 | loss: 1.16907| constrast_loss: 4.60114| div_loss: 0.75133| %_mask_idx: 0.40664| ppl: 159.14998| %_neg_is_pos: 0.01834| lr: 7e-05| temp: 1.99861 | loss: 1.1689| constrast_loss: 4.59959| div_loss: 0.76019| %_mask_idx: 0.35777| ppl: 153.47906| %_neg_is_pos: 0.0167| lr: 7e-05| temp: 1.99861 | loss: 1.16563| constrast_loss: 4.58355| div_loss: 0.78958| %_mask_idx: 0.35636| ppl: 134.66904| %_neg_is_pos: 0.03601| lr: 7e-05| temp: 1.9986 | loss: 1.16905| constrast_loss: 4.60212| div_loss: 0.74066| %_mask_idx: 0.38643| ppl: 165.97583| %_neg_is_pos: 0.01093| lr: 7e-05| temp: 1.9986 | loss: 1.16684| constrast_loss: 4.59264| div_loss: 0.74728| %_mask_idx: 0.45457| ppl: 161.739| %_neg_is_pos: 0.02854| lr: 7e-05| temp: 1.99858 | loss: 1.16692| constrast_loss: 4.59563| div_loss: 0.72058| %_mask_idx: 0.37688| ppl: 178.82721| %_neg_is_pos: 0.023| lr: 7e-05| temp: 1.99858 | loss: 1.15646| constrast_loss: 4.55318| div_loss: 0.72677| %_mask_idx: 0.38221| ppl: 174.86464| %_neg_is_pos: 0.05191| lr: 7e-05| temp: 1.99857 | loss: 1.15707| 
constrast_loss: 4.5513| div_loss: 0.76983| %_mask_idx: 0.3786| ppl: 147.30597| %_neg_is_pos: 0.05527| lr: 7e-05| temp: 1.99857 | loss: 1.14725| constrast_loss: 4.51576| div_loss: 0.73257| %_mask_idx: 0.36795| ppl: 171.15265| %_neg_is_pos: 0.06586| lr: 7e-05| temp: 1.99855 | loss: 1.15512| constrast_loss: 4.54826| div_loss: 0.7223| %_mask_idx: 0.401| ppl: 177.73068| %_neg_is_pos: 0.05406| lr: 7e-05| temp: 1.99855 | loss: 1.16308| constrast_loss: 4.58622| div_loss: 0.66121| %_mask_idx: 0.35855| ppl: 216.82715| %_neg_is_pos: 0.02865| lr: 7e-05| temp: 1.99854 | loss: 1.15861| constrast_loss: 4.56338| div_loss: 0.71046| %_mask_idx: 0.4245| ppl: 185.3073| %_neg_is_pos: 0.02633| lr: 7e-05| temp: 1.99854 | loss: 1.15418| constrast_loss: 4.54759| div_loss: 0.6912| %_mask_idx: 0.38737| ppl: 197.63158| %_neg_is_pos: 0.04878| lr: 7e-05| temp: 1.99853 | loss: 1.15492| constrast_loss: 4.55022| div_loss: 0.69462| %_mask_idx: 0.41792| ppl: 195.44273| %_neg_is_pos: 0.05637| lr: 7e-05| temp: 1.99853 | loss: 1.15892| constrast_loss: 4.56751| div_loss: 0.68155| %_mask_idx: 0.3938| ppl: 203.80681| %_neg_is_pos: 0.03554| lr: 7e-05| temp: 1.99852 | loss: 1.158| constrast_loss: 4.56339| div_loss: 0.68629| %_mask_idx: 0.38628| ppl: 200.77515| %_neg_is_pos: 0.05391| lr: 7e-05| temp: 1.99852 | loss: 1.1582| constrast_loss: 4.55965| div_loss: 0.73145| %_mask_idx: 0.36341| ppl: 171.87454| %_neg_is_pos: 0.05684| lr: 8e-05| temp: 1.9985 | loss: 1.15734| constrast_loss: 4.55991| div_loss: 0.69447| %_mask_idx: 0.43327| ppl: 195.53851| %_neg_is_pos: 0.03867| lr: 8e-05| temp: 1.9985 | loss: 1.15282| constrast_loss: 4.53983| div_loss: 0.71435| %_mask_idx: 0.40398| ppl: 182.81908| %_neg_is_pos: 0.04687| lr: 8e-05| temp: 1.99849 | loss: 1.14206| constrast_loss: 4.49113| div_loss: 0.77099| %_mask_idx: 0.3537| ppl: 146.56601| %_neg_is_pos: 0.07319| lr: 8e-05| temp: 1.99849 | loss: 1.15885| constrast_loss: 4.56728| div_loss: 0.68134| %_mask_idx: 0.41212| ppl: 203.94446| %_neg_is_pos: 0.02156| lr: 8e-05| 
temp: 1.99848 | loss: 1.15313| constrast_loss: 4.5424| div_loss: 0.70111| %_mask_idx: 0.42888| ppl: 191.29097| %_neg_is_pos: 0.03283| lr: 8e-05| temp: 1.99848 | loss: 1.15508| constrast_loss: 4.54932| div_loss: 0.7101| %_mask_idx: 0.40288| ppl: 185.53839| %_neg_is_pos: 0.05633| lr: 8e-05| temp: 1.99847 | loss: 1.16099| constrast_loss: 4.57218| div_loss: 0.71773| %_mask_idx: 0.38252| ppl: 180.6541| %_neg_is_pos: 0.05722| lr: 8e-05| temp: 1.99847 | loss: 1.15452| constrast_loss: 4.54497| div_loss: 0.73117| %_mask_idx: 0.40445| ppl: 172.05038| %_neg_is_pos: 0.04213| lr: 8e-05| temp: 1.99845 | loss: 1.15339| constrast_loss: 4.54182| div_loss: 0.71761| %_mask_idx: 0.35511| ppl: 180.73125| %_neg_is_pos: 0.06948| lr: 8e-05| temp: 1.99845 | loss: 1.16086| constrast_loss: 4.57337| div_loss: 0.70061| %_mask_idx: 0.36717| ppl: 191.61276| %_neg_is_pos: 0.04524| lr: 8e-05| temp: 1.99844 | loss: 1.15438| constrast_loss: 4.54642| div_loss: 0.71091| %_mask_idx: 0.40351| ppl: 185.01755| %_neg_is_pos: 0.05282| lr: 8e-05| temp: 1.99844 | loss: 1.1611| constrast_loss: 4.57447| div_loss: 0.69937| %_mask_idx: 0.39474| ppl: 192.40314| %_neg_is_pos: 0.03348| lr: 8e-05| temp: 1.99843 | loss: 1.1529| constrast_loss: 4.53987| div_loss: 0.7174| %_mask_idx: 0.4245| ppl: 180.86487| %_neg_is_pos: 0.04751| lr: 8e-05| temp: 1.99843 | loss: 1.15076| constrast_loss: 4.53035| div_loss: 0.72697| %_mask_idx: 0.42544| ppl: 174.74072| %_neg_is_pos: 0.05507| lr: 8e-05| temp: 1.99842 | loss: 1.15623| constrast_loss: 4.5561| div_loss: 0.68831| %_mask_idx: 0.39019| ppl: 199.48151| %_neg_is_pos: 0.03274| lr: 8e-05| temp: 1.99842 | loss: 1.15506| constrast_loss: 4.54847| div_loss: 0.71782| %_mask_idx: 0.41338| ppl: 180.59447| %_neg_is_pos: 0.06958| lr: 8e-05| temp: 1.9984 | loss: 1.15338| constrast_loss: 4.54192| div_loss: 0.71615| %_mask_idx: 0.38831| ppl: 181.66235| %_neg_is_pos: 0.05599| lr: 8e-05| temp: 1.9984 | loss: 1.14859| constrast_loss: 4.51762| div_loss: 0.76735| %_mask_idx: 0.37719| ppl: 148.89803| 
%_neg_is_pos: 0.06635| lr: 8e-05| temp: 1.99839 | loss: 1.15188| constrast_loss: 4.53673| div_loss: 0.70807| %_mask_idx: 0.35526| ppl: 186.83649| %_neg_is_pos: 0.06172| lr: 8e-05| temp: 1.99839 | loss: 1.14953| constrast_loss: 4.52806| div_loss: 0.70058| %_mask_idx: 0.42873| ppl: 191.63184| %_neg_is_pos: 0.04992| lr: 8e-05| temp: 1.99837 | loss: 1.14583| constrast_loss: 4.5104| div_loss: 0.72909| %_mask_idx: 0.35636| ppl: 173.38367| %_neg_is_pos: 0.0731| lr: 8e-05| temp: 1.99837 | loss: 1.14697| constrast_loss: 4.5122| div_loss: 0.75686| %_mask_idx: 0.39411| ppl: 155.61252| %_neg_is_pos: 0.05785| lr: 8e-05| temp: 1.99836 | loss: 1.15838| constrast_loss: 4.56463| div_loss: 0.68895| %_mask_idx: 0.36216| ppl: 199.07236| %_neg_is_pos: 0.04803| lr: 8e-05| temp: 1.99836 | loss: 1.15049| constrast_loss: 4.5276| div_loss: 0.74363| %_mask_idx: 0.39724| ppl: 164.07669| %_neg_is_pos: 0.0515| lr: 8e-05| temp: 1.99835 | loss: 1.15886| constrast_loss: 4.56575| div_loss: 0.69689| %_mask_idx: 0.42763| ppl: 193.98831| %_neg_is_pos: 0.03214| lr: 8e-05| temp: 1.99835 | loss: 1.15474| constrast_loss: 4.54853| div_loss: 0.70421| %_mask_idx: 0.35135| ppl: 189.30783| %_neg_is_pos: 0.06287| lr: 8e-05| temp: 1.99834 | loss: 1.15057| constrast_loss: 4.52904| div_loss: 0.73231| %_mask_idx: 0.42027| ppl: 171.31961| %_neg_is_pos: 0.04018| lr: 8e-05| temp: 1.99834 | loss: 1.14442| constrast_loss: 4.50309| div_loss: 0.74578| %_mask_idx: 0.43546| ppl: 162.7021| %_neg_is_pos: 0.0745| lr: 8e-05| temp: 1.99832 | loss: 1.14985| constrast_loss: 4.52316| div_loss: 0.76257| %_mask_idx: 0.38346| ppl: 151.9559| %_neg_is_pos: 0.0703| lr: 8e-05| temp: 1.99832 | loss: 1.14912| constrast_loss: 4.52648| div_loss: 0.70001| %_mask_idx: 0.37171| ppl: 191.99136| %_neg_is_pos: 0.05711| lr: 8e-05| temp: 1.99831 | loss: 1.16425| constrast_loss: 4.58914| div_loss: 0.6786| %_mask_idx: 0.41087| ppl: 205.69458| %_neg_is_pos: 0.02834| lr: 8e-05| temp: 1.99831 | loss: 1.15568| constrast_loss: 4.5503| div_loss: 0.72405| 
%_mask_idx: 0.43202| ppl: 176.60822| %_neg_is_pos: 0.04377| lr: 8e-05| temp: 1.9983 | loss: 1.15355| constrast_loss: 4.54356| div_loss: 0.70632| %_mask_idx: 0.38581| ppl: 187.95239| %_neg_is_pos: 0.04965| lr: 8e-05| temp: 1.9983 | loss: 1.15179| constrast_loss: 4.5338| div_loss: 0.73374| %_mask_idx: 0.40147| ppl: 170.4068| %_neg_is_pos: 0.04773| lr: 9e-05| temp: 1.99829 | loss: 1.15878| constrast_loss: 4.56439| div_loss: 0.70733| %_mask_idx: 0.37547| ppl: 187.30591| %_neg_is_pos: 0.04293| lr: 9e-05| temp: 1.99829 | loss: 1.15355| constrast_loss: 4.5429| div_loss: 0.71283| %_mask_idx: 0.3656| ppl: 183.79091| %_neg_is_pos: 0.07343| lr: 9e-05| temp: 1.99827 | loss: 1.1639| constrast_loss: 4.58519| div_loss: 0.70425| %_mask_idx: 0.43625| ppl: 189.27762| %_neg_is_pos: 0.04416| lr: 9e-05| temp: 1.99827 | loss: 1.15446| constrast_loss: 4.54565| div_loss: 0.72172| %_mask_idx: 0.38847| ppl: 178.09769| %_neg_is_pos: 0.03555| lr: 9e-05| temp: 1.99826 | loss: 1.1609| constrast_loss: 4.57612| div_loss: 0.67481| %_mask_idx: 0.37469| ppl: 208.11885| %_neg_is_pos: 0.0468| lr: 9e-05| temp: 1.99826 | loss: 1.15165| constrast_loss: 4.53595| div_loss: 0.70651| %_mask_idx: 0.39897| ppl: 187.8309| %_neg_is_pos: 0.03804| lr: 9e-05| temp: 1.99825 | loss: 1.15037| constrast_loss: 4.52993| div_loss: 0.71529| %_mask_idx: 0.38518| ppl: 182.21744| %_neg_is_pos: 0.0755| lr: 9e-05| temp: 1.99825 | loss: 1.15354| constrast_loss: 4.54446| div_loss: 0.69718| %_mask_idx: 0.3985| ppl: 193.80795| %_neg_is_pos: 0.03498| lr: 9e-05| temp: 1.99824 | loss: 1.15527| constrast_loss: 4.55| div_loss: 0.71079| %_mask_idx: 0.39552| ppl: 185.09489| %_neg_is_pos: 0.04037| lr: 9e-05| temp: 1.99824 | loss: 1.15617| constrast_loss: 4.55483| div_loss: 0.69846| %_mask_idx: 0.40633| ppl: 192.98416| %_neg_is_pos: 0.05115| lr: 9e-05| temp: 1.99822 | loss: 1.14872| constrast_loss: 4.52152| div_loss: 0.73366| %_mask_idx: 0.35448| ppl: 170.45535| %_neg_is_pos: 0.0517| lr: 9e-05| temp: 1.99822 | loss: 1.15252| constrast_loss: 
4.53465| div_loss: 0.75436| %_mask_idx: 0.37751| ppl: 157.20921| %_neg_is_pos: 0.05823| lr: 9e-05| temp: 1.99821 | loss: 1.14825| constrast_loss: 4.51637| div_loss: 0.76611| %_mask_idx: 0.3667| ppl: 149.69073| %_neg_is_pos: 0.06823| lr: 9e-05| temp: 1.99821 | loss: 1.16069| constrast_loss: 4.57283| div_loss: 0.6992| %_mask_idx: 0.39662| ppl: 192.51305| %_neg_is_pos: 0.02897| lr: 9e-05| temp: 1.99819 | loss: 1.14444| constrast_loss: 4.50218| div_loss: 0.75572| %_mask_idx: 0.36873| ppl: 156.34061| %_neg_is_pos: 0.06192| lr: 9e-05| temp: 1.99819 | loss: 1.15111| constrast_loss: 4.53149| div_loss: 0.72963| %_mask_idx: 0.43766| ppl: 173.03799| %_neg_is_pos: 0.03686| lr: 9e-05| temp: 1.99818 | loss: 1.15803| constrast_loss: 4.56223| div_loss: 0.69897| %_mask_idx: 0.39599| ppl: 192.65866| %_neg_is_pos: 0.04963| lr: 9e-05| temp: 1.99818 | loss: 1.1512| constrast_loss: 4.53354| div_loss: 0.71258| %_mask_idx: 0.35338| ppl: 183.9505| %_neg_is_pos: 0.07422| lr: 9e-05| temp: 1.99817 | loss: 1.15442| constrast_loss: 4.54684| div_loss: 0.70845| %_mask_idx: 0.36043| ppl: 186.59329| %_neg_is_pos: 0.0456| lr: 9e-05| temp: 1.99817 | loss: 1.14816| constrast_loss: 4.51788| div_loss: 0.74758| %_mask_idx: 0.39975| ppl: 161.54907| %_neg_is_pos: 0.08263| lr: 9e-05| temp: 1.99816 | loss: 1.15939| constrast_loss: 4.56542| div_loss: 0.72133| %_mask_idx: 0.40335| ppl: 178.3508| %_neg_is_pos: 0.04652| lr: 9e-05| temp: 1.99816 | loss: 1.13972| constrast_loss: 4.48303| div_loss: 0.75844| %_mask_idx: 0.3468| ppl: 154.59821| %_neg_is_pos: 0.07879| lr: 9e-05| temp: 1.99814 | loss: 1.157| constrast_loss: 4.55893| div_loss: 0.69055| %_mask_idx: 0.40789| ppl: 198.04984| %_neg_is_pos: 0.03626| lr: 9e-05| temp: 1.99814 | loss: 1.14984| constrast_loss: 4.52337| div_loss: 0.76008| %_mask_idx: 0.36419| ppl: 153.55035| %_neg_is_pos: 0.06759| lr: 9e-05| temp: 1.99813 | loss: 1.15162| constrast_loss: 4.53176| div_loss: 0.7473| %_mask_idx: 0.39583| ppl: 161.72684| %_neg_is_pos: 0.06621| lr: 9e-05| temp: 
1.99813 | loss: 1.16245| constrast_loss: 4.58395| div_loss: 0.65838| %_mask_idx: 0.34884| ppl: 218.634| %_neg_is_pos: 0.03352| lr: 9e-05| temp: 1.99812 | loss: 1.14771| constrast_loss: 4.51598| div_loss: 0.74878| %_mask_idx: 0.32112| ppl: 160.77924| %_neg_is_pos: 0.09599| lr: 9e-05| temp: 1.99812 | loss: 1.14955| constrast_loss: 4.52258| div_loss: 0.75613| %_mask_idx: 0.35761| ppl: 156.07709| %_neg_is_pos: 0.05654| lr: 9e-05| temp: 1.99811 | loss: 1.16382| constrast_loss: 4.58781| div_loss: 0.67485| %_mask_idx: 0.43907| ppl: 208.09595| %_neg_is_pos: 0.02749| lr: 9e-05| temp: 1.99811 | loss: 1.16026| constrast_loss: 4.5725| div_loss: 0.68527| %_mask_idx: 0.3974| ppl: 201.42526| %_neg_is_pos: 0.05022| lr: 0.0001| temp: 1.99809 | loss: 1.15161| constrast_loss: 4.53268| div_loss: 0.73757| %_mask_idx: 0.38205| ppl: 167.95383| %_neg_is_pos: 0.05735| lr: 0.0001| temp: 1.99809 | loss: 1.15917| constrast_loss: 4.56605| div_loss: 0.70648| %_mask_idx: 0.43233| ppl: 187.8541| %_neg_is_pos: 0.04767| lr: 0.0001| temp: 1.99808 | loss: 1.15639| constrast_loss: 4.55311| div_loss: 0.72446| %_mask_idx: 0.37578| ppl: 176.34732| %_neg_is_pos: 0.04951| lr: 0.0001| temp: 1.99808 | loss: 1.15328| constrast_loss: 4.5428| div_loss: 0.70306| %_mask_idx: 0.40962| ppl: 190.04364| %_neg_is_pos: 0.06427| lr: 0.0001| temp: 1.99807 | loss: 1.15146| constrast_loss: 4.53229| div_loss: 0.73541| %_mask_idx: 0.3739| ppl: 169.33554| %_neg_is_pos: 0.07292| lr: 0.0001| temp: 1.99807 | loss: 1.15856| constrast_loss: 4.56229| div_loss: 0.71939| %_mask_idx: 0.4032| ppl: 179.59052| %_neg_is_pos: 0.04029| lr: 0.0001| temp: 1.99806 | loss: 1.15702| constrast_loss: 4.5562| div_loss: 0.71869| %_mask_idx: 0.40523| ppl: 180.0361| %_neg_is_pos: 0.05195| lr: 0.0001| temp: 1.99806 | loss: 1.1482| constrast_loss: 4.51828| div_loss: 0.74519| %_mask_idx: 0.36278| ppl: 163.07788| %_neg_is_pos: 0.05726| lr: 0.0001| temp: 1.99804 | loss: 1.15674| constrast_loss: 4.55675| div_loss: 0.70198| %_mask_idx: 0.41353| ppl: 
190.73105| %_neg_is_pos: 0.05212| lr: 0.0001| temp: 1.99804 | loss: 1.15431| constrast_loss: 4.54603| div_loss: 0.71225| %_mask_idx: 0.401| ppl: 184.1604| %_neg_is_pos: 0.04954| lr: 0.0001| temp: 1.99803 | loss: 1.15475| constrast_loss: 4.54643| div_loss: 0.72579| %_mask_idx: 0.42027| ppl: 175.49509| %_neg_is_pos: 0.04878| lr: 0.0001| temp: 1.99803 [2021-09-01 16:19:30,643] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0 [2021-09-01 16:19:30,643] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0 | loss: 1.15596| constrast_loss: 4.55612| div_loss: 0.67732| %_mask_idx: 0.41118| ppl: 206.51382| %_neg_is_pos: 0.02535| lr: 0.0001| temp: 1.99801 | loss: 1.15027| constrast_loss: 4.52848| div_loss: 0.72616| %_mask_idx: 0.37876| ppl: 175.25922| %_neg_is_pos: 0.05534| lr: 0.0001| temp: 1.99801 | loss: 1.15833| constrast_loss: 4.56296| div_loss: 0.70362| %_mask_idx: 0.41103| ppl: 189.68086| %_neg_is_pos: 0.06654| lr: 0.0001| temp: 1.998 | loss: 1.16223| constrast_loss: 4.58368| div_loss: 0.65226| %_mask_idx: 0.349| ppl: 222.55315| %_neg_is_pos: 0.04491| lr: 0.0001| temp: 1.998 | loss: 1.1641| constrast_loss: 4.58969| div_loss: 0.66698| %_mask_idx: 0.42372| ppl: 213.13173| %_neg_is_pos: 0.02026| lr: 0.0001| temp: 1.99799 | loss: 1.16161| constrast_loss: 4.57439| div_loss: 0.7205| %_mask_idx: 0.37516| ppl: 178.87968| %_neg_is_pos: 0.04436| lr: 0.0001| temp: 1.99799 | loss: 1.16556| constrast_loss: 4.58933| div_loss: 0.72902| %_mask_idx: 0.34038| ppl: 173.42856| %_neg_is_pos: 0.04736| lr: 0.0001| temp: 1.99798 | loss: 1.16414| constrast_loss: 4.58164| div_loss: 0.74941| %_mask_idx: 0.42747| ppl: 160.38081| %_neg_is_pos: 0.03804| lr: 0.0001| temp: 1.99798 | loss: 1.15938| constrast_loss: 4.56199| div_loss: 0.75538| %_mask_idx: 0.43217| ppl: 156.55994| %_neg_is_pos: 0.0695| lr: 
0.0001| temp: 1.99796 | loss: 1.15781| constrast_loss: 4.557| div_loss: 0.74252| %_mask_idx: 0.37202| ppl: 164.78876| %_neg_is_pos: 0.0598| lr: 0.0001| temp: 1.99796 | loss: 1.13752| constrast_loss: 4.47506| div_loss: 0.75014| %_mask_idx: 0.42121| ppl: 159.91064| %_neg_is_pos: 0.09555| lr: 0.0001| temp: 1.99795 | loss: 1.13292| constrast_loss: 4.45299| div_loss: 0.78696| %_mask_idx: 0.41009| ppl: 136.34753| %_neg_is_pos: 0.08758| lr: 0.0001| temp: 1.99795 | loss: 1.1307| constrast_loss: 4.44857| div_loss: 0.7422| %_mask_idx: 0.38596| ppl: 164.99008| %_neg_is_pos: 0.08293| lr: 0.0001| temp: 1.99794 | loss: 1.12822| constrast_loss: 4.43649| div_loss: 0.76382| %_mask_idx: 0.39098| ppl: 151.15366| %_neg_is_pos: 0.09546| lr: 0.0001| temp: 1.99794 | loss: 1.11699| constrast_loss: 4.39378| div_loss: 0.74191| %_mask_idx: 0.34727| ppl: 165.17725| %_neg_is_pos: 0.06163| lr: 0.0001| temp: 1.99793 | loss: 1.15372| constrast_loss: 4.54482| div_loss: 0.70069| %_mask_idx: 0.388| ppl: 191.55624| %_neg_is_pos: 0.04653| lr: 0.0001| temp: 1.99793 | loss: 1.12971| constrast_loss: 4.44791| div_loss: 0.70926| %_mask_idx: 0.41165| ppl: 186.07304| %_neg_is_pos: 0.04866| lr: 0.0001| temp: 1.99791 | loss: 1.13013| constrast_loss: 4.44583| div_loss: 0.7469| %_mask_idx: 0.38111| ppl: 161.98645| %_neg_is_pos: 0.06488| lr: 0.0001| temp: 1.99791 | loss: 1.15633| constrast_loss: 4.5596| div_loss: 0.65739| %_mask_idx: 0.37954| ppl: 219.27136| %_neg_is_pos: 0.02728| lr: 0.0001| temp: 1.9979 | loss: 1.16024| constrast_loss: 4.57394| div_loss: 0.67003| %_mask_idx: 0.40993| ppl: 211.17989| %_neg_is_pos: 0.02237| lr: 0.0001| temp: 1.9979 | loss: 1.16689| constrast_loss: 4.59755| div_loss: 0.70025| %_mask_idx: 0.40617| ppl: 191.84052| %_neg_is_pos: 0.04438| lr: 0.00011| temp: 1.99789 | loss: 1.16798| constrast_loss: 4.60228| div_loss: 0.69617| %_mask_idx: 0.40742| ppl: 194.45317| %_neg_is_pos: 0.03166| lr: 0.00011| temp: 1.99789 | loss: 1.16087| constrast_loss: 4.56954| div_loss: 0.73953| %_mask_idx: 
0.31532| ppl: 166.69827| %_neg_is_pos: 0.07162| lr: 0.00011| temp: 1.99788 | loss: 1.16135| constrast_loss: 4.57076| div_loss: 0.7465| %_mask_idx: 0.39897| ppl: 162.23705| %_neg_is_pos: 0.06221| lr: 0.00011| temp: 1.99788 | loss: 1.14921| constrast_loss: 4.5202| div_loss: 0.76623| %_mask_idx: 0.39176| ppl: 149.61087| %_neg_is_pos: 0.06959| lr: 0.00011| temp: 1.99786 | loss: 1.14894| constrast_loss: 4.52196| div_loss: 0.73791| %_mask_idx: 0.37845| ppl: 167.7345| %_neg_is_pos: 0.06295| lr: 0.00011| temp: 1.99786 | loss: 1.14188| constrast_loss: 4.49201| div_loss: 0.75499| %_mask_idx: 0.35511| ppl: 156.80565| %_neg_is_pos: 0.06352| lr: 0.00011| temp: 1.99785 | loss: 1.11798| constrast_loss: 4.39354| div_loss: 0.78376| %_mask_idx: 0.36482| ppl: 138.39522| %_neg_is_pos: 0.0826| lr: 0.00011| temp: 1.99785 | loss: 1.12562| constrast_loss: 4.42591| div_loss: 0.76563| %_mask_idx: 0.4104| ppl: 149.99741| %_neg_is_pos: 0.07454| lr: 0.00011| temp: 1.99783 | loss: 1.11469| constrast_loss: 4.38154| div_loss: 0.77224| %_mask_idx: 0.37124| ppl: 145.7644| %_neg_is_pos: 0.09458| lr: 0.00011| temp: 1.99783 | loss: 1.14746| constrast_loss: 4.51697| div_loss: 0.72876| %_mask_idx: 0.44126| ppl: 173.59299| %_neg_is_pos: 0.04321| lr: 0.00011| temp: 1.99782 | loss: 1.11553| constrast_loss: 4.38554| div_loss: 0.76563| %_mask_idx: 0.37798| ppl: 149.99593| %_neg_is_pos: 0.07828| lr: 0.00011| temp: 1.99782 | loss: 1.11264| constrast_loss: 4.37281| div_loss: 0.77756| %_mask_idx: 0.38581| ppl: 142.36279| %_neg_is_pos: 0.0699| lr: 0.00011| temp: 1.99781 | loss: 1.1108| constrast_loss: 4.36583| div_loss: 0.77371| %_mask_idx: 0.36404| ppl: 144.82599| %_neg_is_pos: 0.07873| lr: 0.00011| temp: 1.99781 | loss: 1.11689| constrast_loss: 4.38914| div_loss: 0.78438| %_mask_idx: 0.39646| ppl: 137.99796| %_neg_is_pos: 0.07116| lr: 0.00011| temp: 1.9978 | loss: 1.10105| constrast_loss: 4.32547| div_loss: 0.78743| %_mask_idx: 0.36826| ppl: 136.04739| %_neg_is_pos: 0.1153| lr: 0.00011| temp: 1.9978 | loss: 
1.10047| constrast_loss: 4.321| div_loss: 0.80881| %_mask_idx: 0.33678| ppl: 122.35875| %_neg_is_pos: 0.09488| lr: 0.00011| temp: 1.99778 | loss: 1.12436| constrast_loss: 4.42298| div_loss: 0.74454| %_mask_idx: 0.3667| ppl: 163.49292| %_neg_is_pos: 0.05146| lr: 0.00011| temp: 1.99778 | loss: 1.11607| constrast_loss: 4.39043| div_loss: 0.73859| %_mask_idx: 0.39975| ppl: 167.30362| %_neg_is_pos: 0.08052| lr: 0.00011| temp: 1.99777 | loss: 1.1086| constrast_loss: 4.35812| div_loss: 0.76292| %_mask_idx: 0.42121| ppl: 151.73407| %_neg_is_pos: 0.06599| lr: 0.00011| temp: 1.99777 | loss: 1.12706| constrast_loss: 4.43391| div_loss: 0.74315| %_mask_idx: 0.38706| ppl: 164.38303| %_neg_is_pos: 0.06849| lr: 0.00011| temp: 1.99776 | loss: 1.11753| constrast_loss: 4.39328| div_loss: 0.76819| %_mask_idx: 0.4115| ppl: 148.35527| %_neg_is_pos: 0.05005| lr: 0.00011| temp: 1.99776 | loss: 1.12551| constrast_loss: 4.42747| div_loss: 0.74584| %_mask_idx: 0.36153| ppl: 162.66498| %_neg_is_pos: 0.08518| lr: 0.00011| temp: 1.99775 | loss: 1.13823| constrast_loss: 4.47995| div_loss: 0.72958| %_mask_idx: 0.4115| ppl: 173.07147| %_neg_is_pos: 0.04688| lr: 0.00011| temp: 1.99775 | loss: 1.10436| constrast_loss: 4.33984| div_loss: 0.77587| %_mask_idx: 0.36717| ppl: 143.4411| %_neg_is_pos: 0.08042| lr: 0.00011| temp: 1.99773 | loss: 1.11271| constrast_loss: 4.37711| div_loss: 0.73713| %_mask_idx: 0.38001| ppl: 168.23509| %_neg_is_pos: 0.07216| lr: 0.00011| temp: 1.99773 | loss: 1.10698| constrast_loss: 4.35002| div_loss: 0.77911| %_mask_idx: 0.3584| ppl: 141.36758| %_neg_is_pos: 0.08664| lr: 0.00011| temp: 1.99772 | loss: 1.13944| constrast_loss: 4.48457| div_loss: 0.73197| %_mask_idx: 0.414| ppl: 171.54196| %_neg_is_pos: 0.06605| lr: 0.00011| temp: 1.99772 | loss: 1.09764| constrast_loss: 4.31245| div_loss: 0.78097| %_mask_idx: 0.37688| ppl: 140.17871| %_neg_is_pos: 0.10725| lr: 0.00011| temp: 1.99771 | loss: 1.13353| constrast_loss: 4.45906| div_loss: 0.75057| %_mask_idx: 0.36263| ppl: 
159.63339| %_neg_is_pos: 0.0669| lr: 0.00011| temp: 1.99771 | loss: 1.11041| constrast_loss: 4.36374| div_loss: 0.7788| %_mask_idx: 0.44768| ppl: 141.56746| %_neg_is_pos: 0.0476| lr: 0.00012| temp: 1.9977 | loss: 1.12673| constrast_loss: 4.43275| div_loss: 0.74175| %_mask_idx: 0.4198| ppl: 165.282| %_neg_is_pos: 0.05683| lr: 0.00012| temp: 1.9977 | loss: 1.13111| constrast_loss: 4.44915| div_loss: 0.75311| %_mask_idx: 0.41573| ppl: 158.0069| %_neg_is_pos: 0.05212| lr: 0.00012| temp: 1.99768 | loss: 1.12033| constrast_loss: 4.40641| div_loss: 0.74904| %_mask_idx: 0.39991| ppl: 160.61467| %_neg_is_pos: 0.06676| lr: 0.00012| temp: 1.99768 | loss: 1.11529| constrast_loss: 4.3854| div_loss: 0.75739| %_mask_idx: 0.43875| ppl: 155.2672| %_neg_is_pos: 0.04562| lr: 0.00012| temp: 1.99767 | loss: 1.1365| constrast_loss: 4.47433| div_loss: 0.71656| %_mask_idx: 0.3916| ppl: 181.40271| %_neg_is_pos: 0.05814| lr: 0.00012| temp: 1.99767 | loss: 1.11503| constrast_loss: 4.38328| div_loss: 0.76842| %_mask_idx: 0.41165| ppl: 148.21365| %_neg_is_pos: 0.0662| lr: 0.00012| temp: 1.99765 | loss: 1.11048| constrast_loss: 4.3662| div_loss: 0.75734| %_mask_idx: 0.40476| ppl: 155.30435| %_neg_is_pos: 0.07359| lr: 0.00012| temp: 1.99765 | loss: 1.10248| constrast_loss: 4.33235| div_loss: 0.77558| %_mask_idx: 0.38847| ppl: 143.63199| %_neg_is_pos: 0.0719| lr: 0.00012| temp: 1.99764 | loss: 1.06554| constrast_loss: 4.18035| div_loss: 0.81792| %_mask_idx: 0.38549| ppl: 116.53223| %_neg_is_pos: 0.09263| lr: 0.00012| temp: 1.99764 | loss: 1.09585| constrast_loss: 4.30572| div_loss: 0.77667| %_mask_idx: 0.39552| ppl: 142.93399| %_neg_is_pos: 0.08199| lr: 0.00012| temp: 1.99763 | loss: 1.10908| constrast_loss: 4.35787| div_loss: 0.78438| %_mask_idx: 0.42262| ppl: 137.99774| %_neg_is_pos: 0.06368| lr: 0.00012| temp: 1.99763 | loss: 1.11465| constrast_loss: 4.38216| div_loss: 0.76419| %_mask_idx: 0.44173| ppl: 150.91608| %_neg_is_pos: 0.06359| lr: 0.00012| temp: 1.99762 | loss: 1.11297| 
constrast_loss: 4.37465| div_loss: 0.77208| %_mask_idx: 0.39897| ppl: 145.86673| %_neg_is_pos: 0.05651| lr: 0.00012| temp: 1.99762 | loss: 1.12454| constrast_loss: 4.42235| div_loss: 0.75803| %_mask_idx: 0.41651| ppl: 154.86337| %_neg_is_pos: 0.0714| lr: 0.00012| temp: 1.9976 | loss: 1.12653| constrast_loss: 4.43181| div_loss: 0.7432| %_mask_idx: 0.38221| ppl: 164.34952| %_neg_is_pos: 0.05077| lr: 0.00012| temp: 1.9976 | loss: 1.10898| constrast_loss: 4.35997| div_loss: 0.75971| %_mask_idx: 0.37625| ppl: 153.78514| %_neg_is_pos: 0.07879| lr: 0.00012| temp: 1.99759 | loss: 1.10704| constrast_loss: 4.34913| div_loss: 0.79041| %_mask_idx: 0.37108| ppl: 134.13696| %_neg_is_pos: 0.09489| lr: 0.00012| temp: 1.99759 | loss: 1.10213| constrast_loss: 4.33003| div_loss: 0.78498| %_mask_idx: 0.40273| ppl: 137.61246| %_neg_is_pos: 0.06907| lr: 0.00012| temp: 1.99758 | loss: 1.1186| constrast_loss: 4.39874| div_loss: 0.7568| %_mask_idx: 0.40523| ppl: 155.64844| %_neg_is_pos: 0.0626| lr: 0.00012| temp: 1.99758 | loss: 1.12218| constrast_loss: 4.41116| div_loss: 0.7757| %_mask_idx: 0.30686| ppl: 143.55157| %_neg_is_pos: 0.1127| lr: 0.00012| temp: 1.99757 | loss: 1.11877| constrast_loss: 4.3965| div_loss: 0.78568| %_mask_idx: 0.41103| ppl: 137.16403| %_neg_is_pos: 0.06714| lr: 0.00012| temp: 1.99757 | loss: 1.12323| constrast_loss: 4.41763| div_loss: 0.75305| %_mask_idx: 0.41949| ppl: 158.04559| %_neg_is_pos: 0.06047| lr: 0.00012| temp: 1.99755 | loss: 1.10712| constrast_loss: 4.35308| div_loss: 0.75384| %_mask_idx: 0.39505| ppl: 157.54428| %_neg_is_pos: 0.07| lr: 0.00012| temp: 1.99755 | loss: 1.10124| constrast_loss: 4.32705| div_loss: 0.77921| %_mask_idx: 0.35887| ppl: 141.30423| %_neg_is_pos: 0.10232| lr: 0.00012| temp: 1.99754 | loss: 1.12391| constrast_loss: 4.42275| div_loss: 0.72912| %_mask_idx: 0.45959| ppl: 173.36475| %_neg_is_pos: 0.03765| lr: 0.00012| temp: 1.99754 | loss: 1.11472| constrast_loss: 4.38256| div_loss: 0.7631| %_mask_idx: 0.35573| ppl: 151.61848| 
%_neg_is_pos: 0.1078| lr: 0.00012| temp: 1.99753 | loss: 1.12071| constrast_loss: 4.40393| div_loss: 0.78917| %_mask_idx: 0.40116| ppl: 134.93436| %_neg_is_pos: 0.08257| lr: 0.00012| temp: 1.99753 | loss: 1.1184| constrast_loss: 4.39817| div_loss: 0.75445| %_mask_idx: 0.40382| ppl: 157.15146| %_neg_is_pos: 0.05619| lr: 0.00012| temp: 1.99752 | loss: 1.12354| constrast_loss: 4.41828| div_loss: 0.75899| %_mask_idx: 0.38377| ppl: 154.24773| %_neg_is_pos: 0.06506| lr: 0.00012| temp: 1.99752 | loss: 1.10817| constrast_loss: 4.3556| div_loss: 0.7707| %_mask_idx: 0.3631| ppl: 146.75089| %_neg_is_pos: 0.09014| lr: 0.00013| temp: 1.9975 | loss: 1.1062| constrast_loss: 4.34852| div_loss: 0.76294| %_mask_idx: 0.37672| ppl: 151.71689| %_neg_is_pos: 0.05794| lr: 0.00013| temp: 1.9975 | loss: 1.115| constrast_loss: 4.38338| div_loss: 0.76635| %_mask_idx: 0.36889| ppl: 149.53766| %_neg_is_pos: 0.07406| lr: 0.00013| temp: 1.99749 | loss: 1.14355| constrast_loss: 4.49821| div_loss: 0.75975| %_mask_idx: 0.41714| ppl: 153.76163| %_neg_is_pos: 0.05734| lr: 0.00013| temp: 1.99749 | loss: 1.10607| constrast_loss: 4.34358| div_loss: 0.80703| %_mask_idx: 0.38252| ppl: 123.50238| %_neg_is_pos: 0.08518| lr: 0.00013| temp: 1.99747 | loss: 1.10271| constrast_loss: 4.33088| div_loss: 0.79945| %_mask_idx: 0.37547| ppl: 128.35489| %_neg_is_pos: 0.07182| lr: 0.00013| temp: 1.99747 | loss: 1.13358| constrast_loss: 4.46146| div_loss: 0.72871| %_mask_idx: 0.45191| ppl: 173.62582| %_neg_is_pos: 0.0376| lr: 0.00013| temp: 1.99746 | loss: 1.13285| constrast_loss: 4.45678| div_loss: 0.74629| %_mask_idx: 0.38659| ppl: 162.37582| %_neg_is_pos: 0.0345| lr: 0.00013| temp: 1.99746 | loss: 1.11253| constrast_loss: 4.37348| div_loss: 0.76655| %_mask_idx: 0.38957| ppl: 149.4086| %_neg_is_pos: 0.0942| lr: 0.00013| temp: 1.99745 | loss: 1.12422| constrast_loss: 4.42259| div_loss: 0.74301| %_mask_idx: 0.41855| ppl: 164.47128| %_neg_is_pos: 0.07128| lr: 0.00013| temp: 1.99745 | loss: 1.10628| constrast_loss: 
4.34631| div_loss: 0.78806| %_mask_idx: 0.36717| ppl: 135.64383| %_neg_is_pos: 0.08287| lr: 0.00013| temp: 1.99744 | loss: 1.11606| constrast_loss: 4.38852| div_loss: 0.75715| %_mask_idx: 0.40789| ppl: 155.42618| %_neg_is_pos: 0.06401| lr: 0.00013| temp: 1.99744 | loss: 1.11434| constrast_loss: 4.38109| div_loss: 0.76285| %_mask_idx: 0.42622| ppl: 151.77496| %_neg_is_pos: 0.05873| lr: 0.00013| temp: 1.99742 | loss: 1.13406| constrast_loss: 4.46283| div_loss: 0.73416| %_mask_idx: 0.41996| ppl: 170.13504| %_neg_is_pos: 0.0442| lr: 0.00013| temp: 1.99742 | loss: 1.10087| constrast_loss: 4.32685| div_loss: 0.76621| %_mask_idx: 0.40962| ppl: 149.62796| %_neg_is_pos: 0.07921| lr: 0.00013| temp: 1.99741 | loss: 1.1051| constrast_loss: 4.34301| div_loss: 0.77393| %_mask_idx: 0.42669| ppl: 144.68707| %_neg_is_pos: 0.06392| lr: 0.00013| temp: 1.99741 | loss: 1.11406| constrast_loss: 4.37876| div_loss: 0.775| %_mask_idx: 0.30921| ppl: 144.00104| %_neg_is_pos: 0.09457| lr: 0.00013| temp: 1.9974 | loss: 1.12298| constrast_loss: 4.41492| div_loss: 0.76999| %_mask_idx: 0.3963| ppl: 147.20343| %_neg_is_pos: 0.07531| lr: 0.00013| temp: 1.9974 | loss: 1.12739| constrast_loss: 4.43495| div_loss: 0.74618| %_mask_idx: 0.35307| ppl: 162.44724| %_neg_is_pos: 0.06951| lr: 0.00013| temp: 1.99739 | loss: 1.10647| constrast_loss: 4.34886| div_loss: 0.77015| %_mask_idx: 0.44126| ppl: 147.10329| %_neg_is_pos: 0.07419| lr: 0.00013| temp: 1.99739 | loss: 1.09987| constrast_loss: 4.31979| div_loss: 0.79705| %_mask_idx: 0.36842| ppl: 129.8875| %_neg_is_pos: 0.08225| lr: 0.00013| temp: 1.99737 | loss: 1.10854| constrast_loss: 4.35544| div_loss: 0.7871| %_mask_idx: 0.35855| ppl: 136.25665| %_neg_is_pos: 0.08235| lr: 0.00013| temp: 1.99737 | loss: 1.11858| constrast_loss: 4.39856| div_loss: 0.75746| %_mask_idx: 0.39035| ppl: 155.2233| %_neg_is_pos: 0.07195| lr: 0.00013| temp: 1.99736 | loss: 1.11187| constrast_loss: 4.37143| div_loss: 0.7605| %_mask_idx: 0.47118| ppl: 153.28131| %_neg_is_pos: 0.0384| 
lr: 0.00013| temp: 1.99736 | loss: 1.10029| constrast_loss: 4.32178| div_loss: 0.79381| %_mask_idx: 0.33631| ppl: 131.95943| %_neg_is_pos: 0.11085| lr: 0.00013| temp: 1.99735 | loss: 1.11696| constrast_loss: 4.39374| div_loss: 0.74089| %_mask_idx: 0.36278| ppl: 165.82806| %_neg_is_pos: 0.06638| lr: 0.00013| temp: 1.99735 | loss: 1.10788| constrast_loss: 4.35385| div_loss: 0.77693| %_mask_idx: 0.3526| ppl: 142.76488| %_neg_is_pos: 0.08433| lr: 0.00013| temp: 1.99734 | loss: 1.10431| constrast_loss: 4.33931| div_loss: 0.77921| %_mask_idx: 0.34821| ppl: 141.306| %_neg_is_pos: 0.09541| lr: 0.00013| temp: 1.99734 | loss: 1.12018| constrast_loss: 4.40293| div_loss: 0.77798| %_mask_idx: 0.34117| ppl: 142.09583| %_neg_is_pos: 0.09246| lr: 0.00013| temp: 1.99732 | loss: 1.12498| constrast_loss: 4.4266| div_loss: 0.73327| %_mask_idx: 0.44204| ppl: 170.70602| %_neg_is_pos: 0.04942| lr: 0.00013| temp: 1.99732 | loss: 1.11926| constrast_loss: 4.39938| div_loss: 0.77646| %_mask_idx: 0.35746| ppl: 143.06287| %_neg_is_pos: 0.07134| lr: 0.00013| temp: 1.99731 | loss: 1.09078| constrast_loss: 4.28221| div_loss: 0.80914| %_mask_idx: 0.36905| ppl: 122.15326| %_neg_is_pos: 0.10924| lr: 0.00013| temp: 1.99731 [2021-09-01 16:28:46,753] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0 [2021-09-01 16:28:46,753] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 2097152.0, reducing to 1048576.0 | loss: 1.12393| constrast_loss: 4.41994| div_loss: 0.75763| %_mask_idx: 0.42137| ppl: 155.11475| %_neg_is_pos: 0.06124| lr: 0.00014| temp: 1.99729 | loss: 1.09573| constrast_loss: 4.30351| div_loss: 0.79408| %_mask_idx: 0.35605| ppl: 131.79071| %_neg_is_pos: 0.08832| lr: 0.00014| temp: 1.99729 | loss: 1.10715| constrast_loss: 4.35271| div_loss: 0.7591| %_mask_idx: 0.32393| ppl: 154.17517| %_neg_is_pos: 0.09165| lr: 0.00014| temp: 1.99728 | loss: 1.13732| constrast_loss: 4.47276| div_loss: 0.76505| %_mask_idx: 0.3985| ppl: 150.3654| %_neg_is_pos: 0.04729| lr: 0.00014| temp: 1.99728 | loss: 1.12003| constrast_loss: 4.40199| div_loss: 0.78114| %_mask_idx: 0.31657| ppl: 140.06979| %_neg_is_pos: 0.06079| lr: 0.00014| temp: 1.99727 | loss: 1.11516| constrast_loss: 4.38669| div_loss: 0.7393| %_mask_idx: 0.40273| ppl: 166.84778| %_neg_is_pos: 0.02906| lr: 0.00014| temp: 1.99727 | loss: 1.12257| constrast_loss: 4.4111| div_loss: 0.79177| %_mask_idx: 0.36826| ppl: 133.26924| %_neg_is_pos: 0.07568| lr: 0.00014| temp: 1.99726 | loss: 1.13532| constrast_loss: 4.46456| div_loss: 0.76724| %_mask_idx: 0.39192| ppl: 148.96428| %_neg_is_pos: 0.0392| lr: 0.00014| temp: 1.99726 | loss: 1.13198| constrast_loss: 4.44925| div_loss: 0.78674| %_mask_idx: 0.34352| ppl: 136.48759| %_neg_is_pos: 0.05107| lr: 0.00014| temp: 1.99724| loss: 1.13436| constrast_loss: 4.45894| div_loss: 0.78504| %_mask_idx: 0.37265| ppl: 137.57158| %_neg_is_pos: 0.04431| lr: 0.00014| temp: 1.99724 | loss: 1.16266| constrast_loss: 4.57512| div_loss: 0.75522| %_mask_idx: 0.3891| ppl: 156.65797| %_neg_is_pos: 0.01879| lr: 0.00014| temp: 1.99723 | loss: 1.15226| constrast_loss: 4.53187| div_loss: 0.77168| %_mask_idx: 0.39035| ppl: 146.12665| %_neg_is_pos: 0.03528| lr: 0.00014| temp: 1.99723 | loss: 1.129| constrast_loss: 4.43297| div_loss: 0.83011| %_mask_idx: 0.38409| ppl: 108.72798| %_neg_is_pos: 0.09694| lr: 0.00014| temp: 1.99722 | loss: 1.1135| 
constrast_loss: 4.37148| div_loss: 0.82521| %_mask_idx: 0.36122| ppl: 111.86351| %_neg_is_pos: 0.10815| lr: 0.00014| temp: 1.99722 | loss: 1.09207| constrast_loss: 4.28479| div_loss: 0.83503| %_mask_idx: 0.38158| ppl: 105.58214| %_neg_is_pos: 0.06353| lr: 0.00014| temp: 1.99721 | loss: 1.12578| constrast_loss: 4.42219| div_loss: 0.80922| %_mask_idx: 0.42967| ppl: 122.10224| %_neg_is_pos: 0.05315| lr: 0.00014| temp: 1.99721 | loss: 1.08516| constrast_loss: 4.25634| div_loss: 0.84303| %_mask_idx: 0.32393| ppl: 100.45955| %_neg_is_pos: 0.12471| lr: 0.00014| temp: 1.99719 | loss: 1.07879| constrast_loss: 4.22916| div_loss: 0.85994| %_mask_idx: 0.32472| ppl: 89.64047| %_neg_is_pos: 0.12329| lr: 0.00014| temp: 1.99719 | loss: 1.13289| constrast_loss: 4.44811| div_loss: 0.83465| %_mask_idx: 0.38001| ppl: 105.82196| %_neg_is_pos: 0.04457| lr: 0.00014| temp: 1.99718 | loss: 1.14928| constrast_loss: 4.51402| div_loss: 0.83077| %_mask_idx: 0.39787| ppl: 108.30577| %_neg_is_pos: 0.0485| lr: 0.00014| temp: 1.99718 | loss: 1.12812| constrast_loss: 4.42803| div_loss: 0.84451| %_mask_idx: 0.4151| ppl: 99.51559| %_neg_is_pos: 0.06622| lr: 0.00014| temp: 1.99717 | loss: 1.12795| constrast_loss: 4.42938| div_loss: 0.82422| %_mask_idx: 0.38503| ppl: 112.49823| %_neg_is_pos: 0.044| lr: 0.00014| temp: 1.99717 | loss: 1.1069| constrast_loss: 4.34269| div_loss: 0.84888| %_mask_idx: 0.41228| ppl: 96.71498| %_neg_is_pos: 0.06292| lr: 0.00014| temp: 1.99716 | loss: 1.11291| constrast_loss: 4.36717| div_loss: 0.84461| %_mask_idx: 0.38424| ppl: 99.4487| %_neg_is_pos: 0.10847| lr: 0.00014| temp: 1.99716 | loss: 1.14932| constrast_loss: 4.51171| div_loss: 0.85584| %_mask_idx: 0.38643| ppl: 92.261| %_neg_is_pos: 0.06629| lr: 0.00014| temp: 1.99714 | loss: 1.15379| constrast_loss: 4.53117| div_loss: 0.84009| %_mask_idx: 0.3844| ppl: 102.34495| %_neg_is_pos: 0.03971| lr: 0.00014| temp: 1.99714 | loss: 1.12053| constrast_loss: 4.39145| div_loss: 0.90653| %_mask_idx: 0.43515| ppl: 59.82139| 
%_neg_is_pos: 0.0761| lr: 0.00014| temp: 1.99713 | loss: 1.0961| constrast_loss: 4.29365| div_loss: 0.90751| %_mask_idx: 0.3891| ppl: 59.19184| %_neg_is_pos: 0.09116| lr: 0.00014| temp: 1.99713 | loss: 1.10434| constrast_loss: 4.32531| div_loss: 0.92045| %_mask_idx: 0.42654| ppl: 50.91485| %_neg_is_pos: 0.06435| lr: 0.00014| temp: 1.99711 | loss: 1.12651| constrast_loss: 4.41438| div_loss: 0.91664| %_mask_idx: 0.41447| ppl: 53.34792| %_neg_is_pos: 0.05721| lr: 0.00014| temp: 1.99711 | loss: 1.10086| constrast_loss: 4.31094| div_loss: 0.92513| %_mask_idx: 0.38972| ppl: 47.91412| %_neg_is_pos: 0.07578| lr: 0.00014| temp: 1.9971 | loss: 1.09088| constrast_loss: 4.2709| div_loss: 0.92621| %_mask_idx: 0.37813| ppl: 47.22728| %_neg_is_pos: 0.09972| lr: 0.00014| temp: 1.9971 | loss: 1.11839| constrast_loss: 4.38134| div_loss: 0.92212| %_mask_idx: 0.40241| ppl: 49.84364| %_neg_is_pos: 0.06661| lr: 0.00015| temp: 1.99709 | loss: 1.1052| constrast_loss: 4.32857| div_loss: 0.92222| %_mask_idx: 0.35902| ppl: 49.779| %_neg_is_pos: 0.08221| lr: 0.00015| temp: 1.99709 | loss: 1.0767| constrast_loss: 4.21375| div_loss: 0.93036| %_mask_idx: 0.3739| ppl: 44.57189| %_neg_is_pos: 0.09234| lr: 0.00015| temp: 1.99708 | loss: 1.13152| constrast_loss: 4.43411| div_loss: 0.91967| %_mask_idx: 0.3797| ppl: 51.41335| %_neg_is_pos: 0.06578| lr: 0.00015| temp: 1.99708 | loss: 1.11222| constrast_loss: 4.35657| div_loss: 0.92314| %_mask_idx: 0.38769| ppl: 49.19356| %_neg_is_pos: 0.07897| lr: 0.00015| temp: 1.99706 | loss: 1.07919| constrast_loss: 4.22421| div_loss: 0.92535| %_mask_idx: 0.36043| ppl: 47.77638| %_neg_is_pos: 0.09784| lr: 0.00015| temp: 1.99706 | loss: 1.13479| constrast_loss: 4.4476| div_loss: 0.91559| %_mask_idx: 0.39662| ppl: 54.0202| %_neg_is_pos: 0.06464| lr: 0.00015| temp: 1.99705 | loss: 1.12508| constrast_loss: 4.40793| div_loss: 0.92375| %_mask_idx: 0.42215| ppl: 48.79961| %_neg_is_pos: 0.05291| lr: 0.00015| temp: 1.99705 | loss: 1.0932| constrast_loss: 4.2803| div_loss: 
0.92511| %_mask_idx: 0.36983| ppl: 47.92791| %_neg_is_pos: 0.10813| lr: 0.00015| temp: 1.99704 | loss: 1.11301| constrast_loss: 4.36036| div_loss: 0.91696| %_mask_idx: 0.37939| ppl: 53.14291| %_neg_is_pos: 0.06879| lr: 0.00015| temp: 1.99704 | loss: 1.11394| constrast_loss: 4.36315| div_loss: 0.92596| %_mask_idx: 0.38158| ppl: 47.38801| %_neg_is_pos: 0.06855| lr: 0.00015| temp: 1.99703 | loss: 1.11488| constrast_loss: 4.36728| div_loss: 0.92226| %_mask_idx: 0.38362| ppl: 49.75201| %_neg_is_pos: 0.08522| lr: 0.00015| temp: 1.99703 | loss: 1.12159| constrast_loss: 4.39439| div_loss: 0.91964| %_mask_idx: 0.38424| ppl: 51.42899| %_neg_is_pos: 0.0661| lr: 0.00015| temp: 1.99701 | loss: 1.10419| constrast_loss: 4.32503| div_loss: 0.91728| %_mask_idx: 0.37312| ppl: 52.94353| %_neg_is_pos: 0.08018| lr: 0.00015| temp: 1.99701 | loss: 1.11279| constrast_loss: 4.35892| div_loss: 0.92223| %_mask_idx: 0.38236| ppl: 49.77504| %_neg_is_pos: 0.07585| lr: 0.00015| temp: 1.997 | loss: 1.11917| constrast_loss: 4.38442| div_loss: 0.92261| %_mask_idx: 0.41823| ppl: 49.52704| %_neg_is_pos: 0.06742| lr: 0.00015| temp: 1.997 | loss: 1.13147| constrast_loss: 4.43406| div_loss: 0.918| %_mask_idx: 0.39928| ppl: 52.47727| %_neg_is_pos: 0.0736| lr: 0.00015| temp: 1.99699 | loss: 1.11418| constrast_loss: 4.36411| div_loss: 0.92588| %_mask_idx: 0.43499| ppl: 47.43672| %_neg_is_pos: 0.05628| lr: 0.00015| temp: 1.99699 | loss: 1.11447| constrast_loss: 4.3658| div_loss: 0.92074| %_mask_idx: 0.35746| ppl: 50.72571| %_neg_is_pos: 0.09199| lr: 0.00015| temp: 1.99698 | loss: 1.12042| constrast_loss: 4.38971| div_loss: 0.91961| %_mask_idx: 0.39317| ppl: 51.45167| %_neg_is_pos: 0.05531| lr: 0.00015| temp: 1.99698 | loss: 1.15145| constrast_loss: 4.51435| div_loss: 0.91443| %_mask_idx: 0.42654| ppl: 54.76437| %_neg_is_pos: 0.04871| lr: 0.00015| temp: 1.99696 | loss: 1.10893| constrast_loss: 4.34328| div_loss: 0.9245| %_mask_idx: 0.40648| ppl: 48.32187| %_neg_is_pos: 0.0749| lr: 0.00015| temp: 1.99696 | 
loss: 1.14996| constrast_loss: 4.50795| div_loss: 0.91904| %_mask_idx: 0.45222| ppl: 51.81433| %_neg_is_pos: 0.04799| lr: 0.00015| temp: 1.99695 | loss: 1.09323| constrast_loss: 4.2804| div_loss: 0.9251| %_mask_idx: 0.39646| ppl: 47.93449| %_neg_is_pos: 0.07976| lr: 0.00015| temp: 1.99695 | loss: 1.10783| constrast_loss: 4.33898| div_loss: 0.9234| %_mask_idx: 0.34508| ppl: 49.02687| %_neg_is_pos: 0.08163| lr: 0.00015| temp: 1.99693 | loss: 1.10366| constrast_loss: 4.32215| div_loss: 0.92488| %_mask_idx: 0.38158| ppl: 48.07784| %_neg_is_pos: 0.0778| lr: 0.00015| temp: 1.99693 | loss: 1.10599| constrast_loss: 4.33133| div_loss: 0.92627| %_mask_idx: 0.38127| ppl: 47.18489| %_neg_is_pos: 0.06607| lr: 0.00015| temp: 1.99692 | loss: 1.10529| constrast_loss: 4.32851| div_loss: 0.92651| %_mask_idx: 0.39035| ppl: 47.03484| %_neg_is_pos: 0.07059| lr: 0.00015| temp: 1.99692 | loss: 1.11412| constrast_loss: 4.36411| div_loss: 0.92375| %_mask_idx: 0.38268| ppl: 48.79993| %_neg_is_pos: 0.07664| lr: 0.00015| temp: 1.99691 | loss: 1.10768| constrast_loss: 4.33838| div_loss: 0.92355| %_mask_idx: 0.38346| ppl: 48.92779| %_neg_is_pos: 0.08217| lr: 0.00015| temp: 1.99691 | loss: 1.13916| constrast_loss: 4.46455| div_loss: 0.92094| %_mask_idx: 0.37093| ppl: 50.59919| %_neg_is_pos: 0.06775| lr: 0.00016| temp: 1.9969 | loss: 1.1011| constrast_loss: 4.31177| div_loss: 0.92643| %_mask_idx: 0.40711| ppl: 47.0857| %_neg_is_pos: 0.07945| lr: 0.00016| temp: 1.9969 | loss: 1.09225| constrast_loss: 4.27606| div_loss: 0.92945| %_mask_idx: 0.37234| ppl: 45.15479| %_neg_is_pos: 0.08955| lr: 0.00016| temp: 1.99688 | loss: 1.11288| constrast_loss: 4.35862| div_loss: 0.9289| %_mask_idx: 0.41792| ppl: 45.50698| %_neg_is_pos: 0.06923| lr: 0.00016| temp: 1.99688 | loss: 1.14698| constrast_loss: 4.4963| div_loss: 0.91616| %_mask_idx: 0.40382| ppl: 53.65982| %_neg_is_pos: 0.05932| lr: 0.00016| temp: 1.99687 | loss: 1.10483| constrast_loss: 4.32718| div_loss: 0.9216| %_mask_idx: 0.37751| ppl: 50.17696| 
%_neg_is_pos: 0.08301| lr: 0.00016| temp: 1.99687 | loss: 1.0929| constrast_loss: 4.27924| div_loss: 0.92361| %_mask_idx: 0.37829| ppl: 48.88981| %_neg_is_pos: 0.079| lr: 0.00016| temp: 1.99686 | loss: 1.12046| constrast_loss: 4.38924| div_loss: 0.92601| %_mask_idx: 0.38268| ppl: 47.35151| %_neg_is_pos: 0.06935| lr: 0.00016| temp: 1.99686 | loss: 1.11053| constrast_loss: 4.34973| div_loss: 0.92398| %_mask_idx: 0.3985| ppl: 48.65358| %_neg_is_pos: 0.07085| lr: 0.00016| temp: 1.99685 | loss: 1.08299| constrast_loss: 4.23924| div_loss: 0.92736| %_mask_idx: 0.35981| ppl: 46.49268| %_neg_is_pos: 0.10641| lr: 0.00016| temp: 1.99685 | loss: 1.12399| constrast_loss: 4.40356| div_loss: 0.92389| %_mask_idx: 0.42278| ppl: 48.70838| %_neg_is_pos: 0.06005| lr: 0.00016| temp: 1.99683 | loss: 1.09386| constrast_loss: 4.28326| div_loss: 0.92197| %_mask_idx: 0.37359| ppl: 49.94233| %_neg_is_pos: 0.08892| lr: 0.00016| temp: 1.99683 | loss: 1.10442| constrast_loss: 4.32545| div_loss: 0.92248| %_mask_idx: 0.37594| ppl: 49.61222| %_neg_is_pos: 0.07935| lr: 0.00016| temp: 1.99682 | loss: 1.14256| constrast_loss: 4.47893| div_loss: 0.91322| %_mask_idx: 0.36638| ppl: 55.53928| %_neg_is_pos: 0.05364| lr: 0.00016| temp: 1.99682 | loss: 1.09601| constrast_loss: 4.29148| div_loss: 0.9258| %_mask_idx: 0.36482| ppl: 47.48847| %_neg_is_pos: 0.09898| lr: 0.00016| temp: 1.99681 | loss: 1.10546| constrast_loss: 4.32911| div_loss: 0.92716| %_mask_idx: 0.38894| ppl: 46.61543| %_neg_is_pos: 0.06572| lr: 0.00016| temp: 1.99681 | loss: 1.08831| constrast_loss: 4.26096| div_loss: 0.92269| %_mask_idx: 0.3443| ppl: 49.47723| %_neg_is_pos: 0.11879| lr: 0.00016| temp: 1.9968 | loss: 1.13432| constrast_loss: 4.44608| div_loss: 0.91204| %_mask_idx: 0.44126| ppl: 56.29337| %_neg_is_pos: 0.05474| lr: 0.00016| temp: 1.9968 | loss: 1.10955| constrast_loss: 4.34618| div_loss: 0.92032| %_mask_idx: 0.39051| ppl: 50.99635| %_neg_is_pos: 0.11168| lr: 0.00016| temp: 1.99678 | loss: 1.10297| constrast_loss: 4.31912| 
div_loss: 0.92773| %_mask_idx: 0.39239| ppl: 46.25134| %_neg_is_pos: 0.0735| lr: 0.00016| temp: 1.99678 | loss: 1.07082| constrast_loss: 4.19002| div_loss: 0.93279| %_mask_idx: 0.37046| ppl: 43.01296| %_neg_is_pos: 0.10167| lr: 0.00016| temp: 1.99677 | loss: 1.09558| constrast_loss: 4.28946| div_loss: 0.92866| %_mask_idx: 0.42011| ppl: 45.65609| %_neg_is_pos: 0.07312| lr: 0.00016| temp: 1.99677 | loss: 1.06655| constrast_loss: 4.17304| div_loss: 0.93169| %_mask_idx: 0.36905| ppl: 43.72092| %_neg_is_pos: 0.10277| lr: 0.00016| temp: 1.99675 | loss: 1.12676| constrast_loss: 4.41534| div_loss: 0.91691| %_mask_idx: 0.39865| ppl: 53.17482| %_neg_is_pos: 0.05585| lr: 0.00016| temp: 1.99675 | loss: 1.09538| constrast_loss: 4.2888| div_loss: 0.92712| %_mask_idx: 0.36717| ppl: 46.64479| %_neg_is_pos: 0.08961| lr: 0.00016| temp: 1.99674 | loss: 1.10542| constrast_loss: 4.32911| div_loss: 0.92551| %_mask_idx: 0.39286| ppl: 47.6725| %_neg_is_pos: 0.07807| lr: 0.00016| temp: 1.99674 | loss: 1.11219| constrast_loss: 4.35667| div_loss: 0.92106| %_mask_idx: 0.4162| ppl: 50.52301| %_neg_is_pos: 0.05223| lr: 0.00016| temp: 1.99673 | loss: 1.10148| constrast_loss: 4.31341| div_loss: 0.92521| %_mask_idx: 0.42434| ppl: 47.86622| %_neg_is_pos: 0.07208| lr: 0.00016| temp: 1.99673 | loss: 1.0725| constrast_loss: 4.19706| div_loss: 0.92963| %_mask_idx: 0.39348| ppl: 45.03415| %_neg_is_pos: 0.10018| lr: 0.00016| temp: 1.99672 | loss: 1.12764| constrast_loss: 4.41889| div_loss: 0.91675| %_mask_idx: 0.34633| ppl: 53.28111| %_neg_is_pos: 0.07818| lr: 0.00016| temp: 1.99672 | loss: 1.10043| constrast_loss: 4.30959| div_loss: 0.92142| %_mask_idx: 0.39552| ppl: 50.2887| %_neg_is_pos: 0.08644| lr: 0.00017| temp: 1.9967 | loss: 1.09358| constrast_loss: 4.28156| div_loss: 0.92768| %_mask_idx: 0.43358| ppl: 46.28678| %_neg_is_pos: 0.0635| lr: 0.00017| temp: 1.9967 | loss: 1.09324| constrast_loss: 4.28043| div_loss: 0.92538| %_mask_idx: 0.35056| ppl: 47.75809| %_neg_is_pos: 0.09552| lr: 0.00017| temp: 
1.99669 | loss: 1.09945| constrast_loss: 4.30513| div_loss: 0.92678| %_mask_idx: 0.37249| ppl: 46.86064| %_neg_is_pos: 0.09636| lr: 0.00017| temp: 1.99669 | loss: 1.13281| constrast_loss: 4.43976| div_loss: 0.91476| %_mask_idx: 0.38283| ppl: 54.55511| %_neg_is_pos: 0.04659| lr: 0.00017| temp: 1.99668 | loss: 1.12279| constrast_loss: 4.39946| div_loss: 0.91709| %_mask_idx: 0.36388| ppl: 53.06144| %_neg_is_pos: 0.08513| lr: 0.00017| temp: 1.99668 | loss: 1.12098| constrast_loss: 4.39117| div_loss: 0.92763| %_mask_idx: 0.41432| ppl: 46.31411| %_neg_is_pos: 0.06436| lr: 0.00017| temp: 1.99667 | loss: 1.12986| constrast_loss: 4.42772| div_loss: 0.91733| %_mask_idx: 0.40179| ppl: 52.91029| %_neg_is_pos: 0.04991| lr: 0.00017| temp: 1.99667 | loss: 1.09812| constrast_loss: 4.29948| div_loss: 0.93006| %_mask_idx: 0.37093| ppl: 44.76325| %_neg_is_pos: 0.10168| lr: 0.00017| temp: 1.99665 | loss: 1.12957| constrast_loss: 4.42558| div_loss: 0.92689| %_mask_idx: 0.39019| ppl: 46.79025| %_neg_is_pos: 0.07753| lr: 0.00017| temp: 1.99665 | loss: 1.10677| constrast_loss: 4.33397| div_loss: 0.93126| %_mask_idx: 0.39693| ppl: 43.99393| %_neg_is_pos: 0.08344| lr: 0.00017| temp: 1.99664 | loss: 1.05547| constrast_loss: 4.12823| div_loss: 0.93633| %_mask_idx: 0.36936| ppl: 40.74962| %_neg_is_pos: 0.10285| lr: 0.00017| temp: 1.99664 | loss: 1.12333| constrast_loss: 4.40125| div_loss: 0.92091| %_mask_idx: 0.41494| ppl: 50.62063| %_neg_is_pos: 0.0629| lr: 0.00017| temp: 1.99663 | loss: 1.10383| constrast_loss: 4.32236| div_loss: 0.92961| %_mask_idx: 0.38565| ppl: 45.0489| %_neg_is_pos: 0.1042| lr: 0.00017| temp: 1.99663 | loss: 1.09834| constrast_loss: 4.30074| div_loss: 0.92632| %_mask_idx: 0.36999| ppl: 47.15218| %_neg_is_pos: 0.0828| lr: 0.00017| temp: 1.99662 | loss: 1.09494| constrast_loss: 4.28735| div_loss: 0.92419| %_mask_idx: 0.37516| ppl: 48.5162| %_neg_is_pos: 0.10668| lr: 0.00017| temp: 1.99662 | loss: 1.11819| constrast_loss: 4.38064| div_loss: 0.92129| %_mask_idx: 0.4198| ppl: 
50.37287| %_neg_is_pos: 0.06074| lr: 0.00017| temp: 1.9966 | loss: 1.12344| constrast_loss: 4.40195| div_loss: 0.9181| %_mask_idx: 0.388| ppl: 52.41452| %_neg_is_pos: 0.07171| lr: 0.00017| temp: 1.9966 | loss: 1.10965| constrast_loss: 4.34615| div_loss: 0.9244| %_mask_idx: 0.3396| ppl: 48.38355| %_neg_is_pos: 0.10676| lr: 0.00017| temp: 1.99659 | loss: 1.09566| constrast_loss: 4.29033| div_loss: 0.92295| %_mask_idx: 0.42622| ppl: 49.31435| %_neg_is_pos: 0.06716| lr: 0.00017| temp: 1.99659 [2021-09-01 16:38:02,785] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0 [2021-09-01 16:38:02,785] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0 | loss: 1.11329| constrast_loss: 4.36109| div_loss: 0.9206| %_mask_idx: 0.37704| ppl: 50.81306| %_neg_is_pos: 0.07282| lr: 0.00017| temp: 1.99657 | loss: 1.11845| constrast_loss: 4.38178| div_loss: 0.92018| %_mask_idx: 0.43687| ppl: 51.08488| %_neg_is_pos: 0.05597| lr: 0.00017| temp: 1.99657 | loss: 1.16305| constrast_loss: 4.55849| div_loss: 0.93728| %_mask_idx: 0.41933| ppl: 40.14246| %_neg_is_pos: 0.04776| lr: 0.00017| temp: 1.99656 | loss: 1.15928| constrast_loss: 4.54343| div_loss: 0.93684| %_mask_idx: 0.35793| ppl: 40.42457| %_neg_is_pos: 0.04658| lr: 0.00017| temp: 1.99656 | loss: 1.12805| constrast_loss: 4.41683| div_loss: 0.95392| %_mask_idx: 0.37531| ppl: 29.48965| %_neg_is_pos: 0.11676| lr: 0.00017| temp: 1.99655 | loss: 1.12595| constrast_loss: 4.40787| div_loss: 0.95948| %_mask_idx: 0.4162| ppl: 25.93264| %_neg_is_pos: 0.10913| lr: 0.00017| temp: 1.99655 | loss: 1.11786| constrast_loss: 4.37479| div_loss: 0.96645| %_mask_idx: 0.35166| ppl: 21.46978| %_neg_is_pos: 0.15364| lr: 0.00017| temp: 1.99654 | loss: 1.12257| constrast_loss: 4.39364| div_loss: 0.96636| %_mask_idx: 0.3692| ppl: 21.53121| %_neg_is_pos: 0.11234| lr: 
0.00017| temp: 1.99654 | loss: 1.12844| constrast_loss: 4.41666| div_loss: 0.97107| %_mask_idx: 0.40053| ppl: 18.51592| %_neg_is_pos: 0.08141| lr: 0.00017| temp: 1.99652 | loss: 1.11601| constrast_loss: 4.36692| div_loss: 0.97136| %_mask_idx: 0.35887| ppl: 18.32697| %_neg_is_pos: 0.09063| lr: 0.00017| temp: 1.99652 | loss: 1.08174| constrast_loss: 4.22934| div_loss: 0.97619| %_mask_idx: 0.40351| ppl: 15.23972| %_neg_is_pos: 0.10802| lr: 0.00017| temp: 1.99651 | loss: 1.11289| constrast_loss: 4.35398| div_loss: 0.97563| %_mask_idx: 0.42544| ppl: 15.59682| %_neg_is_pos: 0.09903| lr: 0.00017| temp: 1.99651 | loss: 1.12928| constrast_loss: 4.41959| div_loss: 0.97539| %_mask_idx: 0.41118| ppl: 15.75005| %_neg_is_pos: 0.0937| lr: 0.00017| temp: 1.9965 | loss: 1.1053| constrast_loss: 4.32359| div_loss: 0.97615| %_mask_idx: 0.375| ppl: 15.266| %_neg_is_pos: 0.11407| lr: 0.00017| temp: 1.9965 | loss: 1.14165| constrast_loss: 4.46845| div_loss: 0.98144| %_mask_idx: 0.39286| ppl: 11.87565| %_neg_is_pos: 0.11093| lr: 0.00018| temp: 1.99649 | loss: 1.14586| constrast_loss: 4.48536| div_loss: 0.98098| %_mask_idx: 0.35667| ppl: 12.17381| %_neg_is_pos: 0.12073| lr: 0.00018| temp: 1.99649 | loss: 1.13024| constrast_loss: 4.42251| div_loss: 0.98435| %_mask_idx: 0.3797| ppl: 10.01563| %_neg_is_pos: 0.11483| lr: 0.00018| temp: 1.99647 | loss: 1.12312| constrast_loss: 4.39399| div_loss: 0.98479| %_mask_idx: 0.40304| ppl: 9.73327| %_neg_is_pos: 0.12338| lr: 0.00018| temp: 1.99647 | loss: 1.05558| constrast_loss: 4.12384| div_loss: 0.98486| %_mask_idx: 0.35981| ppl: 9.6877| %_neg_is_pos: 0.1885| lr: 0.00018| temp: 1.99646 | loss: 1.03407| constrast_loss: 4.03774| div_loss: 0.98553| %_mask_idx: 0.35197| ppl: 9.26276| %_neg_is_pos: 0.18659| lr: 0.00018| temp: 1.99646 | loss: 1.04826| constrast_loss: 4.09453| div_loss: 0.98523| %_mask_idx: 0.36811| ppl: 9.45522| %_neg_is_pos: 0.17239| lr: 0.00018| temp: 1.99645 | loss: 1.03312| constrast_loss: 4.03392| div_loss: 0.98545| %_mask_idx: 
0.39928| ppl: 9.30959| %_neg_is_pos: 0.14819| lr: 0.00018| temp: 1.99645 | loss: 1.06334| constrast_loss: 4.15476| div_loss: 0.98583| %_mask_idx: 0.36184| ppl: 9.06862| %_neg_is_pos: 0.16772| lr: 0.00018| temp: 1.99644 | loss: 1.05948| constrast_loss: 4.13938| div_loss: 0.98545| %_mask_idx: 0.40727| ppl: 9.31262| %_neg_is_pos: 0.18488| lr: 0.00018| temp: 1.99644 | loss: 1.03745| constrast_loss: 4.05113| div_loss: 0.98693| %_mask_idx: 0.43562| ppl: 8.36244| %_neg_is_pos: 0.18762| lr: 0.00018| temp: 1.99642| loss: 1.03294| constrast_loss: 4.03306| div_loss: 0.98694| %_mask_idx: 0.38972| ppl: 8.35937| %_neg_is_pos: 0.21263| lr: 0.00018| temp: 1.99642 | loss: 1.07949| constrast_loss: 4.2193| div_loss: 0.98654| %_mask_idx: 0.45457| ppl: 8.61226| %_neg_is_pos: 0.16622| lr: 0.00018| temp: 1.99641 | loss: 1.078| constrast_loss: 4.21333| div_loss: 0.98672| %_mask_idx: 0.388| ppl: 8.49733| %_neg_is_pos: 0.17772| lr: 0.00018| temp: 1.99641 | loss: 1.04892| constrast_loss: 4.0968| div_loss: 0.98884| %_mask_idx: 0.37171| ppl: 7.14191| %_neg_is_pos: 0.23374| lr: 0.00018| temp: 1.99639 | loss: 1.03474| constrast_loss: 4.04009| div_loss: 0.98858| %_mask_idx: 0.38769| ppl: 7.30604| %_neg_is_pos: 0.20278| lr: 0.00018| temp: 1.99639 | loss: 1.04538| constrast_loss: 4.08267| div_loss: 0.98853| %_mask_idx: 0.40116| ppl: 7.33779| %_neg_is_pos: 0.19289| lr: 0.00018| temp: 1.99638 | loss: 1.04172| constrast_loss: 4.068| div_loss: 0.98869| %_mask_idx: 0.41902| ppl: 7.24043| %_neg_is_pos: 0.19959| lr: 0.00018| temp: 1.99638 | loss: 0.98495| constrast_loss: 3.84093| div_loss: 0.98847| %_mask_idx: 0.40382| ppl: 7.38094| %_neg_is_pos: 0.21227| lr: 0.00018| temp: 1.99637 | loss: 1.0166| constrast_loss: 3.96756| div_loss: 0.98853| %_mask_idx: 0.41479| ppl: 7.33796| %_neg_is_pos: 0.20814| lr: 0.00018| temp: 1.99637 | loss: 1.03941| constrast_loss: 4.0588| div_loss: 0.98843| %_mask_idx: 0.38252| ppl: 7.40613| %_neg_is_pos: 0.19722| lr: 0.00018| temp: 1.99636 | loss: 1.05852| constrast_loss: 
4.13515| div_loss: 0.98919| %_mask_idx: 0.41729| ppl: 6.92068| %_neg_is_pos: 0.1915| lr: 0.00018| temp: 1.99636 | loss: 1.03131| constrast_loss: 4.02638| div_loss: 0.98858| %_mask_idx: 0.40492| ppl: 7.30814| %_neg_is_pos: 0.20387| lr: 0.00018| temp: 1.99634 | loss: 1.01678| constrast_loss: 3.96822| div_loss: 0.98894| %_mask_idx: 0.37484| ppl: 7.07647| %_neg_is_pos: 0.21862| lr: 0.00018| temp: 1.99634 | loss: 1.02893| constrast_loss: 4.01686| div_loss: 0.98859| %_mask_idx: 0.44298| ppl: 7.30267| %_neg_is_pos: 0.19902| lr: 0.00018| temp: 1.99633 | loss: 1.01| constrast_loss: 3.94112| div_loss: 0.98869| %_mask_idx: 0.37876| ppl: 7.24125| %_neg_is_pos: 0.21332| lr: 0.00018| temp: 1.99633 | loss: 1.03113| constrast_loss: 4.02571| div_loss: 0.98821| %_mask_idx: 0.36435| ppl: 7.54682| %_neg_is_pos: 0.20112| lr: 0.00018| temp: 1.99632 | loss: 1.05012| constrast_loss: 4.10164| div_loss: 0.98831| %_mask_idx: 0.40194| ppl: 7.48399| %_neg_is_pos: 0.1927| lr: 0.00018| temp: 1.99632 | loss: 1.04621| constrast_loss: 4.08599| div_loss: 0.98866| %_mask_idx: 0.38925| ppl: 7.25978| %_neg_is_pos: 0.19381| lr: 0.00018| temp: 1.99631 | loss: 1.04299| constrast_loss: 4.07311| div_loss: 0.98855| %_mask_idx: 0.39145| ppl: 7.32968| %_neg_is_pos: 0.20227| lr: 0.00018| temp: 1.99631 | loss: 1.0127| constrast_loss: 3.95191| div_loss: 0.98886| %_mask_idx: 0.39912| ppl: 7.12716| %_neg_is_pos: 0.22383| lr: 0.00019| temp: 1.99629 | loss: 1.03153| constrast_loss: 4.02728| div_loss: 0.98851| %_mask_idx: 0.38878| ppl: 7.35672| %_neg_is_pos: 0.21391| lr: 0.00019| temp: 1.99629 | loss: 1.01658| constrast_loss: 3.96749| div_loss: 0.98835| %_mask_idx: 0.38518| ppl: 7.45813| %_neg_is_pos: 0.20796| lr: 0.00019| temp: 1.99628 | loss: 1.01014| constrast_loss: 3.94174| div_loss: 0.98833| %_mask_idx: 0.36795| ppl: 7.46846| %_neg_is_pos: 0.18546| lr: 0.00019| temp: 1.99628 | loss: 1.0535| constrast_loss: 4.11517| div_loss: 0.98853| %_mask_idx: 0.31657| ppl: 7.3387| %_neg_is_pos: 0.20622| lr: 0.00019| temp: 
1.99627 | loss: 0.99996| constrast_loss: 3.90099| div_loss: 0.98864| %_mask_idx: 0.36106| ppl: 7.26882| %_neg_is_pos: 0.20547| lr: 0.00019| temp: 1.99627 | loss: 1.00889| constrast_loss: 3.93666| div_loss: 0.98891| %_mask_idx: 0.35871| ppl: 7.0948| %_neg_is_pos: 0.22432| lr: 0.00019| temp: 1.99626 | loss: 1.04062| constrast_loss: 4.0636| div_loss: 0.98862| %_mask_idx: 0.38315| ppl: 7.28053| %_neg_is_pos: 0.19765| lr: 0.00019| temp: 1.99626 | loss: 1.01149| constrast_loss: 3.94708| div_loss: 0.98863| %_mask_idx: 0.41118| ppl: 7.27637| %_neg_is_pos: 0.21483| lr: 0.00019| temp: 1.99624 | loss: 1.03729| constrast_loss: 4.05032| div_loss: 0.9885| %_mask_idx: 0.38487| ppl: 7.36316| %_neg_is_pos: 0.21332| lr: 0.00019| temp: 1.99624 | loss: 1.0262| constrast_loss: 4.00597| div_loss: 0.98826| %_mask_idx: 0.36529| ppl: 7.51362| %_neg_is_pos: 0.19444| lr: 0.00019| temp: 1.99623 | loss: 1.03233| constrast_loss: 4.03044| div_loss: 0.9886| %_mask_idx: 0.41369| ppl: 7.29484| %_neg_is_pos: 0.21206| lr: 0.00019| temp: 1.99623 | loss: 1.01572| constrast_loss: 3.964| div_loss: 0.98888| %_mask_idx: 0.35887| ppl: 7.11396| %_neg_is_pos: 0.19465| lr: 0.00019| temp: 1.99621 | loss: 0.99461| constrast_loss: 3.87959| div_loss: 0.98834| %_mask_idx: 0.35041| ppl: 7.46539| %_neg_is_pos: 0.20817| lr: 0.00019| temp: 1.99621 | loss: 1.01635| constrast_loss: 3.96653| div_loss: 0.98875| %_mask_idx: 0.45332| ppl: 7.19759| %_neg_is_pos: 0.20838| lr: 0.00019| temp: 1.9962 | loss: 1.04768| constrast_loss: 4.09188| div_loss: 0.98852| %_mask_idx: 0.40445| ppl: 7.3446| %_neg_is_pos: 0.19626| lr: 0.00019| temp: 1.9962 | loss: 1.00486| constrast_loss: 3.92058| div_loss: 0.98873| %_mask_idx: 0.34571| ppl: 7.20972| %_neg_is_pos: 0.22045| lr: 0.00019| temp: 1.99619 | loss: 1.03623| constrast_loss: 4.04608| div_loss: 0.98855| %_mask_idx: 0.36137| ppl: 7.32914| %_neg_is_pos: 0.21474| lr: 0.00019| temp: 1.99619 | loss: 1.02099| constrast_loss: 3.98509| div_loss: 0.98883| %_mask_idx: 0.39615| ppl: 7.14987| 
%_neg_is_pos: 0.21179| lr: 0.00019| temp: 1.99618 | loss: 1.0875| constrast_loss: 4.25114| div_loss: 0.98854| %_mask_idx: 0.36701| ppl: 7.33607| %_neg_is_pos: 0.19133| lr: 0.00019| temp: 1.99618 | loss: 1.01584| constrast_loss: 3.96449| div_loss: 0.98866| %_mask_idx: 0.38064| ppl: 7.25543| %_neg_is_pos: 0.21095| lr: 0.00019| temp: 1.99616 | loss: 1.06166| constrast_loss: 4.14778| div_loss: 0.98881| %_mask_idx: 0.43139| ppl: 7.16246| %_neg_is_pos: 0.2131| lr: 0.00019| temp: 1.99616 | loss: 1.01626| constrast_loss: 3.96623| div_loss: 0.98825| %_mask_idx: 0.362| ppl: 7.52013| %_neg_is_pos: 0.213| lr: 0.00019| temp: 1.99615 | loss: 0.99019| constrast_loss: 3.86191| div_loss: 0.98848| %_mask_idx: 0.37578| ppl: 7.37065| %_neg_is_pos: 0.2032| lr: 0.00019| temp: 1.99615 | loss: 1.04858| constrast_loss: 4.09545| div_loss: 0.98865| %_mask_idx: 0.42654| ppl: 7.26244| %_neg_is_pos: 0.1929| lr: 0.00019| temp: 1.99614 | loss: 1.04809| constrast_loss: 4.09348| div_loss: 0.9887| %_mask_idx: 0.36513| ppl: 7.23445| %_neg_is_pos: 0.2261| lr: 0.00019| temp: 1.99614 | loss: 1.01064| constrast_loss: 3.94371| div_loss: 0.98851| %_mask_idx: 0.44126| ppl: 7.3533| %_neg_is_pos: 0.21328| lr: 0.00019| temp: 1.99613 | loss: 1.05102| constrast_loss: 4.10523| div_loss: 0.98846| %_mask_idx: 0.35824| ppl: 7.38355| %_neg_is_pos: 0.19773| lr: 0.00019| temp: 1.99613 | loss: 1.04011| constrast_loss: 4.06152| div_loss: 0.98914| %_mask_idx: 0.45128| ppl: 6.94794| %_neg_is_pos: 0.19776| lr: 0.00019| temp: 1.99611 | loss: 1.03308| constrast_loss: 4.03345| div_loss: 0.98883| %_mask_idx: 0.36357| ppl: 7.14844| %_neg_is_pos: 0.20207| lr: 0.00019| temp: 1.99611 | loss: 1.02677| constrast_loss: 4.00821| div_loss: 0.98875| %_mask_idx: 0.34085| ppl: 7.20167| %_neg_is_pos: 0.21731| lr: 0.0002| temp: 1.9961 | loss: 1.0368| constrast_loss: 4.04836| div_loss: 0.9884| %_mask_idx: 0.34336| ppl: 7.42365| %_neg_is_pos: 0.20562| lr: 0.0002| temp: 1.9961 | loss: 1.03096| constrast_loss: 4.02497| div_loss: 0.98858| 
%_mask_idx: 0.38565| ppl: 7.30957| %_neg_is_pos: 0.21241| lr: 0.0002| temp: 1.99609 | loss: 1.00687| constrast_loss: 3.92862| div_loss: 0.98877| %_mask_idx: 0.35683| ppl: 7.18667| %_neg_is_pos: 0.19452| lr: 0.0002| temp: 1.99609 | loss: 1.06363| constrast_loss: 4.15563| div_loss: 0.98872| %_mask_idx: 0.40883| ppl: 7.21888| %_neg_is_pos: 0.21921| lr: 0.0002| temp: 1.99608 | loss: 1.04995| constrast_loss: 4.10093| div_loss: 0.98871| %_mask_idx: 0.39364| ppl: 7.22774| %_neg_is_pos: 0.2091| lr: 0.0002| temp: 1.99608 | loss: 1.06698| constrast_loss: 4.16906| div_loss: 0.98882| %_mask_idx: 0.33976| ppl: 7.15205| %_neg_is_pos: 0.20455| lr: 0.0002| temp: 1.99606 | loss: 1.00372| constrast_loss: 3.91601| div_loss: 0.98885| %_mask_idx: 0.41416| ppl: 7.13703| %_neg_is_pos: 0.21374| lr: 0.0002| temp: 1.99606 | loss: 0.99559| constrast_loss: 3.88352| div_loss: 0.98849| %_mask_idx: 0.36153| ppl: 7.36735| %_neg_is_pos: 0.20783| lr: 0.0002| temp: 1.99605 | loss: 1.01137| constrast_loss: 3.94662| div_loss: 0.98858| %_mask_idx: 0.33349| ppl: 7.30573| %_neg_is_pos: 0.23115| lr: 0.0002| temp: 1.99605 | loss: 1.02411| constrast_loss: 3.99761| div_loss: 0.98848| %_mask_idx: 0.35558| ppl: 7.37261| %_neg_is_pos: 0.2182| lr: 0.0002| temp: 1.99603 | loss: 1.03981| constrast_loss: 4.06042| div_loss: 0.98835| %_mask_idx: 0.45113| ppl: 7.45758| %_neg_is_pos: 0.19771| lr: 0.0002| temp: 1.99603 | loss: 1.05493| constrast_loss: 4.12088| div_loss: 0.98846| %_mask_idx: 0.40836| ppl: 7.3835| %_neg_is_pos: 0.18592| lr: 0.0002| temp: 1.99602 | loss: 1.05505| constrast_loss: 4.12134| div_loss: 0.98845| %_mask_idx: 0.35573| ppl: 7.3923| %_neg_is_pos: 0.19456| lr: 0.0002| temp: 1.99602 | loss: 1.02672| constrast_loss: 4.00804| div_loss: 0.98851| %_mask_idx: 0.40586| ppl: 7.35642| %_neg_is_pos: 0.22911| lr: 0.0002| temp: 1.99601 | loss: 1.02094| constrast_loss: 3.98491| div_loss: 0.98844| %_mask_idx: 0.37845| ppl: 7.40104| %_neg_is_pos: 0.20519| lr: 0.0002| temp: 1.99601 | loss: 1.04917| constrast_loss: 
4.09781| div_loss: 0.98857| %_mask_idx: 0.37657| ppl: 7.31379| %_neg_is_pos: 0.19319| lr: 0.0002| temp: 1.996 | loss: 1.01065| constrast_loss: 3.94374| div_loss: 0.98841| %_mask_idx: 0.37375| ppl: 7.4181| %_neg_is_pos: 0.20869| lr: 0.0002| temp: 1.996 | loss: 1.0483| constrast_loss: 4.09435| div_loss: 0.98853| %_mask_idx: 0.40335| ppl: 7.34019| %_neg_is_pos: 0.21312| lr: 0.0002| temp: 1.99598 | loss: 1.03517| constrast_loss: 4.04184| div_loss: 0.98825| %_mask_idx: 0.3869| ppl: 7.51982| %_neg_is_pos: 0.18587| lr: 0.0002| temp: 1.99598 | loss: 1.00184| constrast_loss: 3.90849| div_loss: 0.98854| %_mask_idx: 0.36169| ppl: 7.33558| %_neg_is_pos: 0.20841| lr: 0.0002| temp: 1.99597 | loss: 0.99973| constrast_loss: 3.90008| div_loss: 0.98863| %_mask_idx: 0.41729| ppl: 7.2737| %_neg_is_pos: 0.20275| lr: 0.0002| temp: 1.99597 | loss: 1.03238| constrast_loss: 4.03066| div_loss: 0.98842| %_mask_idx: 0.39035| ppl: 7.4135| %_neg_is_pos: 0.2001| lr: 0.0002| temp: 1.99596 | loss: 1.01705| constrast_loss: 3.96936| div_loss: 0.98843| %_mask_idx: 0.35902| ppl: 7.40321| %_neg_is_pos: 0.20722| lr: 0.0002| temp: 1.99596 | loss: 0.99324| constrast_loss: 3.8741| div_loss: 0.98866| %_mask_idx: 0.34618| ppl: 7.25469| %_neg_is_pos: 0.21443| lr: 0.0002| temp: 1.99595 | loss: 1.05709| constrast_loss: 4.12953| div_loss: 0.98838| %_mask_idx: 0.42873| ppl: 7.4395| %_neg_is_pos: 0.19568| lr: 0.0002| temp: 1.99595 | loss: 1.02513| constrast_loss: 4.00166| div_loss: 0.98872| %_mask_idx: 0.38346| ppl: 7.21666| %_neg_is_pos: 0.2125| lr: 0.0002| temp: 1.99593 | loss: 0.98538| constrast_loss: 3.84265| div_loss: 0.98873| %_mask_idx: 0.37594| ppl: 7.21058| %_neg_is_pos: 0.21695| lr: 0.0002| temp: 1.99593 | loss: 1.0666| constrast_loss: 4.16754| div_loss: 0.9887| %_mask_idx: 0.42732| ppl: 7.23151| %_neg_is_pos: 0.19681| lr: 0.0002| temp: 1.99592 | loss: 1.0617| constrast_loss: 4.14797| div_loss: 0.98845| %_mask_idx: 0.40868| ppl: 7.39442| %_neg_is_pos: 0.17076| lr: 0.0002| temp: 1.99592 | loss: 1.04414| 
constrast_loss: 4.07769| div_loss: 0.98876| %_mask_idx: 0.35652| ppl: 7.19443| %_neg_is_pos: 0.19386| lr: 0.0002| temp: 1.99591 | loss: 1.01054| constrast_loss: 3.9433| div_loss: 0.98866| %_mask_idx: 0.36169| ppl: 7.25496| %_neg_is_pos: 0.21544| lr: 0.0002| temp: 1.99591 | loss: 1.06052| constrast_loss: 4.14321| div_loss: 0.98884| %_mask_idx: 0.388| ppl: 7.14525| %_neg_is_pos: 0.21857| lr: 0.00021| temp: 1.9959 | loss: 1.06346| constrast_loss: 4.15499| div_loss: 0.98836| %_mask_idx: 0.45269| ppl: 7.44893| %_neg_is_pos: 0.20109| lr: 0.00021| temp: 1.9959 | loss: 1.00738| constrast_loss: 3.93067| div_loss: 0.98848| %_mask_idx: 0.39223| ppl: 7.37317| %_neg_is_pos: 0.19477| lr: 0.00021| temp: 1.99588 | loss: 0.97454| constrast_loss: 3.79929| div_loss: 0.98867| %_mask_idx: 0.36826| ppl: 7.24897| %_neg_is_pos: 0.21909| lr: 0.00021| temp: 1.99588 | loss: 1.0484| constrast_loss: 4.09472| div_loss: 0.98875| %_mask_idx: 0.289| ppl: 7.20303| %_neg_is_pos: 0.20938| lr: 0.00021| temp: 1.99587 | loss: 1.03867| constrast_loss: 4.05581| div_loss: 0.98875| %_mask_idx: 0.39677| ppl: 7.1987| %_neg_is_pos: 0.2183| lr: 0.00021| temp: 1.99587 [2021-09-01 16:47:17,643] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0 [2021-09-01 16:47:17,643] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 524288.0, reducing to 262144.0 | loss: 1.0141| constrast_loss: 3.95752| div_loss: 0.98887| %_mask_idx: 0.36075| ppl: 7.12543| %_neg_is_pos: 0.21089| lr: 0.00021| temp: 1.99585 | loss: 1.0258| constrast_loss: 4.00434| div_loss: 0.98867| %_mask_idx: 0.43609| ppl: 7.25021| %_neg_is_pos: 0.1974| lr: 0.00021| temp: 1.99585 | loss: 1.038| constrast_loss: 4.05306| div_loss: 0.98921| %_mask_idx: 0.39959| ppl: 6.90852| %_neg_is_pos: 0.1903| lr: 0.00021| temp: 1.99584 | loss: 1.03307| constrast_loss: 4.03336| div_loss: 0.98929| %_mask_idx: 0.40774| ppl: 6.85689| %_neg_is_pos: 0.18435| lr: 0.00021| temp: 1.99584 | loss: 1.06425| constrast_loss: 4.15801| div_loss: 0.98979| %_mask_idx: 0.36858| ppl: 6.53469| %_neg_is_pos: 0.18268| lr: 0.00021| temp: 1.99583 | loss: 1.03643| constrast_loss: 4.04674| div_loss: 0.98958| %_mask_idx: 0.38268| ppl: 6.6656| %_neg_is_pos: 0.20464| lr: 0.00021| temp: 1.99583 | loss: 1.05477| constrast_loss: 4.12018| div_loss: 0.98913| %_mask_idx: 0.40116| ppl: 6.95852| %_neg_is_pos: 0.1853| lr: 0.00021| temp: 1.99582 | loss: 1.05658| constrast_loss: 4.1274| div_loss: 0.98939| %_mask_idx: 0.42732| ppl: 6.78833| %_neg_is_pos: 0.18893| lr: 0.00021| temp: 1.99582 | loss: 1.03644| constrast_loss: 4.04679| div_loss: 0.98984| %_mask_idx: 0.42841| ppl: 6.50226| %_neg_is_pos: 0.21299| lr: 0.00021| temp: 1.9958 | loss: 1.02341| constrast_loss: 3.99461| div_loss: 0.99027| %_mask_idx: 0.38863| ppl: 6.22461| %_neg_is_pos: 0.20963| lr: 0.00021| temp: 1.9958 | loss: 1.00179| constrast_loss: 3.90807| div_loss: 0.9908| %_mask_idx: 0.37296| ppl: 5.8902| %_neg_is_pos: 0.2486| lr: 0.00021| temp: 1.99579 | loss: 1.00615| constrast_loss: 3.92551| div_loss: 0.99102| %_mask_idx: 0.38252| ppl: 5.74535| %_neg_is_pos: 0.25797| lr: 0.00021| temp: 1.99579 | loss: 0.92198| constrast_loss: 3.58872| div_loss: 0.99189| %_mask_idx: 0.37171| ppl: 5.19183| %_neg_is_pos: 0.3248| lr: 0.00021| temp: 1.99578 | loss: 0.96015| constrast_loss: 3.74145| div_loss: 0.99171| 
%_mask_idx: 0.42372| ppl: 5.30649| %_neg_is_pos: 0.30261| lr: 0.00021| temp: 1.99578 | loss: 1.01951| constrast_loss: 3.97886| div_loss: 0.99161| %_mask_idx: 0.37516| ppl: 5.3713| %_neg_is_pos: 0.27312| lr: 0.00021| temp: 1.99577 | loss: 1.04764| constrast_loss: 4.09139| div_loss: 0.99165| %_mask_idx: 0.3869| ppl: 5.34176| %_neg_is_pos: 0.26329| lr: 0.00021| temp: 1.99577 | loss: 0.75669| constrast_loss: 2.9273| div_loss: 0.99458| %_mask_idx: 0.38863| ppl: 3.46737| %_neg_is_pos: 0.5319| lr: 0.00021| temp: 1.99575 | loss: 0.79962| constrast_loss: 3.09902| div_loss: 0.99453| %_mask_idx: 0.42027| ppl: 3.49814| %_neg_is_pos: 0.54281| lr: 0.00021| temp: 1.99575 | loss: 0.53677| constrast_loss: 2.04762| div_loss: 0.99444| %_mask_idx: 0.32033| ppl: 3.55534| %_neg_is_pos: 0.48599| lr: 0.00021| temp: 1.99574 | loss: 0.6114| constrast_loss: 2.34618| div_loss: 0.99426| %_mask_idx: 0.35871| ppl: 3.6759| %_neg_is_pos: 0.46669| lr: 0.00021| temp: 1.99574 | loss: 0.70083| constrast_loss: 2.70393| div_loss: 0.9937| %_mask_idx: 0.38033| ppl: 4.0337| %_neg_is_pos: 0.44589| lr: 0.00021| temp: 1.99573 | loss: 0.76043| constrast_loss: 2.94236| div_loss: 0.99364| %_mask_idx: 0.39991| ppl: 4.07193| %_neg_is_pos: 0.47667| lr: 0.00021| temp: 1.99573 | loss: 1.02859| constrast_loss: 4.01511| div_loss: 0.99269| %_mask_idx: 0.38737| ppl: 4.68124| %_neg_is_pos: 0.3625| lr: 0.00021| temp: 1.99572 | loss: 1.01552| constrast_loss: 3.9628| div_loss: 0.99272| %_mask_idx: 0.43703| ppl: 4.66088| %_neg_is_pos: 0.38949| lr: 0.00021| temp: 1.99572 [2021-09-01 16:49:31,699] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0 [2021-09-01 16:49:31,699] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 262144.0, reducing to 131072.0 | loss: 1.0707| constrast_loss: 4.18342| div_loss: 0.99367| %_mask_idx: 0.39834| ppl: 4.04921| %_neg_is_pos: 0.48169| lr: 0.00021| temp: 1.9957 | loss: 1.03879| constrast_loss: 4.05579| div_loss: 0.99369| %_mask_idx: 0.36826| ppl: 4.03917| %_neg_is_pos: 0.47086| lr: 0.00021| temp: 1.9957 | loss: 0.89423| constrast_loss: 3.47756| div_loss: 0.9938| %_mask_idx: 0.39959| ppl: 3.96982| %_neg_is_pos: 0.49778| lr: 0.00022| temp: 1.99569 | loss: 0.89508| constrast_loss: 3.48093| div_loss: 0.99378| %_mask_idx: 0.39662| ppl: 3.9794| %_neg_is_pos: 0.49836| lr: 0.00022| temp: 1.99569 | loss: 1.00906| constrast_loss: 3.93686| div_loss: 0.99378| %_mask_idx: 0.41197| ppl: 3.98203| %_neg_is_pos: 0.45762| lr: 0.00022| temp: 1.99567 | loss: 0.95001| constrast_loss: 3.70066| div_loss: 0.99376| %_mask_idx: 0.42293| ppl: 3.99365| %_neg_is_pos: 0.47413| lr: 0.00022| temp: 1.99567 | loss: 0.95416| constrast_loss: 3.71726| div_loss: 0.99377| %_mask_idx: 0.35041| ppl: 3.98727| %_neg_is_pos: 0.47727| lr: 0.00022| temp: 1.99566 | loss: 0.94793| constrast_loss: 3.69236| div_loss: 0.99377| %_mask_idx: 0.35495| ppl: 3.98775| %_neg_is_pos: 0.47188| lr: 0.00022| temp: 1.99566 | loss: 0.93732| constrast_loss: 3.64992| div_loss: 0.99376| %_mask_idx: 0.39912| ppl: 3.99363| %_neg_is_pos: 0.47221| lr: 0.00022| temp: 1.99565 | loss: 0.96917| constrast_loss: 3.77732| div_loss: 0.99383| %_mask_idx: 0.39803| ppl: 3.951| %_neg_is_pos: 0.49127| lr: 0.00022| temp: 1.99565 | loss: 0.9294| constrast_loss: 3.61822| div_loss: 0.99376| %_mask_idx: 0.38048| ppl: 3.99483| %_neg_is_pos: 0.47704| lr: 0.00022| temp: 1.99564 | loss: 0.95884| constrast_loss: 3.73599| div_loss: 0.99377| %_mask_idx: 0.37798| ppl: 3.98892| %_neg_is_pos: 0.46401| lr: 0.00022| temp: 1.99564 | loss: 0.94486| constrast_loss: 3.68007| div_loss: 0.99379| %_mask_idx: 0.36654| ppl: 3.97634| %_neg_is_pos: 0.45189| lr: 0.00022| temp: 1.99562 | loss: 0.92861| constrast_loss: 3.61507| div_loss: 
0.99376| %_mask_idx: 0.43405| ppl: 3.99493| %_neg_is_pos: 0.47039| lr: 0.00022| temp: 1.99562 | loss: 0.95765| constrast_loss: 3.73123| div_loss: 0.99376| %_mask_idx: 0.39615| ppl: 3.99366| %_neg_is_pos: 0.47826| lr: 0.00022| temp: 1.99561 | loss: 0.92515| constrast_loss: 3.60123| div_loss: 0.99376| %_mask_idx: 0.35918| ppl: 3.99572| %_neg_is_pos: 0.47856| lr: 0.00022| temp: 1.99561 | loss: 0.94565| constrast_loss: 3.68323| div_loss: 0.99376| %_mask_idx: 0.38565| ppl: 3.99138| %_neg_is_pos: 0.46933| lr: 0.00022| temp: 1.9956 | loss: 0.94409| constrast_loss: 3.67698| div_loss: 0.99377| %_mask_idx: 0.33913| ppl: 3.98946| %_neg_is_pos: 0.48791| lr: 0.00022| temp: 1.9956 | loss: 0.92976| constrast_loss: 3.61967| div_loss: 0.99376| %_mask_idx: 0.39411| ppl: 3.99616| %_neg_is_pos: 0.4673| lr: 0.00022| temp: 1.99559 | loss: 0.93591| constrast_loss: 3.64426| div_loss: 0.99376| %_mask_idx: 0.39254| ppl: 3.99458| %_neg_is_pos: 0.47064| lr: 0.00022| temp: 1.99559 | loss: 0.94318| constrast_loss: 3.67335| div_loss: 0.99376| %_mask_idx: 0.37798| ppl: 3.99386| %_neg_is_pos: 0.46032| lr: 0.00022| temp: 1.99557 | loss: 0.96352| constrast_loss: 3.75468| div_loss: 0.99376| %_mask_idx: 0.40038| ppl: 3.9921| %_neg_is_pos: 0.48054| lr: 0.00022| temp: 1.99557 | loss: 0.96817| constrast_loss: 3.7733| div_loss: 0.99376| %_mask_idx: 0.37829| ppl: 3.99295| %_neg_is_pos: 0.47709| lr: 0.00022| temp: 1.99556 | loss: 0.93922| constrast_loss: 3.65751| div_loss: 0.99377| %_mask_idx: 0.41385| ppl: 3.98505| %_neg_is_pos: 0.48622| lr: 0.00022| temp: 1.99556 | loss: 0.96432| constrast_loss: 3.75791| div_loss: 0.99377| %_mask_idx: 0.43891| ppl: 3.98735| %_neg_is_pos: 0.45767| lr: 0.00022| temp: 1.99555 | loss: 0.95766| constrast_loss: 3.73127| div_loss: 0.99377| %_mask_idx: 0.40648| ppl: 3.98524| %_neg_is_pos: 0.45604| lr: 0.00022| temp: 1.99555 | loss: 0.93669| constrast_loss: 3.64737| div_loss: 0.99376| %_mask_idx: 0.39066| ppl: 3.99346| %_neg_is_pos: 0.47305| lr: 0.00022| temp: 1.99554 | loss: 
0.94577| constrast_loss: 3.68371| div_loss: 0.9938| %_mask_idx: 0.36764| ppl: 3.96699| %_neg_is_pos: 0.49063| lr: 0.00022| temp: 1.99554 | loss: 0.96851| constrast_loss: 3.77467| div_loss: 0.99378| %_mask_idx: 0.38346| ppl: 3.97861| %_neg_is_pos: 0.50044| lr: 0.00022| temp: 1.99553 | loss: 0.95085| constrast_loss: 3.704| div_loss: 0.99377| %_mask_idx: 0.3714| ppl: 3.98941| %_neg_is_pos: 0.46975| lr: 0.00022| temp: 1.99553 | loss: 1.01348| constrast_loss: 3.95454| div_loss: 0.99376| %_mask_idx: 0.3761| ppl: 3.99345| %_neg_is_pos: 0.47121| lr: 0.00022| temp: 1.99551 | loss: 0.96468| constrast_loss: 3.75936| div_loss: 0.99376| %_mask_idx: 0.40304| ppl: 3.99265| %_neg_is_pos: 0.47567| lr: 0.00022| temp: 1.99551 | loss: 0.93874| constrast_loss: 3.65558| div_loss: 0.99376| %_mask_idx: 0.401| ppl: 3.9923| %_neg_is_pos: 0.47773| lr: 0.00023| temp: 1.9955 | loss: 0.93704| constrast_loss: 3.64878| div_loss: 0.99376| %_mask_idx: 0.37672| ppl: 3.99553| %_neg_is_pos: 0.47021| lr: 0.00023| temp: 1.9955 | loss: 0.9678| constrast_loss: 3.77182| div_loss: 0.99376| %_mask_idx: 0.38581| ppl: 3.99384| %_neg_is_pos: 0.48041| lr: 0.00023| temp: 1.99549 | loss: 0.98387| constrast_loss: 3.83612| div_loss: 0.99377| %_mask_idx: 0.37077| ppl: 3.99014| %_neg_is_pos: 0.48225| lr: 0.00023| temp: 1.99549 | loss: 0.98932| constrast_loss: 3.85792| div_loss: 0.99377| %_mask_idx: 0.41776| ppl: 3.98936| %_neg_is_pos: 0.46942| lr: 0.00023| temp: 1.99548 | loss: 0.96594| constrast_loss: 3.76439| div_loss: 0.99376| %_mask_idx: 0.41009| ppl: 3.99228| %_neg_is_pos: 0.47044| lr: 0.00023| temp: 1.99548 | loss: 0.93582| constrast_loss: 3.64389| div_loss: 0.99376| %_mask_idx: 0.3974| ppl: 3.99228| %_neg_is_pos: 0.46088| lr: 0.00023| temp: 1.99547 | loss: 0.95645| constrast_loss: 3.72641| div_loss: 0.99376| %_mask_idx: 0.4021| ppl: 3.99122| %_neg_is_pos: 0.4684| lr: 0.00023| temp: 1.99547 | loss: 0.98683| constrast_loss: 3.84793| div_loss: 0.99379| %_mask_idx: 0.40852| ppl: 3.9756| %_neg_is_pos: 0.46781| lr: 
0.00023| temp: 1.99545 | loss: 0.95364| constrast_loss: 3.71516| div_loss: 0.99378| %_mask_idx: 0.31767| ppl: 3.98092| %_neg_is_pos: 0.49272| lr: 0.00023| temp: 1.99545 | loss: 0.97825| constrast_loss: 3.81362| div_loss: 0.99376| %_mask_idx: 0.35871| ppl: 3.99239| %_neg_is_pos: 0.4779| lr: 0.00023| temp: 1.99544 | loss: 0.96633| constrast_loss: 3.76594| div_loss: 0.99378| %_mask_idx: 0.38565| ppl: 3.97844| %_neg_is_pos: 0.47862| lr: 0.00023| temp: 1.99544 | loss: 0.93462| constrast_loss: 3.63911| div_loss: 0.99376| %_mask_idx: 0.3844| ppl: 3.99338| %_neg_is_pos: 0.47604| lr: 0.00023| temp: 1.99543 | loss: 0.93483| constrast_loss: 3.63993| div_loss: 0.99376| %_mask_idx: 0.4021| ppl: 3.99425| %_neg_is_pos: 0.46817| lr: 0.00023| temp: 1.99543 | loss: 0.97601| constrast_loss: 3.80464| div_loss: 0.99376| %_mask_idx: 0.39568| ppl: 3.99249| %_neg_is_pos: 0.4735| lr: 0.00023| temp: 1.99542 | loss: 0.96246| constrast_loss: 3.75048| div_loss: 0.99376| %_mask_idx: 0.38127| ppl: 3.99301| %_neg_is_pos: 0.47575| lr: 0.00023| temp: 1.99542 | loss: 0.97833| constrast_loss: 3.81393| div_loss: 0.99376| %_mask_idx: 0.43531| ppl: 3.99246| %_neg_is_pos: 0.47249| lr: 0.00023| temp: 1.9954 | loss: 0.96893| constrast_loss: 3.77636| div_loss: 0.99378| %_mask_idx: 0.39066| ppl: 3.98011| %_neg_is_pos: 0.48154| lr: 0.00023| temp: 1.9954 | loss: 0.97759| constrast_loss: 3.81097| div_loss: 0.99376| %_mask_idx: 0.4328| ppl: 3.99137| %_neg_is_pos: 0.46525| lr: 0.00023| temp: 1.99539 | loss: 0.96541| constrast_loss: 3.76226| div_loss: 0.99379| %_mask_idx: 0.38221| ppl: 3.97253| %_neg_is_pos: 0.4721| lr: 0.00023| temp: 1.99539 | loss: 0.93313| constrast_loss: 3.63313| div_loss: 0.99377| %_mask_idx: 0.36811| ppl: 3.9858| %_neg_is_pos: 0.45855| lr: 0.00023| temp: 1.99538 | loss: 0.93902| constrast_loss: 3.65669| div_loss: 0.99376| %_mask_idx: 0.42137| ppl: 3.99227| %_neg_is_pos: 0.47015| lr: 0.00023| temp: 1.99538 | loss: 0.93131| constrast_loss: 3.62587| div_loss: 0.99376| %_mask_idx: 0.36184| ppl: 
3.99407| %_neg_is_pos: 0.47545| lr: 0.00023| temp: 1.99537 | loss: 0.9525| constrast_loss: 3.71062| div_loss: 0.99376| %_mask_idx: 0.39928| ppl: 3.99175| %_neg_is_pos: 0.455| lr: 0.00023| temp: 1.99537 | loss: 0.92187| constrast_loss: 3.58812| div_loss: 0.99376| %_mask_idx: 0.33678| ppl: 3.99075| %_neg_is_pos: 0.46743| lr: 0.00023| temp: 1.99535 | loss: 0.9616| constrast_loss: 3.74702| div_loss: 0.99376| %_mask_idx: 0.35056| ppl: 3.99282| %_neg_is_pos: 0.4968| lr: 0.00023| temp: 1.99535 | loss: 0.95465| constrast_loss: 3.71921| div_loss: 0.99377| %_mask_idx: 0.36216| ppl: 3.989| %_neg_is_pos: 0.45732| lr: 0.00023| temp: 1.99534 | loss: 0.94951| constrast_loss: 3.69867| div_loss: 0.99377| %_mask_idx: 0.37296| ppl: 3.99016| %_neg_is_pos: 0.46944| lr: 0.00023| temp: 1.99534 | loss: 0.99915| constrast_loss: 3.89721| div_loss: 0.99376| %_mask_idx: 0.44408| ppl: 3.99072| %_neg_is_pos: 0.45938| lr: 0.00023| temp: 1.99532 | loss: 0.97138| constrast_loss: 3.78613| div_loss: 0.99376| %_mask_idx: 0.38957| ppl: 3.99434| %_neg_is_pos: 0.48351| lr: 0.00023| temp: 1.99532 | loss: 0.97706| constrast_loss: 3.80887| div_loss: 0.99376| %_mask_idx: 0.37249| ppl: 3.99292| %_neg_is_pos: 0.47889| lr: 0.00023| temp: 1.99531 | loss: 0.97182| constrast_loss: 3.78792| div_loss: 0.99376| %_mask_idx: 0.33976| ppl: 3.99241| %_neg_is_pos: 0.46534| lr: 0.00023| temp: 1.99531 | loss: 0.93806| constrast_loss: 3.65285| div_loss: 0.99376| %_mask_idx: 0.30404| ppl: 3.99589| %_neg_is_pos: 0.50196| lr: 0.00024| temp: 1.9953 | loss: 0.93248| constrast_loss: 3.63055| div_loss: 0.99376| %_mask_idx: 0.43452| ppl: 3.99378| %_neg_is_pos: 0.45712| lr: 0.00024| temp: 1.9953 | loss: 0.9695| constrast_loss: 3.77861| div_loss: 0.99376| %_mask_idx: 0.39991| ppl: 3.99147| %_neg_is_pos: 0.46421| lr: 0.00024| temp: 1.99529 | loss: 0.96701| constrast_loss: 3.76867| div_loss: 0.99378| %_mask_idx: 0.38033| ppl: 3.98189| %_neg_is_pos: 0.48249| lr: 0.00024| temp: 1.99529 | loss: 0.98987| constrast_loss: 3.8601| div_loss: 
0.99376| %_mask_idx: 0.38174| ppl: 3.99471| %_neg_is_pos: 0.46806| lr: 0.00024| temp: 1.99527 | loss: 0.92637| constrast_loss: 3.6061| div_loss: 0.99376| %_mask_idx: 0.34273| ppl: 3.99394| %_neg_is_pos: 0.47101| lr: 0.00024| temp: 1.99527 | loss: 0.98311| constrast_loss: 3.83307| div_loss: 0.99376| %_mask_idx: 0.44267| ppl: 3.99243| %_neg_is_pos: 0.45994| lr: 0.00024| temp: 1.99526 | loss: 0.97012| constrast_loss: 3.78112| div_loss: 0.99377| %_mask_idx: 0.34539| ppl: 3.98948| %_neg_is_pos: 0.46594| lr: 0.00024| temp: 1.99526 | loss: 0.97373| constrast_loss: 3.79554| div_loss: 0.99376| %_mask_idx: 0.39552| ppl: 3.99213| %_neg_is_pos: 0.4623| lr: 0.00024| temp: 1.99525 | loss: 0.95707| constrast_loss: 3.72891| div_loss: 0.99376| %_mask_idx: 0.38675| ppl: 3.99474| %_neg_is_pos: 0.46928| lr: 0.00024| temp: 1.99525 | loss: 0.95908| constrast_loss: 3.73696| div_loss: 0.99377| %_mask_idx: 0.37406| ppl: 3.98873| %_neg_is_pos: 0.46811| lr: 0.00024| temp: 1.99524 | loss: 0.97965| constrast_loss: 3.81924| div_loss: 0.99376| %_mask_idx: 0.38221| ppl: 3.99396| %_neg_is_pos: 0.47949| lr: 0.00024| temp: 1.99524 | loss: 0.95378| constrast_loss: 3.71573| div_loss: 0.99376| %_mask_idx: 0.38362| ppl: 3.99378| %_neg_is_pos: 0.47194| lr: 0.00024| temp: 1.99522 | loss: 0.93838| constrast_loss: 3.65416| div_loss: 0.99376| %_mask_idx: 0.41745| ppl: 3.99171| %_neg_is_pos: 0.45877| lr: 0.00024| temp: 1.99522 | loss: 0.96647| constrast_loss: 3.76652| div_loss: 0.99376| %_mask_idx: 0.40069| ppl: 3.99415| %_neg_is_pos: 0.46395| lr: 0.00024| temp: 1.99521 | loss: 0.94452| constrast_loss: 3.6787| div_loss: 0.99377| %_mask_idx: 0.42325| ppl: 3.98865| %_neg_is_pos: 0.4649| lr: 0.00024| temp: 1.99521 | loss: 0.94912| constrast_loss: 3.69709| div_loss: 0.99376| %_mask_idx: 0.35229| ppl: 3.99233| %_neg_is_pos: 0.47462| lr: 0.00024| temp: 1.9952 | loss: 0.93295| constrast_loss: 3.63243| div_loss: 0.99376| %_mask_idx: 0.37108| ppl: 3.99058| %_neg_is_pos: 0.48026| lr: 0.00024| temp: 1.9952 | loss: 
0.96009| constrast_loss: 3.74097| div_loss: 0.99377| %_mask_idx: 0.33224| ppl: 3.98747| %_neg_is_pos: 0.47886| lr: 0.00024| temp: 1.99519 | loss: 0.96853| constrast_loss: 3.77474| div_loss: 0.99376| %_mask_idx: 0.43358| ppl: 3.99221| %_neg_is_pos: 0.47882| lr: 0.00024| temp: 1.99519 | loss: 0.97141| constrast_loss: 3.78627| div_loss: 0.99376| %_mask_idx: 0.40648| ppl: 3.9937| %_neg_is_pos: 0.47048| lr: 0.00024| temp: 1.99517 | loss: 0.96288| constrast_loss: 3.75214| div_loss: 0.99376| %_mask_idx: 0.36811| ppl: 3.99362| %_neg_is_pos: 0.4733| lr: 0.00024| temp: 1.99517 | loss: 0.92748| constrast_loss: 3.61056| div_loss: 0.99376| %_mask_idx: 0.37531| ppl: 3.99366| %_neg_is_pos: 0.48421| lr: 0.00024| temp: 1.99516 | loss: 0.95294| constrast_loss: 3.71237| div_loss: 0.99377| %_mask_idx: 0.42607| ppl: 3.98928| %_neg_is_pos: 0.47753| lr: 0.00024| temp: 1.99516 [2021-09-01 16:56:31,557] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0 [2021-09-01 16:56:31,557] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 131072.0, reducing to 65536.0 | loss: 0.97986| constrast_loss: 3.82007| div_loss: 0.99376| %_mask_idx: 0.39912| ppl: 3.99221| %_neg_is_pos: 0.48368| lr: 0.00024| temp: 1.99514 | loss: 1.02033| constrast_loss: 3.98194| div_loss: 0.99379| %_mask_idx: 0.36873| ppl: 3.97677| %_neg_is_pos: 0.46117| lr: 0.00024| temp: 1.99514 | loss: 1.02601| constrast_loss: 4.00466| div_loss: 0.99385| %_mask_idx: 0.40006| ppl: 3.93402| %_neg_is_pos: 0.4369| lr: 0.00024| temp: 1.99513 | loss: 1.00866| constrast_loss: 3.93524| div_loss: 0.99382| %_mask_idx: 0.3869| ppl: 3.95265| %_neg_is_pos: 0.43122| lr: 0.00024| temp: 1.99513 | loss: 0.99096| constrast_loss: 3.86444| div_loss: 0.99385| %_mask_idx: 0.37343| ppl: 3.93889| %_neg_is_pos: 0.40544| lr: 0.00024| temp: 1.99512 | loss: 0.98832| constrast_loss: 3.85389| div_loss: 0.99381| %_mask_idx: 0.39818| ppl: 3.96116| %_neg_is_pos: 0.40914| lr: 0.00024| temp: 1.99512 | loss: 1.06826| constrast_loss: 4.17366| div_loss: 0.99388| %_mask_idx: 0.35761| ppl: 3.91735| %_neg_is_pos: 0.39164| lr: 0.00025| temp: 1.99511 | loss: 1.03335| constrast_loss: 4.03401| div_loss: 0.99385| %_mask_idx: 0.39693| ppl: 3.93812| %_neg_is_pos: 0.40689| lr: 0.00025| temp: 1.99511 | loss: 1.12369| constrast_loss: 4.39538| div_loss: 0.99396| %_mask_idx: 0.40476| ppl: 3.8664| %_neg_is_pos: 0.39189| lr: 0.00025| temp: 1.99509 | loss: 1.05973| constrast_loss: 4.13952| div_loss: 0.99395| %_mask_idx: 0.41792| ppl: 3.87489| %_neg_is_pos: 0.38905| lr: 0.00025| temp: 1.99509 | loss: 1.01128| constrast_loss: 3.94568| div_loss: 0.99427| %_mask_idx: 0.4057| ppl: 3.66808| %_neg_is_pos: 0.38601| lr: 0.00025| temp: 1.99508 | loss: 1.00974| constrast_loss: 3.9395| div_loss: 0.99446| %_mask_idx: 0.45316| ppl: 3.54294| %_neg_is_pos: 0.41305| lr: 0.00025| temp: 1.99508 | loss: 1.00738| constrast_loss: 3.92997| div_loss: 0.99538| %_mask_idx: 0.39129| ppl: 2.95367| %_neg_is_pos: 0.4954| lr: 0.00025| temp: 1.99507 | loss: 1.0056| constrast_loss: 3.92287| div_loss: 
0.99538| %_mask_idx: 0.36028| ppl: 2.95613| %_neg_is_pos: 0.50016| lr: 0.00025| temp: 1.99507 | loss: 1.01492| constrast_loss: 3.96015| div_loss: 0.99538| %_mask_idx: 0.36028| ppl: 2.95472| %_neg_is_pos: 0.48884| lr: 0.00025| temp: 1.99506 | loss: 1.00434| constrast_loss: 3.91783| div_loss: 0.99543| %_mask_idx: 0.3938| ppl: 2.92475| %_neg_is_pos: 0.51228| lr: 0.00025| temp: 1.99506 | loss: 0.96665| constrast_loss: 3.76707| div_loss: 0.99546| %_mask_idx: 0.36873| ppl: 2.90791| %_neg_is_pos: 0.51283| lr: 0.00025| temp: 1.99504 | loss: 0.96543| constrast_loss: 3.76215| div_loss: 0.99549| %_mask_idx: 0.42685| ppl: 2.88392| %_neg_is_pos: 0.52982| lr: 0.00025| temp: 1.99504 | loss: 0.8612| constrast_loss: 3.34523| div_loss: 0.9955| %_mask_idx: 0.42701| ppl: 2.87837| %_neg_is_pos: 0.54813| lr: 0.00025| temp: 1.99503 | loss: 0.85283| constrast_loss: 3.31178| div_loss: 0.99552| %_mask_idx: 0.38424| ppl: 2.87022| %_neg_is_pos: 0.50932| lr: 0.00025| temp: 1.99503 | loss: 0.87085| constrast_loss: 3.38384| div_loss: 0.99546| %_mask_idx: 0.41056| ppl: 2.90415| %_neg_is_pos: 0.50727| lr: 0.00025| temp: 1.99502 | loss: 0.85666| constrast_loss: 3.32707| div_loss: 0.99547| %_mask_idx: 0.38972| ppl: 2.89996| %_neg_is_pos: 0.50834| lr: 0.00025| temp: 1.99502 | loss: 0.89629| constrast_loss: 3.48561| div_loss: 0.99537| %_mask_idx: 0.38831| ppl: 2.96354| %_neg_is_pos: 0.51429| lr: 0.00025| temp: 1.99501 | loss: 0.86006| constrast_loss: 3.3407| div_loss: 0.99543| %_mask_idx: 0.35385| ppl: 2.9277| %_neg_is_pos: 0.51233| lr: 0.00025| temp: 1.99501 | loss: 0.84646| constrast_loss: 3.28631| div_loss: 0.99543| %_mask_idx: 0.35746| ppl: 2.92383| %_neg_is_pos: 0.49014| lr: 0.00025| temp: 1.99499| loss: 0.88169| constrast_loss: 3.42722| div_loss: 0.99539| %_mask_idx: 0.42873| ppl: 2.9495| %_neg_is_pos: 0.51539| lr: 0.00025| temp: 1.99499 | loss: 0.75426| constrast_loss: 2.91749| div_loss: 0.99553| %_mask_idx: 0.40335| ppl: 2.86359| %_neg_is_pos: 0.53524| lr: 0.00025| temp: 1.99498 | loss: 
0.93971| constrast_loss: 3.65929| div_loss: 0.99534| %_mask_idx: 0.36654| ppl: 2.98255| %_neg_is_pos: 0.50223| lr: 0.00025| temp: 1.99498 | loss: 1.0244| constrast_loss: 3.99806| div_loss: 0.99531| %_mask_idx: 0.39098| ppl: 2.99959| %_neg_is_pos: 0.51247| lr: 0.00025| temp: 1.99496 | loss: 0.96754| constrast_loss: 3.77064| div_loss: 0.99533| %_mask_idx: 0.40476| ppl: 2.98998| %_neg_is_pos: 0.51338| lr: 0.00025| temp: 1.99496 | loss: 0.92905| constrast_loss: 3.61668| div_loss: 0.99535| %_mask_idx: 0.37672| ppl: 2.97813| %_neg_is_pos: 0.51154| lr: 0.00025| temp: 1.99495 | loss: 1.02498| constrast_loss: 4.00038| div_loss: 0.99531| %_mask_idx: 0.39192| ppl: 2.99962| %_neg_is_pos: 0.51035| lr: 0.00025| temp: 1.99495 | loss: 0.93022| constrast_loss: 3.62133| div_loss: 0.99535| %_mask_idx: 0.43578| ppl: 2.97757| %_neg_is_pos: 0.51449| lr: 0.00025| temp: 1.99494 | loss: 0.9452| constrast_loss: 3.68128| div_loss: 0.99534| %_mask_idx: 0.39442| ppl: 2.98216| %_neg_is_pos: 0.50656| lr: 0.00025| temp: 1.99494 | loss: 0.98343| constrast_loss: 3.8342| div_loss: 0.99532| %_mask_idx: 0.36466| ppl: 2.99436| %_neg_is_pos: 0.50901| lr: 0.00025| temp: 1.99493 | loss: 0.96904| constrast_loss: 3.77664| div_loss: 0.99533| %_mask_idx: 0.38252| ppl: 2.9902| %_neg_is_pos: 0.50124| lr: 0.00025| temp: 1.99493 | loss: 0.99159| constrast_loss: 3.86685| div_loss: 0.99532| %_mask_idx: 0.4375| ppl: 2.99538| %_neg_is_pos: 0.51075| lr: 0.00025| temp: 1.99491 | loss: 0.94682| constrast_loss: 3.68775| div_loss: 0.99534| %_mask_idx: 0.33067| ppl: 2.98192| %_neg_is_pos: 0.49763| lr: 0.00025| temp: 1.99491 | loss: 0.95102| constrast_loss: 3.70453| div_loss: 0.99534| %_mask_idx: 0.3667| ppl: 2.98448| %_neg_is_pos: 0.50619| lr: 0.00026| temp: 1.9949 | loss: 1.01664| constrast_loss: 3.96701| div_loss: 0.99531| %_mask_idx: 0.36263| ppl: 2.99915| %_neg_is_pos: 0.51128| lr: 0.00026| temp: 1.9949 | loss: 0.92099| constrast_loss: 3.58442| div_loss: 0.99535| %_mask_idx: 0.37719| ppl: 2.97485| %_neg_is_pos: 
0.50807| lr: 0.00026| temp: 1.99489 | loss: 0.94823| constrast_loss: 3.69339| div_loss: 0.99533| %_mask_idx: 0.40946| ppl: 2.98643| %_neg_is_pos: 0.51674| lr: 0.00026| temp: 1.99489 | loss: 0.88604| constrast_loss: 3.44462| div_loss: 0.99538| %_mask_idx: 0.36153| ppl: 2.95684| %_neg_is_pos: 0.49764| lr: 0.00026| temp: 1.99488 | loss: 0.91041| constrast_loss: 3.54211| div_loss: 0.99536| %_mask_idx: 0.43546| ppl: 2.96922| %_neg_is_pos: 0.51976| lr: 0.00026| temp: 1.99488 | loss: 0.96728| constrast_loss: 3.7696| div_loss: 0.99533| %_mask_idx: 0.37547| ppl: 2.99071| %_neg_is_pos: 0.50082| lr: 0.00026| temp: 1.99486 | loss: 0.95197| constrast_loss: 3.70833| div_loss: 0.99533| %_mask_idx: 0.41244| ppl: 2.98664| %_neg_is_pos: 0.50448| lr: 0.00026| temp: 1.99486 | loss: 0.9171| constrast_loss: 3.56886| div_loss: 0.99536| %_mask_idx: 0.43296| ppl: 2.97132| %_neg_is_pos: 0.51228| lr: 0.00026| temp: 1.99485 | loss: 0.83966| constrast_loss: 3.2591| div_loss: 0.99542| %_mask_idx: 0.41322| ppl: 2.93236| %_neg_is_pos: 0.52353| lr: 0.00026| temp: 1.99485 | loss: 0.96618| constrast_loss: 3.76519| div_loss: 0.99533| %_mask_idx: 0.41322| ppl: 2.98954| %_neg_is_pos: 0.51117| lr: 0.00026| temp: 1.99484 | loss: 1.0201| constrast_loss: 3.98087| div_loss: 0.99531| %_mask_idx: 0.37077| ppl: 2.99894| %_neg_is_pos: 0.50913| lr: 0.00026| temp: 1.99484 | loss: 0.93767| constrast_loss: 3.65115| div_loss: 0.99534| %_mask_idx: 0.31454| ppl: 2.98197| %_neg_is_pos: 0.51104| lr: 0.00026| temp: 1.99483 | loss: 0.96028| constrast_loss: 3.74157| div_loss: 0.99533| %_mask_idx: 0.40836| ppl: 2.98845| %_neg_is_pos: 0.5045| lr: 0.00026| temp: 1.99483 | loss: 0.93524| constrast_loss: 3.64142| div_loss: 0.99534| %_mask_idx: 0.33537| ppl: 2.98011| %_neg_is_pos: 0.50256| lr: 0.00026| temp: 1.99481 | loss: 0.97405| constrast_loss: 3.79666| div_loss: 0.99532| %_mask_idx: 0.39928| ppl: 2.99219| %_neg_is_pos: 0.50169| lr: 0.00026| temp: 1.99481 | loss: 0.90897| constrast_loss: 3.53635| div_loss: 0.99536| 
%_mask_idx: 0.39505| ppl: 2.9679| %_neg_is_pos: 0.49899| lr: 0.00026| temp: 1.9948 | loss: 0.97141| constrast_loss: 3.78612| div_loss: 0.99533| %_mask_idx: 0.43092| ppl: 2.99083| %_neg_is_pos: 0.51192| lr: 0.00026| temp: 1.9948 | loss: 0.92981| constrast_loss: 3.61969| div_loss: 0.99535| %_mask_idx: 0.41322| ppl: 2.9771| %_neg_is_pos: 0.50412| lr: 0.00026| temp: 1.99478 | loss: 0.97821| constrast_loss: 3.81332| div_loss: 0.99533| %_mask_idx: 0.38612| ppl: 2.9919| %_neg_is_pos: 0.50694| lr: 0.00026| temp: 1.99478 | loss: 0.98989| constrast_loss: 3.86005| div_loss: 0.99532| %_mask_idx: 0.40461| ppl: 2.99441| %_neg_is_pos: 0.50061| lr: 0.00026| temp: 1.99477 | loss: 1.00934| constrast_loss: 3.93783| div_loss: 0.99532| %_mask_idx: 0.39536| ppl: 2.99811| %_neg_is_pos: 0.5127| lr: 0.00026| temp: 1.99477 | loss: 0.98788| constrast_loss: 3.85198| div_loss: 0.99532| %_mask_idx: 0.40492| ppl: 2.99448| %_neg_is_pos: 0.51067| lr: 0.00026| temp: 1.99476 | loss: 0.98114| constrast_loss: 3.82502| div_loss: 0.99532| %_mask_idx: 0.33803| ppl: 2.99335| %_neg_is_pos: 0.49539| lr: 0.00026| temp: 1.99476 | loss: 0.94053| constrast_loss: 3.66257| div_loss: 0.99534| %_mask_idx: 0.41745| ppl: 2.98126| %_neg_is_pos: 0.50464| lr: 0.00026| temp: 1.99475 | loss: 0.97379| constrast_loss: 3.79564| div_loss: 0.99533| %_mask_idx: 0.37265| ppl: 2.99142| %_neg_is_pos: 0.50152| lr: 0.00026| temp: 1.99475 | loss: 0.96329| constrast_loss: 3.75364| div_loss: 0.99533| %_mask_idx: 0.38549| ppl: 2.9885| %_neg_is_pos: 0.51724| lr: 0.00026| temp: 1.99473 | loss: 0.97951| constrast_loss: 3.8185| div_loss: 0.99532| %_mask_idx: 0.35652| ppl: 2.9945| %_neg_is_pos: 0.51942| lr: 0.00026| temp: 1.99473 | loss: 0.98212| constrast_loss: 3.82894| div_loss: 0.99532| %_mask_idx: 0.40539| ppl: 2.99386| %_neg_is_pos: 0.50962| lr: 0.00026| temp: 1.99472 | loss: 0.96987| constrast_loss: 3.77997| div_loss: 0.99533| %_mask_idx: 0.40617| ppl: 2.98968| %_neg_is_pos: 0.49652| lr: 0.00026| temp: 1.99472 | loss: 1.05174| 
constrast_loss: 4.10743| div_loss: 0.99531| %_mask_idx: 0.41917| ppl: 2.9999| %_neg_is_pos: 0.51087| lr: 0.00027| temp: 1.99471 | loss: 0.95864| constrast_loss: 3.73503| div_loss: 0.99533| %_mask_idx: 0.43468| ppl: 2.98824| %_neg_is_pos: 0.51523| lr: 0.00027| temp: 1.99471 | loss: 0.9474| constrast_loss: 3.69005| div_loss: 0.99534| %_mask_idx: 0.39928| ppl: 2.98454| %_neg_is_pos: 0.51132| lr: 0.00027| temp: 1.9947 | loss: 0.91645| constrast_loss: 3.56627| div_loss: 0.99536| %_mask_idx: 0.40367| ppl: 2.97205| %_neg_is_pos: 0.50799| lr: 0.00027| temp: 1.9947 | loss: 0.9382| constrast_loss: 3.65328| div_loss: 0.99535| %_mask_idx: 0.46021| ppl: 2.97897| %_neg_is_pos: 0.50591| lr: 0.00027| temp: 1.99468 | loss: 0.96121| constrast_loss: 3.74531| div_loss: 0.99533| %_mask_idx: 0.36654| ppl: 2.98758| %_neg_is_pos: 0.51357| lr: 0.00027| temp: 1.99468 | loss: 0.96475| constrast_loss: 3.75947| div_loss: 0.99533| %_mask_idx: 0.41275| ppl: 2.99011| %_neg_is_pos: 0.50902| lr: 0.00027| temp: 1.99467 | loss: 0.9691| constrast_loss: 3.77686| div_loss: 0.99533| %_mask_idx: 0.38659| ppl: 2.99096| %_neg_is_pos: 0.50856| lr: 0.00027| temp: 1.99467 | loss: 0.95264| constrast_loss: 3.71104| div_loss: 0.99533| %_mask_idx: 0.35871| ppl: 2.98746| %_neg_is_pos: 0.52494| lr: 0.00027| temp: 1.99466 | loss: 0.95145| constrast_loss: 3.70628| div_loss: 0.99533| %_mask_idx: 0.42231| ppl: 2.98629| %_neg_is_pos: 0.5026| lr: 0.00027| temp: 1.99466 | loss: 0.90447| constrast_loss: 3.51834| div_loss: 0.99537| %_mask_idx: 0.41103| ppl: 2.96623| %_neg_is_pos: 0.51589| lr: 0.00027| temp: 1.99465 | loss: 0.9035| constrast_loss: 3.51448| div_loss: 0.99536| %_mask_idx: 0.44408| ppl: 2.96681| %_neg_is_pos: 0.50988| lr: 0.00027| temp: 1.99465 | loss: 1.0254| constrast_loss: 4.00207| div_loss: 0.99531| %_mask_idx: 0.39944| ppl: 2.99945| %_neg_is_pos: 0.50516| lr: 0.00027| temp: 1.99463 | loss: 0.94644| constrast_loss: 3.68622| div_loss: 0.99534| %_mask_idx: 0.40821| ppl: 2.98365| %_neg_is_pos: 0.51267| lr: 
0.00027| temp: 1.99463 | loss: 0.88407| constrast_loss: 3.43674| div_loss: 0.99538| %_mask_idx: 0.3833| ppl: 2.95742| %_neg_is_pos: 0.51411| lr: 0.00027| temp: 1.99462 | loss: 0.97902| constrast_loss: 3.81655| div_loss: 0.99532| %_mask_idx: 0.40633| ppl: 2.99405| %_neg_is_pos: 0.51548| lr: 0.00027| temp: 1.99462 | loss: 0.98653| constrast_loss: 3.84658| div_loss: 0.99532| %_mask_idx: 0.39145| ppl: 2.99406| %_neg_is_pos: 0.50361| lr: 0.00027| temp: 1.9946 | loss: 1.00462| constrast_loss: 3.91895| div_loss: 0.99532| %_mask_idx: 0.40398| ppl: 2.99704| %_neg_is_pos: 0.49913| lr: 0.00027| temp: 1.9946 | loss: 0.94438| constrast_loss: 3.678| div_loss: 0.99534| %_mask_idx: 0.44361| ppl: 2.98171| %_neg_is_pos: 0.51172| lr: 0.00027| temp: 1.99459 | loss: 0.98727| constrast_loss: 3.84954| div_loss: 0.99532| %_mask_idx: 0.33976| ppl: 2.99406| %_neg_is_pos: 0.50042| lr: 0.00027| temp: 1.99459 | loss: 0.9284| constrast_loss: 3.61407| div_loss: 0.99535| %_mask_idx: 0.42888| ppl: 2.97781| %_neg_is_pos: 0.51593| lr: 0.00027| temp: 1.99458 | loss: 0.92538| constrast_loss: 3.60198| div_loss: 0.99535| %_mask_idx: 0.40273| ppl: 2.97791| %_neg_is_pos: 0.5217| lr: 0.00027| temp: 1.99458 | loss: 1.00217| constrast_loss: 3.90916| div_loss: 0.99532| %_mask_idx: 0.3313| ppl: 2.99755| %_neg_is_pos: 0.50428| lr: 0.00027| temp: 1.99457 | loss: 0.93692| constrast_loss: 3.64816| div_loss: 0.99534| %_mask_idx: 0.36122| ppl: 2.97919| %_neg_is_pos: 0.49251| lr: 0.00027| temp: 1.99457 | loss: 0.96348| constrast_loss: 3.7544| div_loss: 0.99533| %_mask_idx: 0.34743| ppl: 2.98897| %_neg_is_pos: 0.48152| lr: 0.00027| temp: 1.99455 | loss: 0.98612| constrast_loss: 3.84494| div_loss: 0.99532| %_mask_idx: 0.42982| ppl: 2.99328| %_neg_is_pos: 0.51135| lr: 0.00027| temp: 1.99455 | loss: 0.96007| constrast_loss: 3.74076| div_loss: 0.99533| %_mask_idx: 0.3692| ppl: 2.98861| %_neg_is_pos: 0.51305| lr: 0.00027| temp: 1.99454 | loss: 0.97161| constrast_loss: 3.78691| div_loss: 0.99533| %_mask_idx: 0.38158| ppl: 
2.99095| %_neg_is_pos: 0.50422| lr: 0.00027| temp: 1.99454 | loss: 1.05152| constrast_loss: 4.10656| div_loss: 0.99531| %_mask_idx: 0.32675| ppl: 2.99968| %_neg_is_pos: 0.52171| lr: 0.00027| temp: 1.99453 | loss: 0.97713| constrast_loss: 3.80899| div_loss: 0.99532| %_mask_idx: 0.37359| ppl: 2.99249| %_neg_is_pos: 0.50368| lr: 0.00027| temp: 1.99453 | loss: 0.97162| constrast_loss: 3.78694| div_loss: 0.99533| %_mask_idx: 0.40993| ppl: 2.99016| %_neg_is_pos: 0.5045| lr: 0.00027| temp: 1.99452 | loss: 0.95146| constrast_loss: 3.7063| div_loss: 0.99534| %_mask_idx: 0.4187| ppl: 2.98471| %_neg_is_pos: 0.50392| lr: 0.00027| temp: 1.99452 | loss: 0.9931| constrast_loss: 3.87287| div_loss: 0.99532| %_mask_idx: 0.36576| ppl: 2.99597| %_neg_is_pos: 0.51564| lr: 0.00028| temp: 1.9945 | loss: 0.95016| constrast_loss: 3.70112| div_loss: 0.99534| %_mask_idx: 0.42011| ppl: 2.9854| %_neg_is_pos: 0.5225| lr: 0.00028| temp: 1.9945 | loss: 0.9456| constrast_loss: 3.68287| div_loss: 0.99534| %_mask_idx: 0.36137| ppl: 2.98444| %_neg_is_pos: 0.51108| lr: 0.00028| temp: 1.99449 | loss: 0.974| constrast_loss: 3.79647| div_loss: 0.99533| %_mask_idx: 0.38628| ppl: 2.9917| %_neg_is_pos: 0.51231| lr: 0.00028| temp: 1.99449 | loss: 0.99137| constrast_loss: 3.86593| div_loss: 0.99532| %_mask_idx: 0.40648| ppl: 2.99443| %_neg_is_pos: 0.49641| lr: 0.00028| temp: 1.99448 | loss: 1.0105| constrast_loss: 3.94249| div_loss: 0.99531| %_mask_idx: 0.39724| ppl: 2.99856| %_neg_is_pos: 0.50349| lr: 0.00028| temp: 1.99448 | loss: 1.01888| constrast_loss: 3.976| div_loss: 0.99531| %_mask_idx: 0.35949| ppl: 2.99923| %_neg_is_pos: 0.50639| lr: 0.00028| temp: 1.99447 | loss: 1.01171| constrast_loss: 3.94732| div_loss: 0.99532| %_mask_idx: 0.41886| ppl: 2.99804| %_neg_is_pos: 0.5039| lr: 0.00028| temp: 1.99447 | loss: 0.96296| constrast_loss: 3.7523| div_loss: 0.99533| %_mask_idx: 0.37359| ppl: 2.9891| %_neg_is_pos: 0.50751| lr: 0.00028| temp: 1.99445 | loss: 1.00752| constrast_loss: 3.93056| div_loss: 0.99532| 
%_mask_idx: 0.37751| ppl: 2.99741| %_neg_is_pos: 0.50888| lr: 0.00028| temp: 1.99445 | loss: 1.0296| constrast_loss: 4.01886| div_loss: 0.99531| %_mask_idx: 0.37516| ppl: 2.99995| %_neg_is_pos: 0.51078| lr: 0.00028| temp: 1.99444 | loss: 0.938| constrast_loss: 3.65247| div_loss: 0.99534| %_mask_idx: 0.40006| ppl: 2.9819| %_neg_is_pos: 0.51998| lr: 0.00028| temp: 1.99444 [2021-09-01 17:05:44,982] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 [2021-09-01 17:05:44,982] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 | loss: 0.96697| constrast_loss: 3.76836| div_loss: 0.99533| %_mask_idx: 0.35824| ppl: 2.98986| %_neg_is_pos: 0.50659| lr: 0.00028| temp: 1.99442 | loss: 0.97078| constrast_loss: 3.78358| div_loss: 0.99533| %_mask_idx: 0.41949| ppl: 2.99052| %_neg_is_pos: 0.50234| lr: 0.00028| temp: 1.99442 | loss: 1.03292| constrast_loss: 4.03213| div_loss: 0.99531| %_mask_idx: 0.42638| ppl: 2.99985| %_neg_is_pos: 0.51743| lr: 0.00028| temp: 1.99441 | loss: 1.01886| constrast_loss: 3.97591| div_loss: 0.99531| %_mask_idx: 0.38706| ppl: 2.99861| %_neg_is_pos: 0.51037| lr: 0.00028| temp: 1.99441 | loss: 1.14318| constrast_loss: 4.47318| div_loss: 0.99537| %_mask_idx: 0.38581| ppl: 2.96244| %_neg_is_pos: 0.53445| lr: 0.00028| temp: 1.9944 | loss: 1.12866| constrast_loss: 4.41512| div_loss: 0.99536| %_mask_idx: 0.34602| ppl: 2.97109| %_neg_is_pos: 0.53173| lr: 0.00028| temp: 1.9944 | loss: 1.00553| constrast_loss: 3.92255| div_loss: 0.9956| %_mask_idx: 0.42763| ppl: 2.81533| %_neg_is_pos: 0.66213| lr: 0.00028| temp: 1.99439 | loss: 0.99007| constrast_loss: 3.86072| div_loss: 0.99555| %_mask_idx: 0.38628| ppl: 2.84567| %_neg_is_pos: 0.6659| lr: 0.00028| temp: 1.99439 | loss: 0.11112| constrast_loss: 0.34479| div_loss: 0.99677| %_mask_idx: 0.44063| ppl: 2.0656| %_neg_is_pos: 
0.9799| lr: 0.00028| temp: 1.99437 | loss: 0.18209| constrast_loss: 0.62867| div_loss: 0.99674| %_mask_idx: 0.34649| ppl: 2.08878| %_neg_is_pos: 0.95942| lr: 0.00028| temp: 1.99437 | loss: 1.13191| constrast_loss: 4.42807| div_loss: 0.99553| %_mask_idx: 0.40492| ppl: 2.86215| %_neg_is_pos: 0.51164| lr: 0.00028| temp: 1.99436 | loss: 1.104| constrast_loss: 4.31644| div_loss: 0.99543| %_mask_idx: 0.4364| ppl: 2.92278| %_neg_is_pos: 0.52502| lr: 0.00028| temp: 1.99436 | loss: 1.10653| constrast_loss: 4.32658| div_loss: 0.9956| %_mask_idx: 0.38346| ppl: 2.81611| %_neg_is_pos: 0.55329| lr: 0.00028| temp: 1.99435 | loss: 1.13216| constrast_loss: 4.42908| div_loss: 0.99577| %_mask_idx: 0.34868| ppl: 2.70506| %_neg_is_pos: 0.53162| lr: 0.00028| temp: 1.99435 | loss: 0.32775| constrast_loss: 1.21133| div_loss: 0.99672| %_mask_idx: 0.4317| ppl: 2.101| %_neg_is_pos: 0.94125| lr: 0.00028| temp: 1.99434 | loss: 0.31882| constrast_loss: 1.1756| div_loss: 0.99673| %_mask_idx: 0.30247| ppl: 2.09388| %_neg_is_pos: 0.8885| lr: 0.00028| temp: 1.99434 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00028| temp: 1.99432| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34023| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00028| temp: 1.99432 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31125| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00028| temp: 1.99431 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00028| temp: 1.99431 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44486| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.9943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.9943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39521| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99429 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3656| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99429 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41024| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99427 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99427 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99426 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99426 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99424 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99424 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43515| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99423 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3537| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99423 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99422 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38565| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99422 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44862| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99421 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99421 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99419 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99419 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38017| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99418 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34712| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99418 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99417 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99417 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99416 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43123| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99416 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99414 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99414 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40648| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99413 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36357| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99413 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99412 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00029| temp: 1.99412 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99411 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99411 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99409 | 
loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99409 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41855| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99408 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34179| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99408 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43969| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99406 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99406 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99405 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99405 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99404 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99404 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99403 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99403 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99401 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39051| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99401 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.994 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43452| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.994 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99399 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99399 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4339| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99398 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99398 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99396 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40758| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99396 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99395 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99395 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34117| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99394 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99394 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99393 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99393 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0003| temp: 1.99391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.9939 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.9939 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42262| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99388 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32848| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99388 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43076| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99387 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99387 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99386 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99386 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99385 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99385 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99383 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40461| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99383 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99382 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41745| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99382 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42716| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99381 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99381 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.9938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.9938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99378 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99378 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40163| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99377 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99377 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99376 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99376 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41212| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99375 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38205| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99375 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99373 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99373 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42074| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99372 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00031| temp: 1.99372 [2021-09-01 17:14:55,560] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 32768.0, reducing to 16384.0 [2021-09-01 17:14:55,560] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43468| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.9937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42481| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.9937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99369 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37986| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99369 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99368 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37218| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99368 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99367 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35777| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99367 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99365| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99365 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34696| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99364 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35088| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99364 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99363 | loss: 0.02492| constrast_loss: 
0.0| div_loss: 0.99687| %_mask_idx: 0.37046| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99363 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99362 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99362 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.9936 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.9936 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99359 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99359 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99358 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99358 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99357 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99357 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99355| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99355 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40664| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99354 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99354 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32879| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99352 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00032| temp: 1.99352 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99351 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99351 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.9935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.9935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99349 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99349 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46209| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99347 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99347 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38158| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99346 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99346 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39207| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99345 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39051| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99345 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34696| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99344 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99344 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33521| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99342 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99342 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99341 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99341 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36905| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.9934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.9934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40664| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99339 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34211| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99339 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45536| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99337 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99337 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4234| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99336 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38205| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99336 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99334 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99334 | 
loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99333 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99333 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33647| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99332 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41212| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99332 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99331 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00033| temp: 1.99331 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3869| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99329 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37547| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99329 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42246| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99328 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99328 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33929| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99327 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42199| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99327 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99326 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99326 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 
1.99324 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99324 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44862| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99323 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99323 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99322 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35589| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99322 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99321 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43139| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99321 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99319 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36623| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99319 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99318 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99318 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38831| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99316 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42309| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99316 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35605| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99315 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.00034| temp: 1.99315 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99314 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38988| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99314 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43327| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99313 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4162| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00034| temp: 1.99313 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31156| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99311 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41322| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99311 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.9931 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46397| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.9931 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99309 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42105| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99309 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38424| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99308 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99308 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42043| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99306 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99306 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.00035| temp: 1.99305 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44392| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99305 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41917| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99304 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99304 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99303 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34618| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99303 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43781| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99301 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99301 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.993 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36341| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.993
[2021-09-01 17:24:08,126] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-09-01 17:24:08,126] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42372| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99298 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99298 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99297 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99297 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99296 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99296 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99295 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39615| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99295 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99293 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99293 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99292 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3407| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99292 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99291 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36905| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00035| temp: 1.99291 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 
0.39333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.9929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45081| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.9929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99288| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99288 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35229| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99287 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99287 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99286 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99286 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99285 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99285 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99283| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99283 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99282 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99282 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.9928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| 
%_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.9928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99279 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99279 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99278 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99278 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99277 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99277 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99275 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99275 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99274 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39207| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99274 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37907| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99272 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00036| temp: 1.99272 | loss: 0.02492| constrast_loss: 0.0| div_loss: 
0.99687| %_mask_idx: 0.34492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.9927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.9927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3255| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99269 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99269 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99268 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3985| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99268 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36106| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99267 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43515| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99267 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99265 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99265 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99264 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99264 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99262 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35417| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99262 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37046| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99261 | loss: 0.02492| constrast_loss: 0.0| 
div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99261 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41024| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.9926 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.9926 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41134| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99259 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99259 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36043| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99257 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99257 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99256 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34712| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99256 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99255 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99255 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99254 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99254 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99252 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00037| temp: 1.99252 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33835| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99251 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42309| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99251 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.9925 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.9925 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99249 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99249 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38549| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99247 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99247 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41651| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99246 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38221| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99246 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34477| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99244 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.47964| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99244 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42309| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99243 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99243 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99242 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99242 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99241 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99241 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36889| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99239 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99239 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35683| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99238 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35213| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99238 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99237 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99237 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99236 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99236 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35699| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99234 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34445| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99234 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33553| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99233 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39975| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99233 
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34978| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99232 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39928| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99232 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33709| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99231 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37046| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00038| temp: 1.99231 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99229 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99229 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99228 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99228
[2021-09-01 17:33:19,870] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-09-01 17:33:19,870] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45238| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99226 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99226 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40899| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99224 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99224 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99222 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99222 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99221 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99221 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.9922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36122| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.9922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43656| 
ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99219 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44831| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99219 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39129| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99217 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99217 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99216 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99216 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99215 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99215 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99214 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00039| temp: 1.99214 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99212| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42293| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99212 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37845| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99211 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42951| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99211 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99209 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 
0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99209 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99208 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99208 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99207 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99207 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99206 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34211| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99206 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99204 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99204 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37218| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99203 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99203 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99202 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99202 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99201 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99201 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33897| 
ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99199 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99199 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34806| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99198 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99198 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99197 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99197 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99196 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99196 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99194 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99194 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43139| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99193 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0004| temp: 1.99193 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99191 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32848| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99191 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32973| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.9919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3407| 
ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.9919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46476| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99189 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99189 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99188 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44737| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99188 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41651| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99186 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99186 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99185 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99185 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42982| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99184 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99184 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99183 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99183 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99181 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99181 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 
0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.9918 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35072| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.9918 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99179 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4339| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99179 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41056| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99178 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99178 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99176 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99176 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99175 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99175 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42246| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99173 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99173 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99172 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00041| temp: 1.99172 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40179| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99171 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| 
%_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99171 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.9917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.9917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99168 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40962| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99168 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99167 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99167 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99166 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99166 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33365| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99165 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99165 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99163 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99163 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99162 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99162 | loss: 0.02492| constrast_loss: 0.0| div_loss: 
0.99687| %_mask_idx: 0.38659| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42387| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.9916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.9916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99158 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99158 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34305| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99157 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99157 [2021-09-01 17:42:32,039] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0 [2021-09-01 17:42:32,039] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 4096.0, reducing to 2048.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99155 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99155 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99154 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37954| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99154 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43922| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99153 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34665| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00042| temp: 1.99153 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99152 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99152 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.9915| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37547| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.9915 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34383| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99149 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99149 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99148 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99148 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| 
ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99147 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99147 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99145 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99145 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99144 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36779| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99144 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44267| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99143 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99143 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32989| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99142 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99142 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.9914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.9914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35229| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99139 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99139 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99137 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 
0.44815| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99137 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34226| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99136 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99136 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36936| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99135 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99135 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99134 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45316| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99134 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33083| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99132 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00043| temp: 1.99132 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99131 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99131 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39521| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.9913 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35636| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.9913 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33976| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99129 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36153| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99129 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| 
%_mask_idx: 0.39489| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99127 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99127 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99126 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99126 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99125 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38424| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99125 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99124 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36435| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99124 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38565| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99122 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99122 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38393| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99121 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99121 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3869| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99119 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35542| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99119 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99118 | loss: 0.02492| constrast_loss: 0.0| div_loss: 
0.99688| %_mask_idx: 0.38988| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99118 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99117 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99117 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99116 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39693| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99116 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99114 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99114 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35056| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99113 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35495| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00044| temp: 1.99113 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41917| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99112 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99112 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4057| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99111 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36842| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99111 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43452| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99109 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99109 | loss: 0.02492| constrast_loss: 0.0| 
div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99108 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38941| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99108 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99107 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99107 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44502| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99106 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40962| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99106 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99104 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42434| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99104 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99103 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99103 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99101 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33553| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99101 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99099 | loss: 0.02492| constrast_loss: 
0.0| div_loss: 0.99687| %_mask_idx: 0.44079| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99099 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41635| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99098 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34524| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99098 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99096 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38957| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99096 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99095 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37954| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99095 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99094 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34336| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99094 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36357| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99093 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00045| temp: 1.99093 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99091 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99091 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.9909 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.9909 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99089 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99089 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99088 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35448| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99088 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99086 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35699| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99086 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99085 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39066| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99085 [2021-09-01 17:51:43,064] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0 [2021-09-01 17:51:43,065] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 2048.0, reducing to 1024.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99083 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42074| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99083 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33537| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99082 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99082 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34884| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99081 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99081 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41259| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.9908 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41964| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.9908 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36435| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99078 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41322| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99078 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38972| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99077 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40069| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99077 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34117| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99076 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33459| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99076 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 
0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99075 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35981| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99075 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99073 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99073 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99072 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35605| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00046| temp: 1.99072 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32221| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99071 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99071 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.9907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43468| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.9907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99068 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99068 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99067 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99067 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99065 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| 
%_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99065 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99064 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99064 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34947| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99063 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99063 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99062 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41886| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99062 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.9906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3515| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.9906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99059 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36137| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99059 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99058 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34383| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99058 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99057 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99057 | loss: 0.02492| constrast_loss: 0.0| div_loss: 
0.99687| %_mask_idx: 0.42779| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99055 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99055 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99054 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33208| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99054 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36169| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99053 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00047| temp: 1.99053 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99052 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99052 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40194| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.9905 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.9905 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99049 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99049 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99047 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99047 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34148| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99046 | loss: 0.02492| constrast_loss: 0.0| 
div_loss: 0.99687| %_mask_idx: 0.35401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99046 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99045 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99045 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99044 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39959| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99044 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37547| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99042 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99042 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99041 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4245| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99041 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44674| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.9904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.9904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99039 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99039 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99037 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99037 | loss: 0.02492| 
constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99036 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99036 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38424| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99035 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41635| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99035 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99034 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99034 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36122| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99032 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00048| temp: 1.99032 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99031 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36404| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99031 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99029 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35981| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99029 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99028 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34994| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99028 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99027 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37939| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99027 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99026 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99026 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99024 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40758| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99024 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3656| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99023 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99023 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99022 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41823| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99022 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41964| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99021 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99021 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34931| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99019 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99019 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99018 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 
1.99018 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99017 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99017 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36905| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99016 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99016 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42246| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99014 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41729| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99014 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42262| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99013 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.00049| temp: 1.99013 [2021-09-01 18:00:52,897] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0 [2021-09-01 18:00:52,897] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1024.0, reducing to 512.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99011 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34947| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99011 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36153| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99009 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99009 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99008 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38643| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99008 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34978| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99006 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37829| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99006 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99005 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34007| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99005 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99004 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3067| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99004 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44001| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99003 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99003 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40899| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99002 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99002 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43421| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99001 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41197| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99001 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.99 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31172| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98999 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42638| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98999 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34007| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98997 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32785| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98997 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40476| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98996 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36341| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98996 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98994 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41714| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.98994 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42481| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98993 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98993 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98992 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39505| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98992 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35777| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42387| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98989 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98989 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33725| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3833| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36889| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43437| ppl: 2.0| %_neg_is_pos: 1.0| 
lr: 0.0005| temp: 1.98984 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98984 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98983 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36043| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98983 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.29731| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98979 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98979 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43562| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98976 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98976 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98975 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32832| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.98975 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44016| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98974 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98974 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98973 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98973 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41729| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98971 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34117| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98971 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38659| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98969 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98969 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98968 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98968 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98966 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98966 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.98965 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39975| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98965 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98964 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98964 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98963 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98963 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98961 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98961 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98958 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98958 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42372| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98956 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35025| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.98956 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98955 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98955 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41839| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98953 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98953 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40053| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98952 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98952 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98951 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98951 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9895 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9895 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98948 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42027| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98948 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98947 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98947 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98946 
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98946 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38941| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98945 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98945 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98942 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98942 [2021-09-01 18:10:05,502] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 512.0, reducing to 256.0 [2021-09-01 18:10:05,502] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 512.0, reducing to 256.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42011| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43781| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98939 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35887| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98939 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3609| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32472| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3631| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41118| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98933 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40523| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98933 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98932 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98932 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33443| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31845| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37046| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98925 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98925 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38315| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98924 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98924 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98921 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98921 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35025| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40461| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98915 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39991| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98915 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42654| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98912 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98912 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34837| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98911 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98911 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98909 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98909 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37641| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38001| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38315| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98903 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42826| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98903 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98902 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39677| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98902 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98899 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98899 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98898 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36404| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98898 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32957| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41056| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38565| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35448| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43045| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38503| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98889 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98889 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35229| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33459| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33349| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43029| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42794| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35135| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98878 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98878 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40617| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42387| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36043| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41698| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36341| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.336| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9887 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9887 [2021-09-01 18:19:15,348] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 256.0, reducing to 128.0 [2021-09-01 18:19:15,348] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 256.0, reducing to 128.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98868 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98868 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36482| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98865 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4245| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98865 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43139| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35714| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38941| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98856 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35871| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98856 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98855 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98855 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34211| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98852 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98852 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42716| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36811| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98847 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98847 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38424| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40194| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98844 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98844 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98843 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98843 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98842 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35746| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98842 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39505| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98839 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98839 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3089| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98837 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98837 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36341| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34054| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98834 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98834 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40226| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44768| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34853| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98831 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43217| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98831 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40648| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98829 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35041| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98829 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3161| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98827 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98827 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35918| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98824 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98824 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46272| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32425| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9882 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34665| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9882 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98819 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38549| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98819 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40163| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98818 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35213| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98818 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98815 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3869| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98815 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98814 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35276| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98814 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37422| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34101| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37234| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37234| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40962| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98803 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42199| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98803 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42669| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98802 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39207| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98802 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37782| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98799 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98799 [2021-09-01 18:28:27,179] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 128.0, reducing to 64.0 [2021-09-01 18:28:27,179] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 128.0, reducing to 64.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40476| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.29981| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33349| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39834| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98792| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98792 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37547| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98789 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36842| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98789 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98787 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40147| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98787 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3609| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98786 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37594| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98786 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41369| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34461| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98782 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98782 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36012| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37782| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41714| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98778 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98778 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4151| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98774 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98774 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98773 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98773 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98771 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98771 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39991| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41745| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44157| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98766 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35072| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98766 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34007| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43217| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98763 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42434| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98763 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40711| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98761 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39834| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98761 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42356| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98759 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43092| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98759 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98758 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98758 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98756 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98756 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98755 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39505| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98755 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98754 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98754 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98753 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34477| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98753 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98751 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98751 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98749 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98749 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37641| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98748 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36623| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98748 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98746 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98746 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36357| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98745 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98745 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98743 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98743 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38221| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98742 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98742 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98741 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98741 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37892| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98738 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38847| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98738 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31924| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98737 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4198| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98737 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37954| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98736 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98736 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41259| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98735 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98735 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32989| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98733 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98733 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98732 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98732 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98731 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41494| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98731 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34759| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39129| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98728 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98728 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98727 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98727
[2021-09-01 18:37:38,797] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 64.0, reducing to 32.0
[2021-09-01 18:37:38,797] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 64.0, reducing to 32.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98725 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3537| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98725 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44408| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98724 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32315| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98724 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98723 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98723 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98722 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44016| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98722 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34853| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9872| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42168| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9872 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41353| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98719 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98719 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98718 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98718 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98717 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98717 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98715 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98715 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98714 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98714 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98713 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98713 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35354| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98712 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98712 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39834| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98709 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98709 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98707 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40852| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98707 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98706 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4115| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98706 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98705 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36842| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98705 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98704 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39959| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98704 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98702 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98702 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39693| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98701 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98701 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39066| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98699 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36811| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98699 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98697 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98697 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36889| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98696 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98696 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36435| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98695 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98695 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98694 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98694 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98692 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98692 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41729| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98691 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98691 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98689 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98689 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98688 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98688 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98687 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98687 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98686 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98686 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98684 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98684 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36544| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98683 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98683 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38393| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98682 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98682 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41134| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98681 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35558| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98681 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36028| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98679 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37547| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98679 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33208| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98678 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98678 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98677 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98677 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41635| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98676 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98676 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98674 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42262| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98674 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98673 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42293| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98673 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98671 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34743| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98671 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98669 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98669 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33929| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98668 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98668 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40038| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98666 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98666 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98665 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40852| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98665 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98664 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37202| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98664 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98663 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98663 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40962| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98659 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98659 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98657 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98657 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98656 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41886| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98656
[2021-09-01 18:46:49,648] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 32.0, reducing to 16.0
[2021-09-01 18:46:49,648] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 32.0, reducing to 16.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98654 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39489| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98654 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98653 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98653 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41181| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98652 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98652 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37892| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98651 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98651 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32832| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98649 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98649 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98648 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41729| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98648 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98647 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98647 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98646 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98646 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98644| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98644 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98643 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98643 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98642 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98642 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98641 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37202| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98641 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98639| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35354| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98639 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98638 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98638 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98636 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40132| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98636 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35605| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98635 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98635 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36404| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98634 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98634 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98633 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98633 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98631 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98631 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40617| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40648| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98629 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33866| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98629 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98628 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40523| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98628 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98626 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98626 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98625 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35824| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98625 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98624 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45536| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98624 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38972| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98623 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98623 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44549| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98621 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98621 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40038| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98618 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98618 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98617 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43405| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98617 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38205| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98616 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4057| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98616 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98615 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98615 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39756| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98613 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98613 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98612 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33741| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98612 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98611 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38315| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98611 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98608 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98608 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98607 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36936| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98607 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41917| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98606 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4115| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98606 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98605 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98605 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36137| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98603 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98603 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42794| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98602 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98602 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41134| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98599 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44533| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98599 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98598 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42011| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.98598 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98597 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98597 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98595 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43922| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98595 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98594 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98594 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34101| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98593 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98593 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98592 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41322| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98592 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9859 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9859 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42387| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98589 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98589 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38549| ppl: 2.0| %_neg_is_pos: 1.0| 
lr: 0.0005| temp: 1.98588 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42309| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98588 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98587 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37594| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98587 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98585 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98585 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39207| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98584 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42857| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98584
[2021-09-01 18:56:02,222] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16.0, reducing to 8.0
[2021-09-01 18:56:02,222] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16.0, reducing to 8.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98582 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98582 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98581 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98581 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43045| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98579 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45238| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98579 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98577 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35354| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98577 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98576 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98576 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98575 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35417| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98575 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40351| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98574 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98574 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98572 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39536| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98572 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98571 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35229| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98571 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98569 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98569 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98567| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98567 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98566 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98566 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98564 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33145| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98564 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98563 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98563 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98562 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3609| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98562 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98561 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98561 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98559 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98559 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98558 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40163| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98558 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98557 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98557 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98556 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37954| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98556 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35276| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98554 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98554 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40351| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98553 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41369| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98553 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32018| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98552 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41886| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98552 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42011| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98551 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98551 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41635| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98549 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98549 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98548 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98548 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98546 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98546 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4115| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98545 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98545 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36028| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98544 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98544 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98543 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98543 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98541 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98541 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9854 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38643| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9854 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98539 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98539 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40132| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98538 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38972| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98538 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38158| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98536 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33208| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98536 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98535 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98535 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98534 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98534 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98533 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98533 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98531 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35464| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98531 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98528 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98528 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98527 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32378| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98527 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39834| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98526 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33553| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98526 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34211| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98525 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98525 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98523 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98523 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33255| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98522 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43405| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98522 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98521 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36216| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98521 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9852 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42137| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9852 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98519 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98519 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41651| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98518 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98518 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36137| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98517 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39129| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98517 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98516 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98516 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41698| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98514 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98514 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41181| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98513 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98513 [2021-09-01 19:05:15,020] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 8.0, reducing to 4.0 [2021-09-01 19:05:15,020] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 8.0, reducing to 4.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98511 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98511 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9851 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9851 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3161| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98509 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98509 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98508 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98508 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33882| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98506| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98506 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98505 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44502| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98505 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32691| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98504 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98504 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43405| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98503 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98503 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98501| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38017| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98501 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35213| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.985 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.985 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40069| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98499 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98499 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41776| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98498 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37202| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98498 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98496 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98496 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98495 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35934| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98495 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42638| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98493 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98493 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98492 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98492 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.30154| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98491 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98491 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38017| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41353| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98488 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32926| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98488 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98487 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39677| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98487 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34101| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98486 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3985| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98486 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35777| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98485 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98485 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98483 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98483 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98482 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36263| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98482 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98481 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44439| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98481 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35871| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98478 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98478 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35166| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98477 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98477 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98475 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43421| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98475 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98474 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42137| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98474 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98473 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37422| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98473 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98472 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98472 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9847 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36779| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9847 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98469 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98469 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98468 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39834| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98468 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98467 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98467 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98465 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41259| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98465 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98464 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98464 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98463 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98463 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98462 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98462 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9846 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9846 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42074| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98459 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98459 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98457 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98457 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98456 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39975| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98456 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42841| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98455 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41761| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98455 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43264| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98454 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33067| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98454 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36544| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98452 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34665| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98452 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98451 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98451 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40226| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37782| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98449 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98449 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98447 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45144| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98447 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98446 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35542| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98446 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98445 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36811| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98445 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98444 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98444 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98442 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98442 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35135| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98441 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43499| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98441 [2021-09-01 19:14:25,957] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4.0, reducing to 2.0 [2021-09-01 19:14:25,957] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 4.0, reducing to 2.0 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36544| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98439 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98439 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98438 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98438 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98437 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98437 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98436 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98436 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36247| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98434| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98434 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43358| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98433 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98433 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98432 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38549| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98432 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98431 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98431 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43875| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98429 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98429 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41165| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98428 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98428 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98427 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98427 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35981| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98426 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98426 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36482| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98424 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98424 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4162| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98423 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98423 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98421 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98421 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9842 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9842 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98419 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98419 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35699| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98418 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98418 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40695| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98416 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98416 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3443| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98415 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98415 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34978| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98414 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98414 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98413 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40163| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98413 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34336| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98411 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38957| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98411 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9841 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36811| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9841 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98409 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98409 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34242| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98408 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36012| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98408 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37782| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98406 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40977| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98406 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98405 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98405 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35855| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98403 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98403 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98402 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98402 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98401 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98401 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37077| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.984 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.984 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45959| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98398 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4104| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98398 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98397 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98397 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98396 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33882| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98396 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98395 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98395 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98393 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98393 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98392 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98392 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36372| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36842| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98391 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98389 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98389 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33318| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98388 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32832| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98388 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35072| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98386 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98386 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40852| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98385 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98385 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44079| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98384 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44063| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98384 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41322| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98383 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35323| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98383 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34947| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98381 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98381 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3219| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98379 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98379 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98378 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40617| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98378 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42935| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98376 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98376 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98375 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98375 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41056| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98374 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98374 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45363| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98373 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98373 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43076| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98371 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98371 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42669| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9837 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9837
[2021-09-01 19:23:37,254] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2.0, reducing to 1.0
[2021-09-01 19:23:37,255] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2.0, reducing to 1.0
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98368 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98368 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98367 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35871| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98367 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98366 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33866| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98366 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98365 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98365 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98363 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98363 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98362 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98362 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43656| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98361 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4151| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98361 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9836 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9836 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98358 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98358 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98357 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4104| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98357 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98356 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98356 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41322| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98355 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40132| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98355 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98353 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98353 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98352 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35793| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98352 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35417| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98349 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33772| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98349 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98348 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98348 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98347 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35934| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98347 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37234| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98345 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98345 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41933| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98344 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98344 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98343 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98343 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98342 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98342 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9834 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35464| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9834 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98339 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34445| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98339 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98338 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41353| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98338 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41259| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98337 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98337 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36247| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98335 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98335 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98334 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98334 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98332 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.29355| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98332 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40132| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98331 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39709| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98331 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9833 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34179| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9833 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98329 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98329 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98327 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34947| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98327 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39035| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98326 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3407| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98326 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98325 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98325 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98324 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98324 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98322 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98322 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98321 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98321 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41729| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98319 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43374| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98319 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98317 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98317 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98316 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98316 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36482| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98314 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98314 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40226| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98313 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98313 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98312 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4104| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98312 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44063| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98311 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98311 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98309 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4433| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98309 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98308 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41197| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98308 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38988| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98307 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42638| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98307 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98306 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35934| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98306 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98304 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98304 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98303 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98303 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42434| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98302 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36247| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98302 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43875| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98301 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98301 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98299 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98299 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98298 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38847| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98298
[2021-09-01 19:32:49,876] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1.0, reducing to 1
[2021-09-01 19:32:49,876] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1.0, reducing to 1
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98296 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98296 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98295 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3869| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98295 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98294 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98294 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98293 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98293 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37312| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98291 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34994| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98291 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9829 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39521| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9829 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98289 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98289 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35041| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98288 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98288 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42935| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98286 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98286 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98285 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98285 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98284 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98284 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39521| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98283 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98283 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98281 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98281 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35918| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9828 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40476| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9828 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98278 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37171| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98278 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98277 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98277 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98276 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98276 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98275 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98275 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42027| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42105| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98273 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98272 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37531| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98272 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38503| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98271 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98271 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98269 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38847| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98269 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98268 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42763| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98268 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98267 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31391| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98267 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98266 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98266 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98264 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39536| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98264 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40962| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98263 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4469| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98263 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36717| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98261 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3631| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98261 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35338| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98259 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37845| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98259 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36106| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98258 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98258 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42309| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98256 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98256 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98255 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98255 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98254 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33412| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98254 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98253 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98253 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98251 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98251 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98249 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98249 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36122| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98248 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3797| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98248 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98246 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98246 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42152| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98245 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98245 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98243 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98243 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98242 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98242 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98241 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41949| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98241 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9824 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43468| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9824 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98238 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31877| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98238 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98237 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98237 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36028| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98236 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98236 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43452| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98235 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98235 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44345| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98233 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98233 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98232 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98232 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35714| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98231 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98231 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9823 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9823 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38737| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98228 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98228 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39975| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98227 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98227
[2021-09-01 19:41:59,998] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-01 19:41:59,998] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36623| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40022| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98225 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37218| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98224 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3985| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98224 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39254| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98223 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98223 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41588| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98222 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98222 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40053| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42325| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98219 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98219 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98218 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98218 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38158| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.98217 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37625| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98217 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41823| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98215 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3985| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98215 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41275| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98214 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98214 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39959| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98213 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98213 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34759| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98212 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98212 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98209 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3833| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98209 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98207 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.98207 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98206 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98206 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98205 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98205 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98204 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98204 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41118| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98202 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98202 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42152| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98201 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98201 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34853| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98199 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98199 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.98197 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98197 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44204| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98196 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98196 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98195 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37907| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98195 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34258| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98194 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38534| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98194 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98192 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98192 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4104| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98191 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3443| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98191 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98189 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98189 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98188 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.98188 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98187 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98187 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98186 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4115| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98186 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42982| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98184 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41056| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98184 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40304| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98183 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98183 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36404| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98182 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98182 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98181 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38988| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98181 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45551| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98179 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98179 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43202| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.98178 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33615| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98178 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42184| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98177 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35887| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98177 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40617| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98176 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34743| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98176 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98174 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98174 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98173 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98173 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98171 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36075| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98171 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98169 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39693| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.98169 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43452| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98168 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31501| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98168 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43828| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98166 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38221| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98166 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98165 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98165 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98164 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98164 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98163 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35855| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98163 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4198| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42622| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42105| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98161 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.9816 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9816 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98159 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37437| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98159 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36404| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98157 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40664| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98157 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37813| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98156 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41212| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98156
[2021-09-01 19:51:12,296] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-01 19:51:12,296] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98154 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98154 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98153 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39677| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98153 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41291| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98152 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98152 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41776| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98151 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98151 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98149 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98149 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98148 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98148 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98147 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44063| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98147 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38596| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98146 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98146 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98144| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98144 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98143 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98143 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36717| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98142 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41651| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98142 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98141 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98141 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98139| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41134| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98139 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98138 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98138 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38221| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98136 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98136 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33991| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98135 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98135 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40194| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98134 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98134 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98133 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3537| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98133 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98131 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37641| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98131 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98129 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39082| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98129 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98128 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41886| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98128 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34774| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98126 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44956| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98126 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46225| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98125 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38048| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98125 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98124 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38941| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98124 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98123 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98123 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39395| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98121 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34477| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98121 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33772| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42105| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98118 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98118 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98117 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35119| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98117 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98116 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98116 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44439| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98115 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98115 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98113 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39928| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98113 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44565| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98112 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98112 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98111 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43985| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98111 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40006| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9811 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9811 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41776| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98108 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32691| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98108 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98107 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43139| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98107 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42434| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98106 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40758| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98106 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98105 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98105 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98103 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41792| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98103 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98102 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98102 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98099 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98099 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98098 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36638| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98098 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98097 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98097 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98095 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98095 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34696| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98094 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42779| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98094 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31062| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98093 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98093 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98092 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98092 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34806| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98089 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98089 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41541| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98088 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34445| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98088 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42998| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98087 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98087 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98085 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98085 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41698| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98084 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35229| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98084
[2021-09-01 20:00:24,512] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-01 20:00:24,512] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35182| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98082 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98082 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98081 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40695| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98081 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98079 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38831| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98079 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44204| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98077 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98077 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40711| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98076 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98076 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98075 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36795| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98075 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.98074 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41071| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98074 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98072 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42716| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98072 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98071 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40852| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98071 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36106| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98069 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40695| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98069 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98067 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98067 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98066 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98066 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35041| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98064 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.98064 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98063 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37845| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98063 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34461| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98062 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98062 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98061 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32143| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98061 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.29934| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98059 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98059 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98058 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98058 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41964| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98057 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98057 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44079| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98056 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98056 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.98055 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42888| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98055 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41839| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98054 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38017| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98054 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98053 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.349| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98053 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98052 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41212| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98052 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3869| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98049 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98049 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98047 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36826| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98047 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33725| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98046 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35323| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.98046 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35777| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98045 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98045 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98044 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39129| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98044 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38831| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98042 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35495| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98042 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98041 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3396| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98041 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43907| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38612| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98039 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98039 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40429| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98037 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98037 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34383| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.98036 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98036 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41839| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98035 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42935| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98035 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35777| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98034 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33929| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98034 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33333| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98032 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42716| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98032 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39536| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98031 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98031 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34195| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98029 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3407| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98029 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39442| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98028 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98028 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98027 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98027 
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36137| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98026 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98026 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39004| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98024 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36936| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98024 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98023 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98023 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37657| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98022 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98022 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37594| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98021 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98021 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98019 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98019 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98018 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35589| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98018 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38784| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98017 | 
loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4245| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98017 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98016 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35088| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98016 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98014 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98014 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98013 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98013
[2021-09-01 20:09:36,043] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-01 20:09:36,043] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36153| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98011 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98011 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9801 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9801 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38847| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98009 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36952| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98009 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98008 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98008 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35871| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98006 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38816| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98006 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40445| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98005 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43311| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98005 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98004 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98004 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36357| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98003 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4162| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98003 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98001 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31767| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98001 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.98 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97999 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33882| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97999 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97998 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97998 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97996 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97996 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97995 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97995 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41557| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97993 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97993 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97992 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34164| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97992 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32863| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.49076| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97991 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42904| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9799 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9799 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41118| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97988 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.29104| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45066| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97987 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32096| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97986 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40711| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97985 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32722| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97985 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97983 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44596| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97983 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3833| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97982 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42763| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43954| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97981 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34884| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9798 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9798 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36372| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97977 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97977 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97975 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35542| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97975 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97974 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97974 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97973 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40586| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97973 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.30702| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97972 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97972 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40132| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97969 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39959| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97969 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97968 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36623| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97968 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39051| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97967 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97967 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38972| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97965 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3844| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97965 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42528| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97964 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97964 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97963 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36952| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97963 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37892| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97962 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37218| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97962 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35573| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44799| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97959 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97959 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44753| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97957 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97956 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38581| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97956 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37265| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97955 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35135| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97955 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37782| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97953 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97953 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97952 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3974| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97952 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97951 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37312| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97951 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43656| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35276| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97948 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35464| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97948 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40993| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97947 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35887| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97947 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97946 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42246| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97946 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97945 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97945 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97943 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39489| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97942 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42638| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97942 [2021-09-01 20:18:48,403] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 20:18:48,403] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3407| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97939 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4245| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97939 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97938 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97937 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35135| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97935 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97934 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44909| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97933 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97933 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4068| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.97932 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97932 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36247| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9793 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9793 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44204| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97929 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37986| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97928 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40022| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97927 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40116| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97925| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33506| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97925 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41306| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97924 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42481| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97924 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.97922 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97921 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97921 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43734| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9792 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9792 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38142| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97919 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4256| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39223| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97917 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44063| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97916 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40539| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97915 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42372| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97915 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39301| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37124| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97914 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.97912 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42356| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97912 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35683| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97911 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43139| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97911 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35918| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42262| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41228| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97909 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97909 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97907 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97906 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41024| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35401| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97904 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97903 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97903 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34978| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97902 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97902 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97901 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36216| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97899 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33631| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97899 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44204| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97898 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41479| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97898 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37892| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97897 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39709| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97896 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32957| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37845| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97894 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38158| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97893 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35338| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97892 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43907| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97891 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37046| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97889 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42043| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97889 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39066| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43499| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97888 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44518| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41682| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97886 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40836| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97885 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43076| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40523| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97884 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37453| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97883 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42638| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36967| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97881 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9788 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36216| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9788 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34806| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97879 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42121| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97878 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97878 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33349| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39192| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97876 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38017| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41024| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97875 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42215| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97874 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44862| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3739| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97873 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97871 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9787 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9787 [2021-09-01 20:27:59,990] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 20:27:59,990] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41416| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97868 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97868 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42074| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97867 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39051| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3562| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97866 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31704| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97865 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35589| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97865 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43734| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36106| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97863 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33976| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39286| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4245| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97862 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35652| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33302| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97861 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97859 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97859 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97858 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41964| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97857 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.32409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97856 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42168| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97856 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37986| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97854| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97854 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40226| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97853 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97851 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41463| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97851 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37108| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42544| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97849 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35981| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97848 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97846 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97846 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37751| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35197| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97845 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41698| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97844 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97844 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97843 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97843 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37296| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97841 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97841 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.33427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97839 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42074| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97839 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4057| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97838 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39646| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97836 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40868| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97836 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4104| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43061| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97835 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41306| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97833 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97833 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34602| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41009| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97832 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97831 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36591| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97831 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40852| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9783 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9783 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97828 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97828 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33051| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97827 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97827 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36748| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97826 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37986| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34305| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97825 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3808| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97823 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43202| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97823 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40288| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3938| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97822 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40147| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97821 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9782 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38847| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9782 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36732| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97818 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97818 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36388| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97817 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37594| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97815 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3291| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97815 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40758| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97814 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43452| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97814 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44659| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97813 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35683| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97812 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34383| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39818| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40805| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97809 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97808 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97807 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97805 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34743| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.362| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97804 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39928| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97803 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36983| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97803 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35887| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97802 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34947| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97802 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39364| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.978 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39348| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97799 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40633| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97799 [2021-09-01 20:37:10,092] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 20:37:10,092] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42857| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37594| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97797 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34477| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97796 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41181| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97795 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97794 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97792| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39066| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97792 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39599| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97791 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41494| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39427| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43954| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.97789 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97789 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97787| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97787 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34931| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97786 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39975| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97786 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97785 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42137| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97784 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97782| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40664| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97782 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44314| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97781 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40226| ppl: 2.0| %_neg_is_pos: 1.0| 
lr: 0.0005| temp: 1.97779 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97778 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97778 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38487| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40038| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97774 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39207| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97774 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34461| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97773 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42951| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97773 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38659| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35746| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35699| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.9777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39113| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9777 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40085| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34712| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35683| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97765 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97765 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42669| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38095| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35511| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97762 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41776| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97762 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42481| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97761 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.97761 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34994| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38706| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9776 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97759 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97759 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41244| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97757 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97757 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38769| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97756 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97756 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45551| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97755 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97755 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4187| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97754 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97754 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41291| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97752 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40555| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97752 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.97751 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39317| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97751 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9775 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38925| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9775 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97749 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97749 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36623| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97747 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97747 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34665| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97746 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35307| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97746 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97744 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97744 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45128| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97743 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35056| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97743 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97742 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39803| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97742 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36513| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97741 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97741 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43797| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97739 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37328| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97739 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37641| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97738 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36873| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97738 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3562| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97737 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97737 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38315| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97736 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97736 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97734 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36153| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97734 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97733 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39771| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97733 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37735| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97732 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37688| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97732 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42904| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97731 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34774| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97731 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97729 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97729 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.4032| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97728 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38346| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97728 [2021-09-01 20:46:20,592] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 20:46:20,592] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43358| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97726 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36842| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97726 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32096| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97725 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40022| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97725 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36936| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97724 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97724 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38205| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97723 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40053| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97723 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39019| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97721 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36654| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97721 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34273| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9772 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36216| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97719 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38189| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97719 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41353| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97718 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97718 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97716| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43374| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97716 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44345| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97715 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97715 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38174| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97714 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38941| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97714 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39552| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97713 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97713 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36153| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97711 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41698| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97711 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9771 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36811| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9771 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97708 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37954| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97708 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97707 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97707 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41432| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97706 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3786| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97706 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34649| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97705 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40883| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97705 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35542| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97703 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3833| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97703 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97702 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41745| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97702 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37923| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97701 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97701 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3927| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.977 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.977 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42027| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97698 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36216| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97698 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35824| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97697 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97697 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34398| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97696 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39474| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97696 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41087| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97695 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39662| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97695 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32581| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97693 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40492| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97693 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35667| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97692 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97692 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41823| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9769 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.33145| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97689 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39051| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97689 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35996| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97688 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37719| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97688 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97687 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36576| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97687 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38988| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97685 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42387| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97685 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3562| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97685 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42607| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97685 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97684 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97684 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4433| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97683 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4563| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97683 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97681 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97681 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39145| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40946| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9768 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3916| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97679 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38299| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97679 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34179| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97678 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39489| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97678 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97676 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37907| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97676 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97675 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38878| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97675 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39912| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97673 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39458| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97673 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40899| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97672 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38033| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97672 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35213| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97671 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39176| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97671 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38628| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9767 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38377| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97668 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97668 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46585| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97667 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36059| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97667 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3692| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97666 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35652| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97666 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40664| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97665 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97665 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39709| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97663 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39865| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97663 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43766| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97662 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42685| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97662 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42434| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35526| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97661 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40304| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9766 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9766 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37578| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97658 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34571| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97658 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39505| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97657 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97657 [2021-09-01 20:55:32,293] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 20:55:32,293] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97655 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97655 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38471| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97654 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37014| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97654 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38565| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97653 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36451| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97653 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39944| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97652 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41447| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97652 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36764| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9765| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3255| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9765 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32691| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97649 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36028| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97649 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40382| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97648 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37672| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97648 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39521| ppl: 2.0| %_neg_is_pos: 
1.0| lr: 0.0005| temp: 1.97647 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32018| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97647 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44063| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97645 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41604| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97645 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35683| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97644 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40367| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97644 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37406| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97643 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97643 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42857| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97642 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38001| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97642 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.44298| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4093| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9764 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42011| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97639 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.32018| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97639 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97637 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35511| ppl: 2.0| %_neg_is_pos: 1.0| 
lr: 0.0005| temp: 1.97637 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34884| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97636 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41949| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97636 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40742| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97635 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97635 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97634 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97634 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38127| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97632 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39787| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97632 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35119| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97631 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.37892| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97631 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41385| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9763 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43343| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9763 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42497| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97629 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42779| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97629 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.46021| ppl: 2.0| %_neg_is_pos: 1.0| lr: 
0.0005| temp: 1.97627 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40241| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97627 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4375| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97626 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41902| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97626 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97625 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38236| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97625 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39881| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97624 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35103| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97624 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3584| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97622 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35793| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97622 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40727| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97621 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38456| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97621 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3963| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97619 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.40789| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97619 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37516| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97618 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38205| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| 
temp: 1.97618 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.35542| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97617 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36419| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97617 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3631| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97616 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4209| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97616 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43311| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97614 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38268| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97614 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41259| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97613 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41353| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97613 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42998| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97612 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37484| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97612 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34915| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97611 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37876| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97611 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39897| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97609 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39098| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97609 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.34837| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 
1.97608 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36529| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97608 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39505| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97607 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38675| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97607 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.42278| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97606 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36544| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97606 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36858| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97604 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37218| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97604 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37798| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97603 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39583| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97603 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.45238| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97601 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40508| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97601 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36247| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97601 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.38283| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97601 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40821| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.976 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36231| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.976 
| loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37359| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97599 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39568| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97599 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38894| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97597 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42466| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97597 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3891| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97596 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.41651| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97596 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43499| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97595 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97595 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4057| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97594 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.34712| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97594 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38111| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97592 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97592 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38064| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97591 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39411| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97591 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.3703| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9759 | loss: 
0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37155| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.9759 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37249| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97589 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.4422| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97589 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3761| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97587 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43374| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97587 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42152| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97586 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39991| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97586 [2021-09-01 21:04:44,679] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-01 21:04:44,679] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.39693| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97584 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38252| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97584 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.31156| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97583 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.35636| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97583 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38393| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97582 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97582 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.40335| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97581 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99688| %_mask_idx: 0.36952| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97581 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.42701| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97579 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.3714| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97579 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.37281| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97578 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39724| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97578 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.41291| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97577 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.39239| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97577 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.38409| ppl: 2.0| 
%_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97576 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.43969| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97576 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.414| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97574 | loss: 0.02492| constrast_loss: 0.0| div_loss: 0.99687| %_mask_idx: 0.36999| ppl: 2.0| %_neg_is_pos: 1.0| lr: 0.0005| temp: 1.97574