[2021-09-01 21:14:03,199] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-09-01 21:14:03,238] [INFO] [runner.py:360:main] cmd = /home/patrick/anaconda3/envs/hugging_face/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 ./run_pretrain_no_trainer.py --output_dir=./test --max_train_steps=200000 --num_warmup_steps=100000 --gradient_accumulation_steps=4 --learning_rate=0.0001 --weight_decay=0.01 --max_duration_in_seconds=8.0 --model_name_or_path=./ --dataset_name=patrickvonplaten/librispeech_local --manual_data_dir=/home/patrick/wav2vec2_reproduce --dataset_config_name=clean --logging_steps=5 --per_device_train_batch_size=16 --per_device_eval_batch_size=16
[2021-09-01 21:14:03,590] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-09-01 21:14:03,590] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-09-01 21:14:03,590] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(, {'localhost': [0, 1]})
[2021-09-01 21:14:03,590] [INFO] [launch.py:102:main] dist_world_size=2
[2021-09-01 21:14:03,590] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2021-09-01 21:18:08,863] [INFO] [utils.py:11:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
[2021-09-01 21:18:08,870] [INFO] [utils.py:11:_initialize_parameter_parallel_groups] data_parallel_size: 2, parameter_parallel_size: 2
[2021-09-01 21:18:08,915] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-01 21:18:08,915] [INFO] [engine.py:702:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-01 21:18:08,915] [INFO] [engine.py:707:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-01 21:18:08,915] [INFO] [engine.py:716:_configure_optimizer] DeepSpeed Basic Optimizer = AdamW
[2021-09-01 21:18:08,915] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2021-09-01 21:18:08,915] [WARNING] [engine.py:726:_configure_optimizer] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2021-09-01 21:18:08,915] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer
[2021-09-01 21:18:08,915] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-01 21:18:08,915] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-01 21:18:08,916] [INFO] [stage2.py:108:__init__] CPU Offload: True
[2021-09-01 21:18:08,916] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/patrick/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.5334851741790771 seconds
Loading extension module utils...
Time to load utils op: 0.6029212474822998 seconds
Using /home/patrick/.cache/torch_extensions as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003986358642578125 seconds
[2021-09-01 21:18:11,992] [INFO] [stage2.py:416:__init__] optimizer state initialized
[2021-09-01 21:18:11,992] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2021-09-01 21:18:11,992] [INFO] [engine.py:519:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-01 21:18:11,992] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2021-09-01 21:18:11,992] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-01 21:18:11,992] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-01 21:18:11,992] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-01 21:18:11,992] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] gradient_accumulation_steps .. 4
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] gradient_clipping ............ 0.0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4294967296
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-01 21:18:11,993] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] steps_per_print .............. inf
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] train_batch_size ............. 128
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 16
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] world_size ................... 2
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] zero_allow_untested_optimizer True
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] zero_config .................. { "stage": 2, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": { "device": "cpu", "nvme_path": null, "buffer_count": 4, "pin_memory": false, "pipeline_read": false, "pipeline_write": false, "fast_init": false, "pipeline": false }, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-01 21:18:11,994] [INFO] [config.py:904:print] zero_optimization_stage ......
2 [2021-09-01 21:18:11,994] [INFO] [config.py:906:print] json = { "train_batch_size": 128, "gradient_accumulation_steps": 4, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu" } }, "steps_per_print": inf, "zero_allow_untested_optimizer": true, "fp16": { "enabled": true } } Using /home/patrick/.cache/torch_extensions as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.00025534629821777344 seconds | loss: 1.17416| constrast_loss: 4.6084| div_loss: 0.88257| %_mask_idx: 0.36357| ppl: 75.15784| %_neg_is_pos: 0.03725| lr: 0.0| temp: 1.99999 | loss: 1.17657| constrast_loss: 4.61902| div_loss: 0.87269| %_mask_idx: 0.44377| ppl: 81.47771| %_neg_is_pos: 0.01389| lr: 0.0| temp: 1.99999 | loss: 1.17474| constrast_loss: 4.61205| div_loss: 0.86907| %_mask_idx: 0.39583| ppl: 83.79659| %_neg_is_pos: 0.02368| lr: 0.0| temp: 1.99998 | loss: 1.17456| constrast_loss: 4.61116| div_loss: 0.87066| %_mask_idx: 0.3927| ppl: 82.77641| %_neg_is_pos: 0.03896| lr: 0.0| temp: 1.99998 | loss: 1.17522| constrast_loss: 4.61223| div_loss: 0.88653| %_mask_idx: 0.36216| ppl: 72.62298| %_neg_is_pos: 0.04049| lr: 0.0| temp: 1.99997 | loss: 1.17501| constrast_loss: 4.61338| div_loss: 0.86658| %_mask_idx: 0.4032| ppl: 85.39172| %_neg_is_pos: 0.0332| lr: 0.0| temp: 1.99997 | loss: 1.17625| constrast_loss: 4.61847| div_loss: 0.86507| %_mask_idx: 0.42607| ppl: 86.35463| %_neg_is_pos: 0.01813| lr: 0.0| temp: 1.99996 | loss: 1.17717| constrast_loss: 4.62206| div_loss: 0.86634| %_mask_idx: 0.37813| ppl: 85.54285| %_neg_is_pos: 0.02742| lr: 0.0| temp: 1.99996 | loss: 1.1757| constrast_loss: 4.6157| div_loss: 0.87121| %_mask_idx: 0.36983| ppl: 82.42662| %_neg_is_pos: 0.02716| lr: 0.0| temp: 1.99994 | loss: 1.17633| constrast_loss: 4.61852| div_loss: 0.86788| %_mask_idx: 0.38189| ppl: 84.55788| %_neg_is_pos: 0.02287| lr: 0.0| temp: 1.99994 | loss: 1.17659| 
constrast_loss: 4.62047| div_loss: 0.85887| %_mask_idx: 0.39568| ppl: 90.32195| %_neg_is_pos: 0.02331| lr: 0.0| temp: 1.99993 | loss: 1.17449| constrast_loss: 4.61115| div_loss: 0.868| %_mask_idx: 0.38816| ppl: 84.47749| %_neg_is_pos: 0.03235| lr: 0.0| temp: 1.99993 | loss: 1.17701| constrast_loss: 4.62077| div_loss: 0.87252| %_mask_idx: 0.40163| ppl: 81.58904| %_neg_is_pos: 0.02012| lr: 0.0| temp: 1.99992 | loss: 1.17558| constrast_loss: 4.61499| div_loss: 0.87318| %_mask_idx: 0.388| ppl: 81.16501| %_neg_is_pos: 0.04386| lr: 0.0| temp: 1.99992 | loss: 1.17518| constrast_loss: 4.61248| div_loss: 0.88232| %_mask_idx: 0.39881| ppl: 75.31491| %_neg_is_pos: 0.03518| lr: 0.0| temp: 1.99991 | loss: 1.17528| constrast_loss: 4.61422| div_loss: 0.86884| %_mask_idx: 0.3985| ppl: 83.94267| %_neg_is_pos: 0.02222| lr: 0.0| temp: 1.99991 | loss: 1.17664| constrast_loss: 4.61946| div_loss: 0.87097| %_mask_idx: 0.39505| ppl: 82.58133| %_neg_is_pos: 0.02141| lr: 0.0| temp: 1.99989 | loss: 1.17529| constrast_loss: 4.61313| div_loss: 0.88036| %_mask_idx: 0.39912| ppl: 76.56905| %_neg_is_pos: 0.03503| lr: 0.0| temp: 1.99989 | loss: 1.17399| constrast_loss: 4.60811| div_loss: 0.87862| %_mask_idx: 0.36873| ppl: 77.6853| %_neg_is_pos: 0.04109| lr: 0.0| temp: 1.99988 | loss: 1.17482| constrast_loss: 4.61213| div_loss: 0.87153| %_mask_idx: 0.40038| ppl: 82.21902| %_neg_is_pos: 0.02122| lr: 0.0| temp: 1.99988 | loss: 1.17632| constrast_loss: 4.61768| div_loss: 0.87594| %_mask_idx: 0.35652| ppl: 79.3994| %_neg_is_pos: 0.0326| lr: 0.0| temp: 1.99987 | loss: 1.17469| constrast_loss: 4.61187| div_loss: 0.86898| %_mask_idx: 0.34806| ppl: 83.85374| %_neg_is_pos: 0.04175| lr: 0.0| temp: 1.99987 | loss: 1.17555| constrast_loss: 4.61425| div_loss: 0.8795| %_mask_idx: 0.37766| ppl: 77.11819| %_neg_is_pos: 0.02963| lr: 0.0| temp: 1.99986 | loss: 1.17477| constrast_loss: 4.61217| div_loss: 0.869| %_mask_idx: 0.33271| ppl: 83.83946| %_neg_is_pos: 0.04261| lr: 0.0| temp: 1.99986 | loss: 1.17516| 
constrast_loss: 4.61412| div_loss: 0.86513| %_mask_idx: 0.40586| ppl: 86.31496| %_neg_is_pos: 0.02182| lr: 0.0| temp: 1.99984 | loss: 1.17464| constrast_loss: 4.61114| div_loss: 0.87406| %_mask_idx: 0.35244| ppl: 80.60241| %_neg_is_pos: 0.03007| lr: 0.0| temp: 1.99984 | loss: 1.17638| constrast_loss: 4.61805| div_loss: 0.87456| %_mask_idx: 0.42434| ppl: 80.28326| %_neg_is_pos: 0.02481| lr: 0.0| temp: 1.99983 | loss: 1.17558| constrast_loss: 4.61508| div_loss: 0.87247| %_mask_idx: 0.39803| ppl: 81.62137| %_neg_is_pos: 0.03814| lr: 0.0| temp: 1.99983 | loss: 1.1751| constrast_loss: 4.61365| div_loss: 0.86748| %_mask_idx: 0.29308| ppl: 84.81561| %_neg_is_pos: 0.03391| lr: 0.0| temp: 1.99981 | loss: 1.1755| constrast_loss: 4.61497| div_loss: 0.8703| %_mask_idx: 0.36811| ppl: 83.00802| %_neg_is_pos: 0.03459| lr: 0.0| temp: 1.99981 | loss: 1.17572| constrast_loss: 4.61512| div_loss: 0.87776| %_mask_idx: 0.38878| ppl: 78.23534| %_neg_is_pos: 0.03507| lr: 0.0| temp: 1.9998 | loss: 1.17664| constrast_loss: 4.6202| div_loss: 0.8635| %_mask_idx: 0.38205| ppl: 87.35898| %_neg_is_pos: 0.02313| lr: 0.0| temp: 1.9998 | loss: 1.17681| constrast_loss: 4.62011| div_loss: 0.87145| %_mask_idx: 0.35699| ppl: 82.26974| %_neg_is_pos: 0.03309| lr: 0.0| temp: 1.99979 | loss: 1.17846| constrast_loss: 4.62699| div_loss: 0.86832| %_mask_idx: 0.34806| ppl: 84.27489| %_neg_is_pos: 0.02771| lr: 0.0| temp: 1.99979 | loss: 1.17558| constrast_loss: 4.61555| div_loss: 0.86775| %_mask_idx: 0.38847| ppl: 84.64142| %_neg_is_pos: 0.0287| lr: 0.0| temp: 1.99978 | loss: 1.17623| constrast_loss: 4.61784| div_loss: 0.87082| %_mask_idx: 0.37124| ppl: 82.67383| %_neg_is_pos: 0.02747| lr: 0.0| temp: 1.99978 | loss: 1.17455| constrast_loss: 4.61071| div_loss: 0.87494| %_mask_idx: 0.3385| ppl: 80.03691| %_neg_is_pos: 0.04956| lr: 0.0| temp: 1.99976 | loss: 1.17539| constrast_loss: 4.61427| div_loss: 0.87305| %_mask_idx: 0.40539| ppl: 81.24655| %_neg_is_pos: 0.02544| lr: 0.0| temp: 1.99976 | loss: 1.17697| 
constrast_loss: 4.62251| div_loss: 0.85357| %_mask_idx: 0.38315| ppl: 93.71217| %_neg_is_pos: 0.02137| lr: 0.0| temp: 1.99975 | loss: 1.17431| constrast_loss: 4.61052| div_loss: 0.8674| %_mask_idx: 0.43217| ppl: 84.86614| %_neg_is_pos: 0.02376| lr: 0.0| temp: 1.99975 | loss: 1.17645| constrast_loss: 4.61855| div_loss: 0.8727| %_mask_idx: 0.37547| ppl: 81.47276| %_neg_is_pos: 0.02303| lr: 0.0| temp: 1.99974 | loss: 1.17595| constrast_loss: 4.61777| div_loss: 0.86047| %_mask_idx: 0.38471| ppl: 89.29869| %_neg_is_pos: 0.01718| lr: 0.0| temp: 1.99974 | loss: 1.17522| constrast_loss: 4.61466| div_loss: 0.86199| %_mask_idx: 0.39145| ppl: 88.32475| %_neg_is_pos: 0.02627| lr: 0.0| temp: 1.99973 | loss: 1.17658| constrast_loss: 4.61856| div_loss: 0.87776| %_mask_idx: 0.34759| ppl: 78.2314| %_neg_is_pos: 0.03234| lr: 0.0| temp: 1.99973 | loss: 1.17628| constrast_loss: 4.61836| div_loss: 0.86739| %_mask_idx: 0.38722| ppl: 84.87273| %_neg_is_pos: 0.02971| lr: 0.0| temp: 1.99971 | loss: 1.1747| constrast_loss: 4.61142| div_loss: 0.87393| %_mask_idx: 0.36169| ppl: 80.6851| %_neg_is_pos: 0.03209| lr: 0.0| temp: 1.99971 | loss: 1.17512| constrast_loss: 4.61211| div_loss: 0.88351| %_mask_idx: 0.37469| ppl: 74.55336| %_neg_is_pos: 0.0431| lr: 0.0| temp: 1.9997 | loss: 1.17577| constrast_loss: 4.61571| div_loss: 0.87362| %_mask_idx: 0.40883| ppl: 80.8847| %_neg_is_pos: 0.02069| lr: 0.0| temp: 1.9997 | loss: 1.17459| constrast_loss: 4.61185| div_loss: 0.86487| %_mask_idx: 0.41682| ppl: 86.48233| %_neg_is_pos: 0.0226| lr: 0.0| temp: 1.99969 | loss: 1.17602| constrast_loss: 4.61773| div_loss: 0.86355| %_mask_idx: 0.40711| ppl: 87.32512| %_neg_is_pos: 0.01601| lr: 0.0| temp: 1.99969 | loss: 1.1744| constrast_loss: 4.61013| div_loss: 0.87479| %_mask_idx: 0.38362| ppl: 80.13724| %_neg_is_pos: 0.03594| lr: 0.0| temp: 1.99968 | loss: 1.17428| constrast_loss: 4.60982| div_loss: 0.87301| %_mask_idx: 0.40492| ppl: 81.27486| %_neg_is_pos: 0.0246| lr: 0.0| temp: 1.99968 | loss: 1.17611| 
constrast_loss: 4.61754| div_loss: 0.86905| %_mask_idx: 0.34132| ppl: 83.8083| %_neg_is_pos: 0.02831| lr: 0.0| temp: 1.99966 | loss: 1.17481| constrast_loss: 4.6121| div_loss: 0.87135| %_mask_idx: 0.38299| ppl: 82.33338| %_neg_is_pos: 0.03936| lr: 0.0| temp: 1.99966 | loss: 1.17564| constrast_loss: 4.61524| div_loss: 0.87337| %_mask_idx: 0.3844| ppl: 81.04351| %_neg_is_pos: 0.02583| lr: 0.0| temp: 1.99965 | loss: 1.17302| constrast_loss: 4.6035| div_loss: 0.88579| %_mask_idx: 0.35385| ppl: 73.0918| %_neg_is_pos: 0.04771| lr: 0.0| temp: 1.99965 | loss: 1.17435| constrast_loss: 4.61071| div_loss: 0.86684| %_mask_idx: 0.41698| ppl: 85.22024| %_neg_is_pos: 0.03033| lr: 0.0| temp: 1.99963 | loss: 1.17593| constrast_loss: 4.61677| div_loss: 0.86952| %_mask_idx: 0.39145| ppl: 83.50901| %_neg_is_pos: 0.03237| lr: 0.0| temp: 1.99963 | loss: 1.17299| constrast_loss: 4.60413| div_loss: 0.87828| %_mask_idx: 0.32738| ppl: 77.89853| %_neg_is_pos: 0.05344| lr: 0.0| temp: 1.99962 | loss: 1.17694| constrast_loss: 4.62169| div_loss: 0.86059| %_mask_idx: 0.40414| ppl: 89.21969| %_neg_is_pos: 0.02283| lr: 0.0| temp: 1.99962 | loss: 1.176| constrast_loss: 4.61838| div_loss: 0.85608| %_mask_idx: 0.44298| ppl: 92.11095| %_neg_is_pos: 0.00912| lr: 0.0| temp: 1.99961 | loss: 1.17542| constrast_loss: 4.6158| div_loss: 0.85873| %_mask_idx: 0.40335| ppl: 90.41145| %_neg_is_pos: 0.02041| lr: 0.0| temp: 1.99961 | loss: 1.17388| constrast_loss: 4.60788| div_loss: 0.87627| %_mask_idx: 0.40993| ppl: 79.18768| %_neg_is_pos: 0.03739| lr: 0.0| temp: 1.9996 | loss: 1.17583| constrast_loss: 4.61624| div_loss: 0.87079| %_mask_idx: 0.35605| ppl: 82.69569| %_neg_is_pos: 0.04011| lr: 0.0| temp: 1.9996 | loss: 1.17602| constrast_loss: 4.61734| div_loss: 0.8675| %_mask_idx: 0.38127| ppl: 84.80087| %_neg_is_pos: 0.02786| lr: 0.0| temp: 1.99958 | loss: 1.175| constrast_loss: 4.61227| div_loss: 0.8772| %_mask_idx: 0.34101| ppl: 78.58981| %_neg_is_pos: 0.0414| lr: 0.0| temp: 1.99958 | loss: 1.17534| 
constrast_loss: 4.61429| div_loss: 0.87085| %_mask_idx: 0.38315| ppl: 82.65894| %_neg_is_pos: 0.03591| lr: 0.0| temp: 1.99957 | loss: 1.17645| constrast_loss: 4.61823| div_loss: 0.87589| %_mask_idx: 0.3714| ppl: 79.42859| %_neg_is_pos: 0.03214| lr: 0.0| temp: 1.99957 | loss: 1.17474| constrast_loss: 4.61144| div_loss: 0.87506| %_mask_idx: 0.39223| ppl: 79.96413| %_neg_is_pos: 0.02632| lr: 0.0| temp: 1.99956 | loss: 1.17473| constrast_loss: 4.61184| div_loss: 0.87076| %_mask_idx: 0.39019| ppl: 82.71671| %_neg_is_pos: 0.0373| lr: 0.0| temp: 1.99956 | loss: 1.17559| constrast_loss: 4.61501| div_loss: 0.87372| %_mask_idx: 0.39583| ppl: 80.81714| %_neg_is_pos: 0.02574| lr: 0.0| temp: 1.99955 | loss: 1.17332| constrast_loss: 4.60545| div_loss: 0.87817| %_mask_idx: 0.43045| ppl: 77.97208| %_neg_is_pos: 0.03409| lr: 0.0| temp: 1.99955 | loss: 1.17339| constrast_loss: 4.60548| div_loss: 0.88061| %_mask_idx: 0.38362| ppl: 76.40987| %_neg_is_pos: 0.04153| lr: 0.0| temp: 1.99953 | loss: 1.17674| constrast_loss: 4.62014| div_loss: 0.86803| %_mask_idx: 0.42121| ppl: 84.46091| %_neg_is_pos: 0.01796| lr: 0.0| temp: 1.99953 | loss: 1.17445| constrast_loss: 4.60973| div_loss: 0.88085| %_mask_idx: 0.33991| ppl: 76.2543| %_neg_is_pos: 0.04061| lr: 0.0| temp: 1.99952 | loss: 1.17727| constrast_loss: 4.62279| div_loss: 0.8629| %_mask_idx: 0.42027| ppl: 87.74584| %_neg_is_pos: 0.01977| lr: 0.0| temp: 1.99952 | loss: 1.17612| constrast_loss: 4.61786| div_loss: 0.86628| %_mask_idx: 0.375| ppl: 85.58185| %_neg_is_pos: 0.02651| lr: 0.0| temp: 1.99951 | loss: 1.17518| constrast_loss: 4.61448| div_loss: 0.86229| %_mask_idx: 0.39019| ppl: 88.13239| %_neg_is_pos: 0.02962| lr: 0.0| temp: 1.99951 | loss: 1.17665| constrast_loss: 4.61972| div_loss: 0.869| %_mask_idx: 0.42419| ppl: 83.84214| %_neg_is_pos: 0.01945| lr: 0.0| temp: 1.9995 | loss: 1.17536| constrast_loss: 4.61469| div_loss: 0.86739| %_mask_idx: 0.34696| ppl: 84.87211| %_neg_is_pos: 0.0326| lr: 0.0| temp: 1.9995 | loss: 1.17514| 
constrast_loss: 4.61293| div_loss: 0.8764| %_mask_idx: 0.38095| ppl: 79.1008| %_neg_is_pos: 0.02881| lr: 0.0| temp: 1.99948
| loss: 1.17566| constrast_loss: 4.61564| div_loss: 0.86987| %_mask_idx: 0.40257| ppl: 83.28294| %_neg_is_pos: 0.0314| lr: 0.0| temp: 1.99948
| loss: 1.17504| constrast_loss: 4.61324| div_loss: 0.86908| %_mask_idx: 0.3761| ppl: 83.78812| %_neg_is_pos: 0.02893| lr: 0.0| temp: 1.99947
| loss: 1.1767| constrast_loss: 4.62036| div_loss: 0.86435| %_mask_idx: 0.40695| ppl: 86.81823| %_neg_is_pos: 0.02007| lr: 0.0| temp: 1.99947
[2021-09-01 21:24:51,272] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
[2021-09-01 21:24:51,272] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
| loss: 1.17543| constrast_loss: 4.6147| div_loss: 0.87013| %_mask_idx: 0.32989| ppl: 83.1185| %_neg_is_pos: 0.0412| lr: 0.0| temp: 1.99945
| loss: 1.17717| constrast_loss: 4.62159| div_loss: 0.8709| %_mask_idx: 0.40711| ppl: 82.62645| %_neg_is_pos: 0.01709| lr: 0.0| temp: 1.99945
[2021-09-01 21:24:59,070] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
[2021-09-01 21:24:59,070] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
| loss: 1.17597| constrast_loss: 4.61814| div_loss: 0.85734| %_mask_idx: 0.39912| ppl: 91.30342| %_neg_is_pos: 0.0164| lr: 0.0| temp: 1.99944
| loss: 1.17586| constrast_loss: 4.6158| div_loss: 0.87652| %_mask_idx: 0.40226| ppl: 79.0275| %_neg_is_pos: 0.03502| lr: 0.0| temp: 1.99944
[2021-09-01 21:25:06,875] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
[2021-09-01 21:25:06,875] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
| loss: 1.17689| constrast_loss: 4.6208| div_loss: 0.86753| %_mask_idx: 0.40492| ppl: 84.77861| %_neg_is_pos: 0.02077| lr: 0.0| temp: 1.99943
| loss: 1.17558| constrast_loss: 4.61488| div_loss: 0.87453| %_mask_idx: 0.38972| ppl: 80.30015| %_neg_is_pos: 0.02401| lr: 0.0| temp: 1.99943
[2021-09-01 21:25:15,116] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
[2021-09-01 21:25:15,116] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
| loss: 1.17535| constrast_loss: 4.61498| div_loss: 0.86405| %_mask_idx: 0.40429| ppl: 87.005| %_neg_is_pos: 0.01896| lr: 0.0| temp: 1.99942
| loss: 1.17512| constrast_loss: 4.61381| div_loss: 0.8668| %_mask_idx: 0.42027| ppl: 85.24995| %_neg_is_pos: 0.02918| lr: 0.0| temp: 1.99942
[2021-09-01 21:25:23,555] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
[2021-09-01 21:25:23,555] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
[2021-09-01 21:25:31,543] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
[2021-09-01 21:25:31,543] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
| loss: 1.17489| constrast_loss: 4.61147| div_loss: 0.88077| %_mask_idx: 0.40069| ppl: 76.30863| %_neg_is_pos: 0.02795| lr: 0.0| temp: 1.9994
| loss: 1.17559| constrast_loss: 4.61609| div_loss: 0.86256| %_mask_idx: 0.39959| ppl: 87.96375| %_neg_is_pos: 0.02199| lr: 0.0| temp: 1.9994
[2021-09-01 21:25:39,489] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
[2021-09-01 21:25:39,489] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
| loss: 1.17559| constrast_loss: 4.61412| div_loss: 0.88218| %_mask_idx: 0.41573| ppl: 75.40218| %_neg_is_pos: 0.0278| lr: 0.0| temp: 1.99939
| loss: 1.17809| constrast_loss: 4.62596| div_loss: 0.86403| %_mask_idx: 0.36357| ppl: 87.02325| %_neg_is_pos: 0.02638| lr: 0.0| temp: 1.99939
[2021-09-01 21:25:47,321] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
[2021-09-01 21:25:47,321] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
| loss: 1.17473| constrast_loss: 4.6103| div_loss: 0.88637| %_mask_idx: 0.36247| ppl: 72.72251| %_neg_is_pos: 0.04109| lr: 0.0| temp: 1.99938
| loss: 1.17518| constrast_loss: 4.61224| div_loss: 0.88468| %_mask_idx: 0.37547| ppl: 73.80407| %_neg_is_pos: 0.04485| lr: 0.0| temp: 1.99938
[2021-09-01 21:25:55,512] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
[2021-09-01 21:25:55,512] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
| loss: 1.17581| constrast_loss: 4.61528| div_loss: 0.87938| %_mask_idx: 0.34508| ppl: 77.19943| %_neg_is_pos: 0.03171| lr: 0.0| temp: 1.99937
| loss: 1.17575| constrast_loss: 4.61567| div_loss: 0.87348| %_mask_idx: 0.38518| ppl: 80.97064| %_neg_is_pos: 0.03415| lr: 0.0| temp: 1.99937
| loss: 1.17537| constrast_loss: 4.61553| div_loss: 0.85963| %_mask_idx: 0.41275| ppl: 89.83507| %_neg_is_pos: 0.02583| lr: 0.0| temp: 1.99935
| loss: 1.17628| constrast_loss: 4.61753| div_loss: 0.87583| %_mask_idx: 0.40257| ppl: 79.46609| %_neg_is_pos: 0.02472| lr: 0.0| temp: 1.99935
| loss: 1.17466| constrast_loss: 4.61138| div_loss: 0.87272| %_mask_idx: 0.45066| ppl: 81.46104| %_neg_is_pos: 0.01826| lr: 0.0| temp: 1.99934
| loss: 1.17598| constrast_loss: 4.61714| div_loss: 0.86771| %_mask_idx: 0.3537| ppl: 84.66612| %_neg_is_pos: 0.03573| lr: 0.0| temp: 1.99934
[2021-09-01 21:26:29,972] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0
[2021-09-01 21:26:29,972] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step.
Attempted loss scale: 8388608.0, reducing to 4194304.0 | loss: 1.17495| constrast_loss: 4.611| div_loss: 0.88809| %_mask_idx: 0.40382| ppl: 71.62444| %_neg_is_pos: 0.02765| lr: 0.0| temp: 1.99933 | loss: 1.17411| constrast_loss: 4.60823| div_loss: 0.88197| %_mask_idx: 0.38925| ppl: 75.54162| %_neg_is_pos: 0.02859| lr: 0.0| temp: 1.99933 | loss: 1.17578| constrast_loss: 4.6163| div_loss: 0.8682| %_mask_idx: 0.36623| ppl: 84.35468| %_neg_is_pos: 0.02474| lr: 0.0| temp: 1.99932 | loss: 1.17519| constrast_loss: 4.61374| div_loss: 0.87036| %_mask_idx: 0.39677| ppl: 82.96989| %_neg_is_pos: 0.02688| lr: 0.0| temp: 1.99932 | loss: 1.17365| constrast_loss: 4.60687| div_loss: 0.87711| %_mask_idx: 0.40069| ppl: 78.64652| %_neg_is_pos: 0.03193| lr: 0.0| temp: 1.9993 | loss: 1.17657| constrast_loss: 4.62093| div_loss: 0.85333| %_mask_idx: 0.42826| ppl: 93.8718| %_neg_is_pos: 0.01656| lr: 0.0| temp: 1.9993 | loss: 1.17499| constrast_loss: 4.61326| div_loss: 0.86706| %_mask_idx: 0.34101| ppl: 85.08402| %_neg_is_pos: 0.02497| lr: 0.0| temp: 1.99929 | loss: 1.17452| constrast_loss: 4.61115| div_loss: 0.86912| %_mask_idx: 0.40742| ppl: 83.76401| %_neg_is_pos: 0.02326| lr: 0.0| temp: 1.99929 | loss: 1.17281| constrast_loss: 4.60381| div_loss: 0.87423| %_mask_idx: 0.35479| ppl: 80.49133| %_neg_is_pos: 0.0442| lr: 0.0| temp: 1.99927 | loss: 1.17684| constrast_loss: 4.62068| div_loss: 0.8668| %_mask_idx: 0.42826| ppl: 85.25124| %_neg_is_pos: 0.02021| lr: 0.0| temp: 1.99927 | loss: 1.17622| constrast_loss: 4.61887| div_loss: 0.86007| %_mask_idx: 0.42497| ppl: 89.55561| %_neg_is_pos: 0.01575| lr: 0.0| temp: 1.99926 | loss: 1.17546| constrast_loss: 4.6156| div_loss: 0.86226| %_mask_idx: 0.40116| ppl: 88.15364| %_neg_is_pos: 0.0331| lr: 0.0| temp: 1.99926 | loss: 1.1755| constrast_loss: 4.61485| div_loss: 0.87153| %_mask_idx: 0.40461| ppl: 82.22183| %_neg_is_pos: 0.02082| lr: 0.0| temp: 1.99925 | loss: 1.17617| constrast_loss: 4.61826| div_loss: 0.86437| %_mask_idx: 0.45504| ppl: 86.80038| 
%_neg_is_pos: 0.01601| lr: 0.0| temp: 1.99925 | loss: 1.17626| constrast_loss: 4.61811| div_loss: 0.86936| %_mask_idx: 0.38941| ppl: 83.60943| %_neg_is_pos: 0.02323| lr: 0.0| temp: 1.99924 | loss: 1.17534| constrast_loss: 4.61523| div_loss: 0.86124| %_mask_idx: 0.3266| ppl: 88.80855| %_neg_is_pos: 0.03499| lr: 0.0| temp: 1.99924 | loss: 1.17379| constrast_loss: 4.60778| div_loss: 0.87387| %_mask_idx: 0.37516| ppl: 80.72137| %_neg_is_pos: 0.04218| lr: 0.0| temp: 1.99922 | loss: 1.17499| constrast_loss: 4.61236| div_loss: 0.87607| %_mask_idx: 0.34853| ppl: 79.31759| %_neg_is_pos: 0.03991| lr: 0.0| temp: 1.99922 | loss: 1.17591| constrast_loss: 4.6172| div_loss: 0.86434| %_mask_idx: 0.37171| ppl: 86.81932| %_neg_is_pos: 0.03173| lr: 0.0| temp: 1.99921 | loss: 1.17586| constrast_loss: 4.61703| div_loss: 0.86418| %_mask_idx: 0.34571| ppl: 86.92754| %_neg_is_pos: 0.04637| lr: 0.0| temp: 1.99921 | loss: 1.17528| constrast_loss: 4.61436| div_loss: 0.86748| %_mask_idx: 0.37359| ppl: 84.8141| %_neg_is_pos: 0.04095| lr: 0.0| temp: 1.9992 | loss: 1.1762| constrast_loss: 4.61747| div_loss: 0.87333| %_mask_idx: 0.38957| ppl: 81.06844| %_neg_is_pos: 0.01978| lr: 0.0| temp: 1.9992 | loss: 1.17601| constrast_loss: 4.61728| div_loss: 0.8677| %_mask_idx: 0.39176| ppl: 84.66948| %_neg_is_pos: 0.02423| lr: 0.0| temp: 1.99919 | loss: 1.17583| constrast_loss: 4.61574| div_loss: 0.87586| %_mask_idx: 0.38659| ppl: 79.45142| %_neg_is_pos: 0.02553| lr: 0.0| temp: 1.99919 | loss: 1.17554| constrast_loss: 4.61582| div_loss: 0.86336| %_mask_idx: 0.40821| ppl: 87.44872| %_neg_is_pos: 0.0241| lr: 0.0| temp: 1.99917 | loss: 1.17553| constrast_loss: 4.61664| div_loss: 0.8547| %_mask_idx: 0.39098| ppl: 92.99161| %_neg_is_pos: 0.0225| lr: 0.0| temp: 1.99917 | loss: 1.17598| constrast_loss: 4.6166| div_loss: 0.87328| %_mask_idx: 0.4093| ppl: 81.10308| %_neg_is_pos: 0.02816| lr: 0.0| temp: 1.99916 | loss: 1.1752| constrast_loss: 4.61417| div_loss: 0.86617| %_mask_idx: 0.37281| ppl: 85.65359| 
%_neg_is_pos: 0.02825| lr: 0.0| temp: 1.99916 | loss: 1.17517| constrast_loss: 4.61275| div_loss: 0.87944| %_mask_idx: 0.3844| ppl: 77.15777| %_neg_is_pos: 0.03375| lr: 0.0| temp: 1.99915 | loss: 1.17586| constrast_loss: 4.61569| div_loss: 0.87729| %_mask_idx: 0.39458| ppl: 78.53326| %_neg_is_pos: 0.03732| lr: 0.0| temp: 1.99915 | loss: 1.17687| constrast_loss: 4.62102| div_loss: 0.86462| %_mask_idx: 0.36137| ppl: 86.64626| %_neg_is_pos: 0.02107| lr: 0.0| temp: 1.99914 | loss: 1.17403| constrast_loss: 4.60732| div_loss: 0.88818| %_mask_idx: 0.34618| ppl: 71.56737| %_neg_is_pos: 0.04618| lr: 0.0| temp: 1.99914 | loss: 1.17507| constrast_loss: 4.61389| div_loss: 0.86405| %_mask_idx: 0.43076| ppl: 87.00951| %_neg_is_pos: 0.02065| lr: 0.0| temp: 1.99912 | loss: 1.17719| constrast_loss: 4.62312| div_loss: 0.85664| %_mask_idx: 0.39615| ppl: 91.74782| %_neg_is_pos: 0.02105| lr: 0.0| temp: 1.99912 | loss: 1.17506| constrast_loss: 4.61319| div_loss: 0.87043| %_mask_idx: 0.42372| ppl: 82.92728| %_neg_is_pos: 0.02213| lr: 0.0| temp: 1.99911 | loss: 1.17351| constrast_loss: 4.60577| div_loss: 0.88278| %_mask_idx: 0.36184| ppl: 75.02161| %_neg_is_pos: 0.04601| lr: 0.0| temp: 1.99911 | loss: 1.17532| constrast_loss: 4.61385| div_loss: 0.87427| %_mask_idx: 0.39756| ppl: 80.46922| %_neg_is_pos: 0.02268| lr: 0.0| temp: 1.99909 | loss: 1.17579| constrast_loss: 4.61628| div_loss: 0.86881| %_mask_idx: 0.38957| ppl: 83.96003| %_neg_is_pos: 0.02991| lr: 0.0| temp: 1.99909 | loss: 1.17559| constrast_loss: 4.61542| div_loss: 0.86919| %_mask_idx: 0.37437| ppl: 83.71542| %_neg_is_pos: 0.03244| lr: 0.0| temp: 1.99908 | loss: 1.17575| constrast_loss: 4.61548| div_loss: 0.87523| %_mask_idx: 0.37202| ppl: 79.85425| %_neg_is_pos: 0.0239| lr: 0.0| temp: 1.99908 | loss: 1.17427| constrast_loss: 4.60902| div_loss: 0.88049| %_mask_idx: 0.37046| ppl: 76.48511| %_neg_is_pos: 0.03444| lr: 0.0| temp: 1.99907 | loss: 1.17672| constrast_loss: 4.62087| div_loss: 0.85998| %_mask_idx: 0.3927| ppl: 89.61179| 
%_neg_is_pos: 0.02429| lr: 0.0| temp: 1.99907 | loss: 1.17664| constrast_loss: 4.62073| div_loss: 0.85815| %_mask_idx: 0.38596| ppl: 90.7845| %_neg_is_pos: 0.02037| lr: 0.0| temp: 1.99906 | loss: 1.1758| constrast_loss: 4.61478| div_loss: 0.88415| %_mask_idx: 0.39536| ppl: 74.14133| %_neg_is_pos: 0.02411| lr: 0.0| temp: 1.99906 | loss: 1.17647| constrast_loss: 4.61928| div_loss: 0.8661| %_mask_idx: 0.37249| ppl: 85.69836| %_neg_is_pos: 0.02195| lr: 0.0| temp: 1.99904 | loss: 1.17523| constrast_loss: 4.61399| div_loss: 0.86922| %_mask_idx: 0.43155| ppl: 83.69825| %_neg_is_pos: 0.01705| lr: 0.0| temp: 1.99904 | loss: 1.17539| constrast_loss: 4.61479| div_loss: 0.86758| %_mask_idx: 0.38675| ppl: 84.74947| %_neg_is_pos: 0.03021| lr: 0.0| temp: 1.99903 | loss: 1.17603| constrast_loss: 4.61685| div_loss: 0.87262| %_mask_idx: 0.3526| ppl: 81.52058| %_neg_is_pos: 0.03192| lr: 0.0| temp: 1.99903 | loss: 1.17603| constrast_loss: 4.6174| div_loss: 0.86729| %_mask_idx: 0.40648| ppl: 84.93177| %_neg_is_pos: 0.03564| lr: 0.0| temp: 1.99902 | loss: 1.1769| constrast_loss: 4.6216| div_loss: 0.85978| %_mask_idx: 0.34978| ppl: 89.73975| %_neg_is_pos: 0.03003| lr: 0.0| temp: 1.99902 | loss: 1.17562| constrast_loss: 4.61537| div_loss: 0.87093| %_mask_idx: 0.41792| ppl: 82.60388| %_neg_is_pos: 0.02073| lr: 0.0| temp: 1.99901 | loss: 1.17527| constrast_loss: 4.61382| div_loss: 0.87264| %_mask_idx: 0.36122| ppl: 81.5107| %_neg_is_pos: 0.03871| lr: 0.0| temp: 1.99901 | loss: 1.17606| constrast_loss: 4.61745| div_loss: 0.86773| %_mask_idx: 0.41839| ppl: 84.65219| %_neg_is_pos: 0.02358| lr: 0.0| temp: 1.99899 | loss: 1.17543| constrast_loss: 4.6141| div_loss: 0.8761| %_mask_idx: 0.4151| ppl: 79.29457| %_neg_is_pos: 0.02425| lr: 0.0| temp: 1.99899 | loss: 1.17705| constrast_loss: 4.62142| div_loss: 0.86789| %_mask_idx: 0.42888| ppl: 84.55029| %_neg_is_pos: 0.01519| lr: 0.0| temp: 1.99898 | loss: 1.17573| constrast_loss: 4.61612| div_loss: 0.86811| %_mask_idx: 0.38487| ppl: 84.40769| 
%_neg_is_pos: 0.03512| lr: 0.0| temp: 1.99898 | loss: 1.17396| constrast_loss: 4.60796| div_loss: 0.87878| %_mask_idx: 0.3573| ppl: 77.58064| %_neg_is_pos: 0.03547| lr: 0.0| temp: 1.99897 | loss: 1.17637| constrast_loss: 4.61953| div_loss: 0.85966| %_mask_idx: 0.40179| ppl: 89.81508| %_neg_is_pos: 0.01931| lr: 0.0| temp: 1.99897 | loss: 1.17564| constrast_loss: 4.61507| div_loss: 0.87487| %_mask_idx: 0.3786| ppl: 80.08398| %_neg_is_pos: 0.01914| lr: 0.0| temp: 1.99896 | loss: 1.17611| constrast_loss: 4.61722| div_loss: 0.87223| %_mask_idx: 0.35871| ppl: 81.77335| %_neg_is_pos: 0.01932| lr: 0.0| temp: 1.99896 | loss: 1.17539| constrast_loss: 4.61509| div_loss: 0.86486| %_mask_idx: 0.39583| ppl: 86.48701| %_neg_is_pos: 0.02201| lr: 0.0| temp: 1.99894 | loss: 1.17481| constrast_loss: 4.61096| div_loss: 0.88284| %_mask_idx: 0.34915| ppl: 74.98346| %_neg_is_pos: 0.0353| lr: 0.0| temp: 1.99894 | loss: 1.1747| constrast_loss: 4.61081| div_loss: 0.87991| %_mask_idx: 0.41353| ppl: 76.85717| %_neg_is_pos: 0.03101| lr: 0.0| temp: 1.99893 | loss: 1.17494| constrast_loss: 4.61217| div_loss: 0.87584| %_mask_idx: 0.41902| ppl: 79.46436| %_neg_is_pos: 0.03137| lr: 0.0| temp: 1.99893 | loss: 1.17607| constrast_loss: 4.61645| div_loss: 0.87812| %_mask_idx: 0.41714| ppl: 78.00011| %_neg_is_pos: 0.01708| lr: 0.0| temp: 1.99891 | loss: 1.17597| constrast_loss: 4.61647| div_loss: 0.87398| %_mask_idx: 0.39129| ppl: 80.65157| %_neg_is_pos: 0.01913| lr: 0.0| temp: 1.99891 | loss: 1.1757| constrast_loss: 4.61552| div_loss: 0.87296| %_mask_idx: 0.38158| ppl: 81.30658| %_neg_is_pos: 0.02244| lr: 0.0| temp: 1.9989 | loss: 1.1759| constrast_loss: 4.61765| div_loss: 0.85953| %_mask_idx: 0.37954| ppl: 89.90294| %_neg_is_pos: 0.02323| lr: 0.0| temp: 1.9989 | loss: 1.1744| constrast_loss: 4.60926| div_loss: 0.88334| %_mask_idx: 0.40257| ppl: 74.66418| %_neg_is_pos: 0.02891| lr: 0.0| temp: 1.99889 | loss: 1.17487| constrast_loss: 4.61208| div_loss: 0.87385| %_mask_idx: 0.40742| ppl: 80.734| 
%_neg_is_pos: 0.03687| lr: 0.0| temp: 1.99889 | loss: 1.17566| constrast_loss: 4.61593| div_loss: 0.86733| %_mask_idx: 0.42873| ppl: 84.90991| %_neg_is_pos: 0.0291| lr: 0.0| temp: 1.99888 | loss: 1.17609| constrast_loss: 4.61735| div_loss: 0.87015| %_mask_idx: 0.42246| ppl: 83.10693| %_neg_is_pos: 0.01921| lr: 0.0| temp: 1.99888 | loss: 1.17703| constrast_loss: 4.62086| div_loss: 0.87271| %_mask_idx: 0.41416| ppl: 81.46602| %_neg_is_pos: 0.02301| lr: 0.0| temp: 1.99886 | loss: 1.1772| constrast_loss: 4.62403| div_loss: 0.84761| %_mask_idx: 0.349| ppl: 97.52989| %_neg_is_pos: 0.02192| lr: 0.0| temp: 1.99886 | loss: 1.17486| constrast_loss: 4.61218| div_loss: 0.8724| %_mask_idx: 0.34994| ppl: 81.66646| %_neg_is_pos: 0.04136| lr: 0.0| temp: 1.99885 | loss: 1.17677| constrast_loss: 4.62019| div_loss: 0.86884| %_mask_idx: 0.35166| ppl: 83.9451| %_neg_is_pos: 0.02769| lr: 0.0| temp: 1.99885 | loss: 1.17441| constrast_loss: 4.61061| div_loss: 0.87045| %_mask_idx: 0.39427| ppl: 82.90964| %_neg_is_pos: 0.02469| lr: 0.0| temp: 1.99884 | loss: 1.17544| constrast_loss: 4.61395| div_loss: 0.87813| %_mask_idx: 0.35871| ppl: 77.99865| %_neg_is_pos: 0.03672| lr: 0.0| temp: 1.99884 | loss: 1.17422| constrast_loss: 4.60849| div_loss: 0.88401| %_mask_idx: 0.42168| ppl: 74.23137| %_neg_is_pos: 0.02976| lr: 0.0| temp: 1.99883 | loss: 1.17535| constrast_loss: 4.61334| div_loss: 0.88042| %_mask_idx: 0.41322| ppl: 76.53095| %_neg_is_pos: 0.02241| lr: 0.0| temp: 1.99883 | loss: 1.1762| constrast_loss: 4.61749| div_loss: 0.87302| %_mask_idx: 0.40241| ppl: 81.26503| %_neg_is_pos: 0.02246| lr: 0.0| temp: 1.99881 | loss: 1.1768| constrast_loss: 4.62092| div_loss: 0.86256| %_mask_idx: 0.42011| ppl: 87.95862| %_neg_is_pos: 0.0226| lr: 0.0| temp: 1.99881 | loss: 1.17709| constrast_loss: 4.62228| div_loss: 0.86069| %_mask_idx: 0.39959| ppl: 89.16112| %_neg_is_pos: 0.02041| lr: 0.0| temp: 1.9988 | loss: 1.17564| constrast_loss: 4.61545| div_loss: 0.87103| %_mask_idx: 0.39176| ppl: 82.53806| 
%_neg_is_pos: 0.03741| lr: 0.0| temp: 1.9988 | loss: 1.17389| constrast_loss: 4.6079| div_loss: 0.87663| %_mask_idx: 0.38299| ppl: 78.95871| %_neg_is_pos: 0.0475| lr: 0.0| temp: 1.99879 | loss: 1.17531| constrast_loss: 4.61473| div_loss: 0.86523| %_mask_idx: 0.36137| ppl: 86.25555| %_neg_is_pos: 0.03202| lr: 0.0| temp: 1.99879 | loss: 1.17591| constrast_loss: 4.61786| div_loss: 0.85773| %_mask_idx: 0.41573| ppl: 91.05585| %_neg_is_pos: 0.02002| lr: 0.0| temp: 1.99878 | loss: 1.17565| constrast_loss: 4.61525| div_loss: 0.87357| %_mask_idx: 0.3407| ppl: 80.91409| %_neg_is_pos: 0.03757| lr: 0.0| temp: 1.99878 | loss: 1.17572| constrast_loss: 4.61551| div_loss: 0.87367| %_mask_idx: 0.41886| ppl: 80.85091| %_neg_is_pos: 0.02834| lr: 0.0| temp: 1.99876 | loss: 1.17666| constrast_loss: 4.61945| div_loss: 0.8719| %_mask_idx: 0.39803| ppl: 81.98492| %_neg_is_pos: 0.03029| lr: 0.0| temp: 1.99876 | loss: 1.17485| constrast_loss: 4.61177| div_loss: 0.87625| %_mask_idx: 0.36607| ppl: 79.19855| %_neg_is_pos: 0.03789| lr: 0.0| temp: 1.99875 | loss: 1.17647| constrast_loss: 4.61903| div_loss: 0.86857| %_mask_idx: 0.41761| ppl: 84.11694| %_neg_is_pos: 0.01812| lr: 0.0| temp: 1.99875
[2021-09-01 21:33:57,000] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0
[2021-09-01 21:33:57,001] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0
| loss: 1.17657| constrast_loss: 4.6209| div_loss: 0.8537| %_mask_idx: 0.42732| ppl: 93.62886| %_neg_is_pos: 0.0158| lr: 0.0| temp: 1.99873 | loss: 1.17453| constrast_loss: 4.61003| div_loss: 0.88075| %_mask_idx: 0.41228| ppl: 76.31931| %_neg_is_pos: 0.02615| lr: 0.0| temp: 1.99873 | loss: 1.1757| constrast_loss: 4.61641| div_loss: 0.86387| %_mask_idx: 0.37798| ppl: 87.12177| %_neg_is_pos: 0.02252| lr: 0.0| temp: 1.99872 | loss: 1.1751| constrast_loss: 4.61377| div_loss: 0.86621| %_mask_idx: 0.36451| ppl: 85.62645| %_neg_is_pos: 0.04251| lr: 0.0| temp: 1.99872 | loss: 1.17489| constrast_loss: 4.61146| div_loss: 0.88099| %_mask_idx: 0.39928| ppl: 76.16382| %_neg_is_pos: 0.03257| lr: 0.0| temp: 1.99871 | loss: 1.17626| constrast_loss: 4.61939| div_loss: 0.85632| %_mask_idx: 0.38393| ppl: 91.95806| %_neg_is_pos: 0.01591| lr: 0.0| temp: 1.99871 | loss: 1.17636| constrast_loss: 4.61961| div_loss: 0.8583| %_mask_idx: 0.38221| ppl: 90.68944| %_neg_is_pos: 0.02535| lr: 0.0| temp: 1.9987 | loss: 1.17507| constrast_loss: 4.61306| div_loss: 0.8722| %_mask_idx: 0.34994| ppl: 81.79152| %_neg_is_pos: 0.03564| lr: 0.0| temp: 1.9987 | loss: 1.17636| constrast_loss: 4.61815| div_loss: 0.87282| %_mask_idx: 0.37735| ppl: 81.39458| %_neg_is_pos: 0.02625| lr: 0.0| temp: 1.99868 | loss: 1.17553| constrast_loss: 4.61568| div_loss: 0.86423| %_mask_idx: 0.3974| ppl: 86.89139| %_neg_is_pos: 0.02054| lr: 0.0| temp: 1.99868 | loss: 1.17525| constrast_loss: 4.61324| div_loss: 0.87739| %_mask_idx: 0.44157| ppl: 78.47133| %_neg_is_pos: 0.03116| lr: 0.0| temp: 1.99867 | loss: 1.17357| constrast_loss: 4.60669| div_loss: 0.87607| %_mask_idx: 0.3916| ppl: 79.31525| %_neg_is_pos: 0.0439| lr: 0.0| temp: 1.99867 | loss: 1.17466| constrast_loss: 4.61093| div_loss: 0.87699| %_mask_idx: 0.36795| ppl: 78.72327| %_neg_is_pos: 0.03733| lr: 0.0| temp: 1.99866 | loss: 1.176| constrast_loss: 4.61757| div_loss: 0.86437| %_mask_idx: 0.34821| ppl: 86.8058|
%_neg_is_pos: 0.02698| lr: 0.0| temp: 1.99866 | loss: 1.17513| constrast_loss: 4.61444| div_loss: 0.86082| %_mask_idx: 0.37328| ppl: 89.07712| %_neg_is_pos: 0.0167| lr: 0.0| temp: 1.99865 | loss: 1.17658| constrast_loss: 4.6199| div_loss: 0.86435| %_mask_idx: 0.40069| ppl: 86.81811| %_neg_is_pos: 0.02136| lr: 0.0| temp: 1.99865 | loss: 1.17504| constrast_loss: 4.61192| div_loss: 0.88254| %_mask_idx: 0.35636| ppl: 75.17519| %_neg_is_pos: 0.03817| lr: 0.0| temp: 1.99863 | loss: 1.17609| constrast_loss: 4.61765| div_loss: 0.86714| %_mask_idx: 0.388| ppl: 85.02954| %_neg_is_pos: 0.02718| lr: 0.0| temp: 1.99863 | loss: 1.17619| constrast_loss: 4.61899| div_loss: 0.85783| %_mask_idx: 0.3869| ppl: 90.98608| %_neg_is_pos: 0.02245| lr: 0.0| temp: 1.99862 | loss: 1.17582| constrast_loss: 4.61732| div_loss: 0.85953| %_mask_idx: 0.39568| ppl: 89.8987| %_neg_is_pos: 0.01481| lr: 0.0| temp: 1.99862 | loss: 1.17771| constrast_loss: 4.62482| div_loss: 0.86037| %_mask_idx: 0.41134| ppl: 89.3623| %_neg_is_pos: 0.01344| lr: 0.0| temp: 1.99861 | loss: 1.17445| constrast_loss: 4.60959| div_loss: 0.88207| %_mask_idx: 0.33318| ppl: 75.47633| %_neg_is_pos: 0.03765| lr: 0.0| temp: 1.99861 | loss: 1.17279| constrast_loss: 4.60172| div_loss: 0.89426| %_mask_idx: 0.33036| ppl: 67.67216| %_neg_is_pos: 0.05148| lr: 0.0| temp: 1.9986 | loss: 1.17562| constrast_loss: 4.61592| div_loss: 0.8657| %_mask_idx: 0.40915| ppl: 85.95445| %_neg_is_pos: 0.01777| lr: 0.0| temp: 1.9986 | loss: 1.1745| constrast_loss: 4.61135| div_loss: 0.86647| %_mask_idx: 0.40633| ppl: 85.45752| %_neg_is_pos: 0.02013| lr: 0.0| temp: 1.99858 | loss: 1.17485| constrast_loss: 4.61225| div_loss: 0.8715| %_mask_idx: 0.38753| ppl: 82.23804| %_neg_is_pos: 0.02587| lr: 0.0| temp: 1.99858 | loss: 1.17514| constrast_loss: 4.61343| div_loss: 0.87126| %_mask_idx: 0.42262| ppl: 82.3941| %_neg_is_pos: 0.02889| lr: 0.0| temp: 1.99857 | loss: 1.17382| constrast_loss: 4.60764| div_loss: 0.87647| %_mask_idx: 0.41776| ppl: 79.05631| 
%_neg_is_pos: 0.03094| lr: 0.0| temp: 1.99857 | loss: 1.17626| constrast_loss: 4.61878| div_loss: 0.86258| %_mask_idx: 0.4234| ppl: 87.94694| %_neg_is_pos: 0.02537| lr: 0.0| temp: 1.99855 | loss: 1.17449| constrast_loss: 4.61191| div_loss: 0.8604| %_mask_idx: 0.40226| ppl: 89.34213| %_neg_is_pos: 0.02785| lr: 0.0| temp: 1.99855 | loss: 1.17403| constrast_loss: 4.6088| div_loss: 0.87307| %_mask_idx: 0.41855| ppl: 81.23721| %_neg_is_pos: 0.02763| lr: 0.0| temp: 1.99854 | loss: 1.17556| constrast_loss: 4.61606| div_loss: 0.86187| %_mask_idx: 0.42732| ppl: 88.40439| %_neg_is_pos: 0.01945| lr: 0.0| temp: 1.99854 | loss: 1.17594| constrast_loss: 4.61718| div_loss: 0.86569| %_mask_idx: 0.39912| ppl: 85.95884| %_neg_is_pos: 0.02729| lr: 0.0| temp: 1.99853 | loss: 1.17414| constrast_loss: 4.60893| div_loss: 0.87645| %_mask_idx: 0.41526| ppl: 79.07134| %_neg_is_pos: 0.03121| lr: 0.0| temp: 1.99853 | loss: 1.17575| constrast_loss: 4.61613| div_loss: 0.86879| %_mask_idx: 0.38048| ppl: 83.97365| %_neg_is_pos: 0.02774| lr: 0.0| temp: 1.99852 | loss: 1.17465| constrast_loss: 4.61069| div_loss: 0.87895| %_mask_idx: 0.3927| ppl: 77.47119| %_neg_is_pos: 0.02588| lr: 0.0| temp: 1.99852 | loss: 1.1748| constrast_loss: 4.61175| div_loss: 0.87441| %_mask_idx: 0.34618| ppl: 80.37924| %_neg_is_pos: 0.03221| lr: 0.0| temp: 1.9985 | loss: 1.17446| constrast_loss: 4.6104| div_loss: 0.87428| %_mask_idx: 0.38737| ppl: 80.46227| %_neg_is_pos: 0.02886| lr: 0.0| temp: 1.9985 | loss: 1.1754| constrast_loss: 4.61451| div_loss: 0.87081| %_mask_idx: 0.4021| ppl: 82.67994| %_neg_is_pos: 0.02563| lr: 0.0| temp: 1.99849 | loss: 1.17479| constrast_loss: 4.61319| div_loss: 0.8596| %_mask_idx: 0.41557| ppl: 89.85822| %_neg_is_pos: 0.02299| lr: 0.0| temp: 1.99849 | loss: 1.17463| constrast_loss: 4.61159| div_loss: 0.86921| %_mask_idx: 0.34821| ppl: 83.70441| %_neg_is_pos: 0.04698| lr: 0.0| temp: 1.99848 | loss: 1.17539| constrast_loss: 4.61587| div_loss: 0.85687| %_mask_idx: 0.36059| ppl: 91.60453| 
%_neg_is_pos: 0.02041| lr: 0.0| temp: 1.99848 | loss: 1.17509| constrast_loss: 4.61392| div_loss: 0.86424| %_mask_idx: 0.3963| ppl: 86.88881| %_neg_is_pos: 0.01864| lr: 0.0| temp: 1.99847 | loss: 1.17499| constrast_loss: 4.6129| div_loss: 0.87043| %_mask_idx: 0.37688| ppl: 82.92584| %_neg_is_pos: 0.02817| lr: 0.0| temp: 1.99847 | loss: 1.17566| constrast_loss: 4.61643| div_loss: 0.8622| %_mask_idx: 0.4256| ppl: 88.19173| %_neg_is_pos: 0.02263| lr: 0.0| temp: 1.99845 | loss: 1.17392| constrast_loss: 4.60943| div_loss: 0.86247| %_mask_idx: 0.33302| ppl: 88.02084| %_neg_is_pos: 0.03032| lr: 0.0| temp: 1.99845 | loss: 1.17431| constrast_loss: 4.61026| div_loss: 0.86969| %_mask_idx: 0.38377| ppl: 83.39767| %_neg_is_pos: 0.02503| lr: 0.0| temp: 1.99844 | loss: 1.17269| constrast_loss: 4.60258| div_loss: 0.88171| %_mask_idx: 0.35887| ppl: 75.70332| %_neg_is_pos: 0.04604| lr: 0.0| temp: 1.99844 | loss: 1.17415| constrast_loss: 4.60999| div_loss: 0.86621| %_mask_idx: 0.4422| ppl: 85.62579| %_neg_is_pos: 0.01651| lr: 0.0| temp: 1.99843 | loss: 1.17448| constrast_loss: 4.61053| div_loss: 0.87388| %_mask_idx: 0.36654| ppl: 80.71491| %_neg_is_pos: 0.03143| lr: 0.0| temp: 1.99843 | loss: 1.17259| constrast_loss: 4.60232| div_loss: 0.88043| %_mask_idx: 0.38283| ppl: 76.52307| %_neg_is_pos: 0.03363| lr: 0.0| temp: 1.99842 | loss: 1.17491| constrast_loss: 4.61225| div_loss: 0.87398| %_mask_idx: 0.36357| ppl: 80.65511| %_neg_is_pos: 0.02708| lr: 0.0| temp: 1.99842 | loss: 1.17383| constrast_loss: 4.60785| div_loss: 0.87458| %_mask_idx: 0.40821| ppl: 80.26702| %_neg_is_pos: 0.03073| lr: 0.0| temp: 1.9984 | loss: 1.17511| constrast_loss: 4.613| div_loss: 0.87458| %_mask_idx: 0.41557| ppl: 80.26637| %_neg_is_pos: 0.03451| lr: 0.0| temp: 1.9984 | loss: 1.17469| constrast_loss: 4.61177| div_loss: 0.86975| %_mask_idx: 0.39536| ppl: 83.36018| %_neg_is_pos: 0.02431| lr: 0.0| temp: 1.99839 | loss: 1.17526| constrast_loss: 4.61424| div_loss: 0.86794| %_mask_idx: 0.39897| ppl: 84.51718| 
%_neg_is_pos: 0.02145| lr: 0.0| temp: 1.99839 | loss: 1.17632| constrast_loss: 4.61867| div_loss: 0.86619| %_mask_idx: 0.38847| ppl: 85.63922| %_neg_is_pos: 0.02496| lr: 0.0| temp: 1.99837 | loss: 1.17593| constrast_loss: 4.61781| div_loss: 0.85902| %_mask_idx: 0.40742| ppl: 90.22652| %_neg_is_pos: 0.02184| lr: 0.0| temp: 1.99837 | loss: 1.17517| constrast_loss: 4.61406| div_loss: 0.86614| %_mask_idx: 0.38925| ppl: 85.67316| %_neg_is_pos: 0.03122| lr: 0.0| temp: 1.99836 | loss: 1.17518| constrast_loss: 4.61374| div_loss: 0.86964| %_mask_idx: 0.34477| ppl: 83.42994| %_neg_is_pos: 0.04291| lr: 0.0| temp: 1.99836 | loss: 1.17672| constrast_loss: 4.62048| div_loss: 0.86383| %_mask_idx: 0.40883| ppl: 87.15099| %_neg_is_pos: 0.01811| lr: 0.0| temp: 1.99835 | loss: 1.1737| constrast_loss: 4.60792| div_loss: 0.86877| %_mask_idx: 0.39458| ppl: 83.98848| %_neg_is_pos: 0.03435| lr: 0.0| temp: 1.99835 | loss: 1.17396| constrast_loss: 4.60807| div_loss: 0.87789| %_mask_idx: 0.36717| ppl: 78.15124| %_neg_is_pos: 0.03747| lr: 0.0| temp: 1.99834 | loss: 1.17361| constrast_loss: 4.60753| div_loss: 0.86919| %_mask_idx: 0.43922| ppl: 83.7207| %_neg_is_pos: 0.01466| lr: 0.0| temp: 1.99834 | loss: 1.17548| constrast_loss: 4.61586| div_loss: 0.86041| %_mask_idx: 0.35338| ppl: 89.33848| %_neg_is_pos: 0.01555| lr: 0.0| temp: 1.99832 | loss: 1.17547| constrast_loss: 4.61473| div_loss: 0.87163| %_mask_idx: 0.38236| ppl: 82.15633| %_neg_is_pos: 0.02246| lr: 0.0| temp: 1.99832 | loss: 1.17392| constrast_loss: 4.6092| div_loss: 0.86475| %_mask_idx: 0.3443| ppl: 86.56097| %_neg_is_pos: 0.03592| lr: 0.0| temp: 1.99831 | loss: 1.17614| constrast_loss: 4.6186| div_loss: 0.85969| %_mask_idx: 0.40899| ppl: 89.79829| %_neg_is_pos: 0.01936| lr: 0.0| temp: 1.99831 | loss: 1.17604| constrast_loss: 4.61777| div_loss: 0.86401| %_mask_idx: 0.3916| ppl: 87.03244| %_neg_is_pos: 0.02221| lr: 0.0| temp: 1.9983 | loss: 1.17487| constrast_loss: 4.61166| div_loss: 0.87825| %_mask_idx: 0.42607| ppl: 77.92032| 
%_neg_is_pos: 0.02665| lr: 0.0| temp: 1.9983 | loss: 1.17398| constrast_loss: 4.60759| div_loss: 0.88339| %_mask_idx: 0.39615| ppl: 74.62882| %_neg_is_pos: 0.02591| lr: 0.0| temp: 1.99829 | loss: 1.17368| constrast_loss: 4.60625| div_loss: 0.88466| %_mask_idx: 0.37845| ppl: 73.81923| %_neg_is_pos: 0.03403| lr: 0.0| temp: 1.99829 | loss: 1.17313| constrast_loss: 4.60433| div_loss: 0.88204| %_mask_idx: 0.38737| ppl: 75.49551| %_neg_is_pos: 0.03762| lr: 0.0| temp: 1.99827 | loss: 1.17492| constrast_loss: 4.6119| div_loss: 0.87768| %_mask_idx: 0.34947| ppl: 78.2831| %_neg_is_pos: 0.03083| lr: 0.0| temp: 1.99827 | loss: 1.17492| constrast_loss: 4.61304| div_loss: 0.86638| %_mask_idx: 0.3396| ppl: 85.51594| %_neg_is_pos: 0.02866| lr: 0.0| temp: 1.99826 | loss: 1.17533| constrast_loss: 4.61502| div_loss: 0.86295| %_mask_idx: 0.36466| ppl: 87.71072| %_neg_is_pos: 0.0276| lr: 0.0| temp: 1.99826 | loss: 1.17545| constrast_loss: 4.61636| div_loss: 0.8545| %_mask_idx: 0.3786| ppl: 93.11911| %_neg_is_pos: 0.02366| lr: 0.0| temp: 1.99825 | loss: 1.17592| constrast_loss: 4.61701| div_loss: 0.86658| %_mask_idx: 0.41714| ppl: 85.38799| %_neg_is_pos: 0.0131| lr: 0.0| temp: 1.99825 | loss: 1.1737| constrast_loss: 4.60626| div_loss: 0.88538| %_mask_idx: 0.33145| ppl: 73.35829| %_neg_is_pos: 0.03577| lr: 0.0| temp: 1.99824 | loss: 1.17455| constrast_loss: 4.6104| div_loss: 0.87817| %_mask_idx: 0.36858| ppl: 77.96972| %_neg_is_pos: 0.03395| lr: 0.0| temp: 1.99824 | loss: 1.17498| constrast_loss: 4.61272| div_loss: 0.872| %_mask_idx: 0.37281| ppl: 81.91782| %_neg_is_pos: 0.02904| lr: 0.0| temp: 1.99822 | loss: 1.17501| constrast_loss: 4.61311| div_loss: 0.86947| %_mask_idx: 0.38033| ppl: 83.53749| %_neg_is_pos: 0.02716| lr: 0.0| temp: 1.99822 | loss: 1.17423| constrast_loss: 4.6094| div_loss: 0.87503| %_mask_idx: 0.37735| ppl: 79.98372| %_neg_is_pos: 0.03244| lr: 0.0| temp: 1.99821 | loss: 1.17546| constrast_loss: 4.6159| div_loss: 0.85953| %_mask_idx: 0.38847| ppl: 89.90154| 
%_neg_is_pos: 0.02309| lr: 0.0| temp: 1.99821 | loss: 1.17574| constrast_loss: 4.61665| div_loss: 0.86304| %_mask_idx: 0.41698| ppl: 87.65323| %_neg_is_pos: 0.02273| lr: 0.0| temp: 1.99819 | loss: 1.17326| constrast_loss: 4.60479| div_loss: 0.88245| %_mask_idx: 0.37093| ppl: 75.23512| %_neg_is_pos: 0.05016| lr: 0.0| temp: 1.99819 | loss: 1.1757| constrast_loss: 4.61599| div_loss: 0.86819| %_mask_idx: 0.35714| ppl: 84.35939| %_neg_is_pos: 0.02505| lr: 0.0| temp: 1.99818 | loss: 1.17505| constrast_loss: 4.61202| div_loss: 0.88169| %_mask_idx: 0.36779| ppl: 75.71532| %_neg_is_pos: 0.02665| lr: 0.0| temp: 1.99818 | loss: 1.17498| constrast_loss: 4.61319| div_loss: 0.86745| %_mask_idx: 0.40241| ppl: 84.8317| %_neg_is_pos: 0.02756| lr: 0.0| temp: 1.99817 | loss: 1.17494| constrast_loss: 4.61267| div_loss: 0.87091| %_mask_idx: 0.36341| ppl: 82.61502| %_neg_is_pos: 0.0421| lr: 0.0| temp: 1.99817 | loss: 1.17599| constrast_loss: 4.61832| div_loss: 0.85628| %_mask_idx: 0.388| ppl: 91.98241| %_neg_is_pos: 0.02698| lr: 0.0| temp: 1.99816 | loss: 1.17599| constrast_loss: 4.61736| div_loss: 0.86612| %_mask_idx: 0.43922| ppl: 85.68593| %_neg_is_pos: 0.02102| lr: 0.0| temp: 1.99816 | loss: 1.17466| constrast_loss: 4.61118| div_loss: 0.87448| %_mask_idx: 0.35777| ppl: 80.33485| %_neg_is_pos: 0.0245| lr: 0.0| temp: 1.99814 | loss: 1.17536| constrast_loss: 4.61528| div_loss: 0.86156| %_mask_idx: 0.36717| ppl: 88.60323| %_neg_is_pos: 0.02009| lr: 0.0| temp: 1.99814 | loss: 1.17473| constrast_loss: 4.61142| div_loss: 0.87504| %_mask_idx: 0.36372| ppl: 79.97621| %_neg_is_pos: 0.03128| lr: 0.0| temp: 1.99813 | loss: 1.17595| constrast_loss: 4.6169| div_loss: 0.869| %_mask_idx: 0.37829| ppl: 83.83844| %_neg_is_pos: 0.02247| lr: 0.0| temp: 1.99813 | loss: 1.17593| constrast_loss: 4.61825| div_loss: 0.85491| %_mask_idx: 0.39803| ppl: 92.85767| %_neg_is_pos: 0.01799| lr: 0.0| temp: 1.99812 | loss: 1.17502| constrast_loss: 4.61293| div_loss: 0.87134| %_mask_idx: 0.40179| ppl: 82.34061| 
%_neg_is_pos: 0.02857| lr: 0.0| temp: 1.99812 | loss: 1.17441| constrast_loss: 4.60995| div_loss: 0.87706| %_mask_idx: 0.43719| ppl: 78.68031| %_neg_is_pos: 0.02108| lr: 0.0| temp: 1.99811 | loss: 1.17594| constrast_loss: 4.61678| div_loss: 0.86985| %_mask_idx: 0.39176| ppl: 83.29878| %_neg_is_pos: 0.02909| lr: 0.0| temp: 1.99811 | loss: 1.17396| constrast_loss: 4.60903| div_loss: 0.86806| %_mask_idx: 0.40868| ppl: 84.44315| %_neg_is_pos: 0.02667| lr: 0.0| temp: 1.99809 | loss: 1.17446| constrast_loss: 4.61071| div_loss: 0.87109| %_mask_idx: 0.36858| ppl: 82.50552| %_neg_is_pos: 0.02525| lr: 0.0| temp: 1.99809 | loss: 1.17503| constrast_loss: 4.61281| div_loss: 0.87293| %_mask_idx: 0.40852| ppl: 81.32336| %_neg_is_pos: 0.02961| lr: 0.0| temp: 1.99808 | loss: 1.17554| constrast_loss: 4.61541| div_loss: 0.8676| %_mask_idx: 0.40163| ppl: 84.73303| %_neg_is_pos: 0.03042| lr: 0.0| temp: 1.99808 | loss: 1.17588| constrast_loss: 4.61664| div_loss: 0.86891| %_mask_idx: 0.36811| ppl: 83.89625| %_neg_is_pos: 0.02309| lr: 0.0| temp: 1.99807 | loss: 1.17623| constrast_loss: 4.61826| div_loss: 0.86659| %_mask_idx: 0.33145| ppl: 85.38106| %_neg_is_pos: 0.02982| lr: 0.0| temp: 1.99807 | loss: 1.17602| constrast_loss: 4.6176| div_loss: 0.86465| %_mask_idx: 0.42137| ppl: 86.62615| %_neg_is_pos: 0.01621| lr: 0.0| temp: 1.99806 | loss: 1.17506| constrast_loss: 4.61412| div_loss: 0.86123| %_mask_idx: 0.42607| ppl: 88.80984| %_neg_is_pos: 0.01814| lr: 0.0| temp: 1.99806 | loss: 1.17339| constrast_loss: 4.60575| div_loss: 0.87816| %_mask_idx: 0.38925| ppl: 77.97551| %_neg_is_pos: 0.03758| lr: 0.0| temp: 1.99804 | loss: 1.17656| constrast_loss: 4.61942| div_loss: 0.86825| %_mask_idx: 0.4198| ppl: 84.3179| %_neg_is_pos: 0.02241| lr: 0.0| temp: 1.99804 | loss: 1.17488| constrast_loss: 4.61128| div_loss: 0.88224| %_mask_idx: 0.36842| ppl: 75.3638| %_neg_is_pos: 0.02382| lr: 0.0| temp: 1.99803 | loss: 1.17257| constrast_loss: 4.60242| div_loss: 0.87847| %_mask_idx: 0.33756| ppl: 77.77628| 
%_neg_is_pos: 0.0458| lr: 0.0| temp: 1.99803
[2021-09-01 21:43:11,653] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0
[2021-09-01 21:43:11,654] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0
| loss: 1.17503| constrast_loss: 4.61357| div_loss: 0.8657| %_mask_idx: 0.3537| ppl: 85.95504| %_neg_is_pos: 0.03563| lr: 0.0| temp: 1.99801 | loss: 1.17449| constrast_loss: 4.6105| div_loss: 0.87472| %_mask_idx: 0.39019| ppl: 80.17722| %_neg_is_pos: 0.03576| lr: 0.0| temp: 1.99801 | loss: 1.17443| constrast_loss: 4.61103| div_loss: 0.86698| %_mask_idx: 0.39897| ppl: 85.13306| %_neg_is_pos: 0.02983| lr: 0.0| temp: 1.998 | loss: 1.17481| constrast_loss: 4.61213| div_loss: 0.87111| %_mask_idx: 0.39583| ppl: 82.48893| %_neg_is_pos: 0.02921| lr: 0.0| temp: 1.998 | loss: 1.17515| constrast_loss: 4.61366| div_loss: 0.8695| %_mask_idx: 0.40351| ppl: 83.51984| %_neg_is_pos: 0.02586| lr: 0.0| temp: 1.99799 | loss: 1.17288| constrast_loss: 4.60416| div_loss: 0.87374| %_mask_idx: 0.37281| ppl: 80.80896| %_neg_is_pos: 0.02928| lr: 0.0| temp: 1.99799 | loss: 1.17408| constrast_loss: 4.61012| div_loss: 0.86212| %_mask_idx: 0.39442| ppl: 88.2451| %_neg_is_pos: 0.02227| lr: 0.0| temp: 1.99798 | loss: 1.17365| constrast_loss: 4.60701| div_loss: 0.8757| %_mask_idx: 0.36059| ppl: 79.5495| %_neg_is_pos: 0.03357| lr: 0.0| temp: 1.99798 | loss: 1.17572| constrast_loss: 4.61766| div_loss: 0.85205| %_mask_idx: 0.38456| ppl: 94.69057| %_neg_is_pos: 0.02032| lr: 0.0| temp: 1.99796 | loss: 1.17575| constrast_loss: 4.61642| div_loss: 0.86586| %_mask_idx: 0.35025| ppl: 85.85017| %_neg_is_pos: 0.02882| lr: 0.0| temp: 1.99796 | loss: 1.173| constrast_loss: 4.604| div_loss: 0.88014| %_mask_idx: 0.36795| ppl: 76.70811| %_neg_is_pos: 0.04819| lr: 0.0| temp: 1.99795 | loss: 1.17561| constrast_loss:
4.61526| div_loss: 0.87161| %_mask_idx: 0.36717| ppl: 82.1695| %_neg_is_pos: 0.02065| lr: 0.0| temp: 1.99795 | loss: 1.17581| constrast_loss: 4.61741| div_loss: 0.85845| %_mask_idx: 0.41886| ppl: 90.59022| %_neg_is_pos: 0.01278| lr: 0.0| temp: 1.99794 | loss: 1.17438| constrast_loss: 4.61075| div_loss: 0.86791| %_mask_idx: 0.4328| ppl: 84.53793| %_neg_is_pos: 0.02158| lr: 0.0| temp: 1.99794 | loss: 1.1745| constrast_loss: 4.61191| div_loss: 0.86083| %_mask_idx: 0.4104| ppl: 89.06789| %_neg_is_pos: 0.03115| lr: 0.0| temp: 1.99793 | loss: 1.17353| constrast_loss: 4.60648| div_loss: 0.87632| %_mask_idx: 0.41165| ppl: 79.15572| %_neg_is_pos: 0.031| lr: 0.0| temp: 1.99793 | loss: 1.17477| constrast_loss: 4.61182| div_loss: 0.87273| %_mask_idx: 0.40006| ppl: 81.45358| %_neg_is_pos: 0.02311| lr: 0.0| temp: 1.99791 | loss: 1.17174| constrast_loss: 4.59951| div_loss: 0.87443| %_mask_idx: 0.40006| ppl: 80.36438| %_neg_is_pos: 0.04556| lr: 0.0| temp: 1.99791 | loss: 1.17433| constrast_loss: 4.61006| div_loss: 0.87257| %_mask_idx: 0.4198| ppl: 81.55235| %_neg_is_pos: 0.02083| lr: 0.0| temp: 1.9979 | loss: 1.17299| constrast_loss: 4.60397| div_loss: 0.88004| %_mask_idx: 0.40977| ppl: 76.77525| %_neg_is_pos: 0.03769| lr: 0.0| temp: 1.9979 | loss: 1.17478| constrast_loss: 4.61154| div_loss: 0.87563| %_mask_idx: 0.42011| ppl: 79.59459| %_neg_is_pos: 0.02295| lr: 0.0| temp: 1.99789 | loss: 1.17425| constrast_loss: 4.61031| div_loss: 0.86698| %_mask_idx: 0.35949| ppl: 85.13409| %_neg_is_pos: 0.03545| lr: 0.0| temp: 1.99789 | loss: 1.17294| constrast_loss: 4.60343| div_loss: 0.88327| %_mask_idx: 0.43233| ppl: 74.70806| %_neg_is_pos: 0.02279| lr: 0.0| temp: 1.99788 | loss: 1.1738| constrast_loss: 4.60755| div_loss: 0.8766| %_mask_idx: 0.36638| ppl: 78.97652| %_neg_is_pos: 0.03682| lr: 0.0| temp: 1.99788 | loss: 1.17308| constrast_loss: 4.60609| div_loss: 0.86215| %_mask_idx: 0.39301| ppl: 88.22208| %_neg_is_pos: 0.02337| lr: 0.0| temp: 1.99786 | loss: 1.17567| constrast_loss: 4.61681|
div_loss: 0.85879| %_mask_idx: 0.42607| ppl: 90.37304| %_neg_is_pos: 0.02171| lr: 0.0| temp: 1.99786 | loss: 1.17424| constrast_loss: 4.60995| div_loss: 0.87002| %_mask_idx: 0.38988| ppl: 83.18426| %_neg_is_pos: 0.02466| lr: 0.0| temp: 1.99785 | loss: 1.17257| constrast_loss: 4.60186| div_loss: 0.8843| %_mask_idx: 0.38925| ppl: 74.0468| %_neg_is_pos: 0.03668| lr: 0.0| temp: 1.99785 | loss: 1.17349| constrast_loss: 4.60816| div_loss: 0.8581| %_mask_idx: 0.44345| ppl: 90.81415| %_neg_is_pos: 0.02278| lr: 0.0| temp: 1.99783 | loss: 1.17333| constrast_loss: 4.60656| div_loss: 0.86771| %_mask_idx: 0.38503| ppl: 84.6666| %_neg_is_pos: 0.03475| lr: 0.0| temp: 1.99783 | loss: 1.17392| constrast_loss: 4.60913| div_loss: 0.8656| %_mask_idx: 0.38189| ppl: 86.01807| %_neg_is_pos: 0.02483| lr: 0.0| temp: 1.99782 | loss: 1.17321| constrast_loss: 4.60572| div_loss: 0.87104| %_mask_idx: 0.35965| ppl: 82.53667| %_neg_is_pos: 0.02881| lr: 0.0| temp: 1.99782 | loss: 1.17278| constrast_loss: 4.60299| div_loss: 0.88152| %_mask_idx: 0.39724| ppl: 75.82983| %_neg_is_pos: 0.02764| lr: 0.0| temp: 1.99781 | loss: 1.17491| constrast_loss: 4.61364| div_loss: 0.86014| %_mask_idx: 0.34837| ppl: 89.50851| %_neg_is_pos: 0.02381| lr: 0.0| temp: 1.99781 | loss: 1.17462| constrast_loss: 4.61165| div_loss: 0.86827| %_mask_idx: 0.39348| ppl: 84.30422| %_neg_is_pos: 0.01503| lr: 0.0| temp: 1.9978 | loss: 1.17399| constrast_loss: 4.60958| div_loss: 0.86394| %_mask_idx: 0.40508| ppl: 87.07898| %_neg_is_pos: 0.02301| lr: 0.0| temp: 1.9978 | loss: 1.17296| constrast_loss: 4.60454| div_loss: 0.87318| %_mask_idx: 0.39395| ppl: 81.1646| %_neg_is_pos: 0.03216| lr: 0.0| temp: 1.99778 | loss: 1.17344| constrast_loss: 4.60621| div_loss: 0.87558| %_mask_idx: 0.39912| ppl: 79.62798| %_neg_is_pos: 0.02735| lr: 0.0| temp: 1.99778 | loss: 1.1741| constrast_loss: 4.61037| div_loss: 0.86034| %_mask_idx: 0.42309| ppl: 89.38044| %_neg_is_pos: 0.03164| lr: 0.0| temp: 1.99777 | loss: 1.17449| constrast_loss: 4.61147| 
div_loss: 0.86485| %_mask_idx: 0.38753| ppl: 86.49674| %_neg_is_pos: 0.02442| lr: 0.0| temp: 1.99777 | loss: 1.17499| constrast_loss: 4.61425| div_loss: 0.85703| %_mask_idx: 0.37531| ppl: 91.49779| %_neg_is_pos: 0.0284| lr: 0.0| temp: 1.99776 | loss: 1.1752| constrast_loss: 4.61455| div_loss: 0.86251| %_mask_idx: 0.42215| ppl: 87.99319| %_neg_is_pos: 0.0141| lr: 0.0| temp: 1.99776 | loss: 1.17457| constrast_loss: 4.61152| div_loss: 0.86743| %_mask_idx: 0.39004| ppl: 84.8476| %_neg_is_pos: 0.01934| lr: 0.0| temp: 1.99775 | loss: 1.17373| constrast_loss: 4.60785| div_loss: 0.87064| %_mask_idx: 0.44862| ppl: 82.79022| %_neg_is_pos: 0.01862| lr: 0.0| temp: 1.99775 | loss: 1.17502| constrast_loss: 4.61436| div_loss: 0.85739| %_mask_idx: 0.36983| ppl: 91.27078| %_neg_is_pos: 0.01862| lr: 0.0| temp: 1.99773 | loss: 1.17254| constrast_loss: 4.60226| div_loss: 0.87896| %_mask_idx: 0.39286| ppl: 77.46691| %_neg_is_pos: 0.03172| lr: 0.0| temp: 1.99773 | loss: 1.17261| constrast_loss: 4.60379| div_loss: 0.86644| %_mask_idx: 0.3891| ppl: 85.47729| %_neg_is_pos: 0.02592| lr: 0.0| temp: 1.99772 | loss: 1.17422| constrast_loss: 4.61044| div_loss: 0.8642| %_mask_idx: 0.44846| ppl: 86.91425| %_neg_is_pos: 0.01852| lr: 0.0| temp: 1.99772 | loss: 1.17143| constrast_loss: 4.5991| div_loss: 0.86615| %_mask_idx: 0.3078| ppl: 85.66692| %_neg_is_pos: 0.03539| lr: 0.0| temp: 1.99771 | loss: 1.17282| constrast_loss: 4.60387| div_loss: 0.87411| %_mask_idx: 0.33443| ppl: 80.56878| %_neg_is_pos: 0.04136| lr: 0.0| temp: 1.99771 | loss: 1.17393| constrast_loss: 4.6086| div_loss: 0.87105| %_mask_idx: 0.38894| ppl: 82.52856| %_neg_is_pos: 0.02763| lr: 0.0| temp: 1.9977 | loss: 1.17359| constrast_loss: 4.60714| div_loss: 0.87225| %_mask_idx: 0.34586| ppl: 81.7612| %_neg_is_pos: 0.02845| lr: 0.0| temp: 1.9977 | loss: 1.17452| constrast_loss: 4.61175| div_loss: 0.86316| %_mask_idx: 0.38236| ppl: 87.58037| %_neg_is_pos: 0.02523| lr: 0.0| temp: 1.99768 | loss: 1.1742| constrast_loss: 4.60959| div_loss: 
0.8723| %_mask_idx: 0.36028| ppl: 81.72684| %_neg_is_pos: 0.04811| lr: 0.0| temp: 1.99768 | loss: 1.17598| constrast_loss: 4.61883| div_loss: 0.85086| %_mask_idx: 0.4057| ppl: 95.45204| %_neg_is_pos: 0.01808| lr: 0.0| temp: 1.99767 | loss: 1.1727| constrast_loss: 4.60285| div_loss: 0.87928| %_mask_idx: 0.40038| ppl: 77.25944| %_neg_is_pos: 0.04355| lr: 0.0| temp: 1.99767 | loss: 1.17534| constrast_loss: 4.6149| div_loss: 0.86471| %_mask_idx: 0.45238| ppl: 86.58357| %_neg_is_pos: 0.01953| lr: 0.0| temp: 1.99765 | loss: 1.17378| constrast_loss: 4.60798| div_loss: 0.87141| %_mask_idx: 0.42058| ppl: 82.29906| %_neg_is_pos: 0.02948| lr: 0.0| temp: 1.99765 | loss: 1.17516| constrast_loss: 4.61485| div_loss: 0.85798| %_mask_idx: 0.4057| ppl: 90.89394| %_neg_is_pos: 0.02482| lr: 0.0| temp: 1.99764 | loss: 1.17401| constrast_loss: 4.60966| div_loss: 0.86373| %_mask_idx: 0.41792| ppl: 87.21489| %_neg_is_pos: 0.02745| lr: 0.0| temp: 1.99764 | loss: 1.17346| constrast_loss: 4.60654| div_loss: 0.87285| %_mask_idx: 0.40711| ppl: 81.37407| %_neg_is_pos: 0.02374| lr: 0.0| temp: 1.99763 | loss: 1.17365| constrast_loss: 4.60892| div_loss: 0.8569| %_mask_idx: 0.39991| ppl: 91.58704| %_neg_is_pos: 0.0247| lr: 0.0| temp: 1.99763 | loss: 1.17227| constrast_loss: 4.60129| div_loss: 0.87768| %_mask_idx: 0.33741| ppl: 78.28619| %_neg_is_pos: 0.04942| lr: 0.0| temp: 1.99762 | loss: 1.17247| constrast_loss: 4.60274| div_loss: 0.87146| %_mask_idx: 0.42544| ppl: 82.26715| %_neg_is_pos: 0.02714| lr: 0.0| temp: 1.99762 | loss: 1.17412| constrast_loss: 4.60962| div_loss: 0.86856| %_mask_idx: 0.45504| ppl: 84.12176| %_neg_is_pos: 0.02458| lr: 0.0| temp: 1.9976 | loss: 1.17199| constrast_loss: 4.6007| div_loss: 0.87259| %_mask_idx: 0.43828| ppl: 81.54252| %_neg_is_pos: 0.03016| lr: 0.0| temp: 1.9976 | loss: 1.17546| constrast_loss: 4.61563| div_loss: 0.86207| %_mask_idx: 0.40022| ppl: 88.27663| %_neg_is_pos: 0.02821| lr: 0.0| temp: 1.99759 | loss: 1.1734| constrast_loss: 4.60609| div_loss: 0.87507| 
%_mask_idx: 0.35495| ppl: 79.95779| %_neg_is_pos: 0.03803| lr: 0.0| temp: 1.99759 | loss: 1.17279| constrast_loss: 4.60405| div_loss: 0.87115| %_mask_idx: 0.39129| ppl: 82.46096| %_neg_is_pos: 0.02581| lr: 0.0| temp: 1.99758 | loss: 1.174| constrast_loss: 4.60977| div_loss: 0.86238| %_mask_idx: 0.36184| ppl: 88.07481| %_neg_is_pos: 0.03143| lr: 0.0| temp: 1.99758 | loss: 1.17288| constrast_loss: 4.60382| div_loss: 0.87699| %_mask_idx: 0.39066| ppl: 78.72839| %_neg_is_pos: 0.0403| lr: 0.0| temp: 1.99757 | loss: 1.17364| constrast_loss: 4.60824| div_loss: 0.86312| %_mask_idx: 0.36216| ppl: 87.60008| %_neg_is_pos: 0.02261| lr: 0.0| temp: 1.99757 | loss: 1.1734| constrast_loss: 4.60689| div_loss: 0.86701| %_mask_idx: 0.401| ppl: 85.1118| %_neg_is_pos: 0.02834| lr: 0.0| temp: 1.99755 | loss: 1.17526| constrast_loss: 4.6141| div_loss: 0.86951| %_mask_idx: 0.38549| ppl: 83.5117| %_neg_is_pos: 0.02911| lr: 0.0| temp: 1.99755 | loss: 1.1746| constrast_loss: 4.61243| div_loss: 0.85968| %_mask_idx: 0.37923| ppl: 89.80544| %_neg_is_pos: 0.02709| lr: 0.0| temp: 1.99754 | loss: 1.17431| constrast_loss: 4.61031| div_loss: 0.86917| %_mask_idx: 0.41714| ppl: 83.73217| %_neg_is_pos: 0.0249| lr: 0.0| temp: 1.99754 | loss: 1.17383| constrast_loss: 4.60903| div_loss: 0.86304| %_mask_idx: 0.35871| ppl: 87.65154| %_neg_is_pos: 0.03183| lr: 0.0| temp: 1.99753 | loss: 1.17243| constrast_loss: 4.60225| div_loss: 0.87471| %_mask_idx: 0.39364| ppl: 80.18745| %_neg_is_pos: 0.04129| lr: 0.0| temp: 1.99753 | loss: 1.17339| constrast_loss: 4.60676| div_loss: 0.86785| %_mask_idx: 0.42184| ppl: 84.57741| %_neg_is_pos: 0.02021| lr: 0.0| temp: 1.99752 | loss: 1.17465| constrast_loss: 4.61294| div_loss: 0.85659| %_mask_idx: 0.40555| ppl: 91.78069| %_neg_is_pos: 0.03115| lr: 0.0| temp: 1.99752 | loss: 1.17392| constrast_loss: 4.60915| div_loss: 0.86542| %_mask_idx: 0.3385| ppl: 86.12918| %_neg_is_pos: 0.02826| lr: 0.0| temp: 1.9975 | loss: 1.17359| constrast_loss: 4.6072| div_loss: 0.87157| %_mask_idx: 
0.39098| ppl: 82.19626| %_neg_is_pos: 0.01926| lr: 0.0| temp: 1.9975 | loss: 1.1741| constrast_loss: 4.60925| div_loss: 0.87152| %_mask_idx: 0.35025| ppl: 82.22691| %_neg_is_pos: 0.02172| lr: 0.0| temp: 1.99749 | loss: 1.17537| constrast_loss: 4.61446| div_loss: 0.87015| %_mask_idx: 0.41165| ppl: 83.10469| %_neg_is_pos: 0.02302| lr: 0.0| temp: 1.99749 | loss: 1.17466| constrast_loss: 4.61155| div_loss: 0.87108| %_mask_idx: 0.4386| ppl: 82.51125| %_neg_is_pos: 0.01467| lr: 0.0| temp: 1.99747 | loss: 1.17354| constrast_loss: 4.60724| div_loss: 0.86908| %_mask_idx: 0.39599| ppl: 83.78649| %_neg_is_pos: 0.03677| lr: 0.0| temp: 1.99747 | loss: 1.1738| constrast_loss: 4.60872| div_loss: 0.86473| %_mask_idx: 0.37845| ppl: 86.57543| %_neg_is_pos: 0.03687| lr: 0.0| temp: 1.99746 | loss: 1.17456| constrast_loss: 4.61345| div_loss: 0.84797| %_mask_idx: 0.36278| ppl: 97.30069| %_neg_is_pos: 0.0154| lr: 0.0| temp: 1.99746 | loss: 1.17164| constrast_loss: 4.59927| div_loss: 0.873| %_mask_idx: 0.40617| ppl: 81.28317| %_neg_is_pos: 0.02682| lr: 0.0| temp: 1.99745 | loss: 1.17178| constrast_loss: 4.59993| div_loss: 0.87199| %_mask_idx: 0.42857| ppl: 81.92497| %_neg_is_pos: 0.03308| lr: 0.0| temp: 1.99745 | loss: 1.17406| constrast_loss: 4.61032| div_loss: 0.85922| %_mask_idx: 0.35511| ppl: 90.09691| %_neg_is_pos: 0.02077| lr: 0.0| temp: 1.99744 | loss: 1.17482| constrast_loss: 4.61293| div_loss: 0.86355| %_mask_idx: 0.38503| ppl: 87.32793| %_neg_is_pos: 0.01595| lr: 0.0| temp: 1.99744 | loss: 1.1744| constrast_loss: 4.61035| div_loss: 0.87236| %_mask_idx: 0.37625| ppl: 81.68969| %_neg_is_pos: 0.03092| lr: 0.0| temp: 1.99742 | loss: 1.17362| constrast_loss: 4.60708| div_loss: 0.8739| %_mask_idx: 0.37046| ppl: 80.70177| %_neg_is_pos: 0.03364| lr: 0.0| temp: 1.99742 | loss: 1.17501| constrast_loss: 4.6141| div_loss: 0.85926| %_mask_idx: 0.33051| ppl: 90.07422| %_neg_is_pos: 0.02055| lr: 0.0| temp: 1.99741 | loss: 1.17367| constrast_loss: 4.60805| div_loss: 0.86631| %_mask_idx: 
0.39818| ppl: 85.56197| %_neg_is_pos: 0.02272| lr: 0.0| temp: 1.99741 | loss: 1.1745| constrast_loss: 4.61175| div_loss: 0.86244| %_mask_idx: 0.39051| ppl: 88.03862| %_neg_is_pos: 0.02221| lr: 0.0| temp: 1.9974 | loss: 1.1743| constrast_loss: 4.61094| div_loss: 0.86239| %_mask_idx: 0.40648| ppl: 88.06874| %_neg_is_pos: 0.03002| lr: 0.0| temp: 1.9974 | loss: 1.17345| constrast_loss: 4.60678| div_loss: 0.87032| %_mask_idx: 0.36451| ppl: 82.99342| %_neg_is_pos: 0.01849| lr: 0.0| temp: 1.99739 | loss: 1.17379| constrast_loss: 4.60815| div_loss: 0.86997| %_mask_idx: 0.37375| ppl: 83.21867| %_neg_is_pos: 0.03194| lr: 0.0| temp: 1.99739 | loss: 1.17308| constrast_loss: 4.60487| div_loss: 0.87453| %_mask_idx: 0.36231| ppl: 80.30173| %_neg_is_pos: 0.03587| lr: 0.0| temp: 1.99737 | loss: 1.17312| constrast_loss: 4.60501| div_loss: 0.87447| %_mask_idx: 0.40132| ppl: 80.34193| %_neg_is_pos: 0.03511| lr: 0.0| temp: 1.99737 | loss: 1.17476| constrast_loss: 4.61279| div_loss: 0.86262| %_mask_idx: 0.40038| ppl: 87.92408| %_neg_is_pos: 0.02708| lr: 0.0| temp: 1.99736 | loss: 1.17215| constrast_loss: 4.60098| div_loss: 0.87618| %_mask_idx: 0.38753| ppl: 79.2431| %_neg_is_pos: 0.03051| lr: 0.0| temp: 1.99736 | loss: 1.17345| constrast_loss: 4.60727| div_loss: 0.86522| %_mask_idx: 0.37547| ppl: 86.26036| %_neg_is_pos: 0.02691| lr: 0.0| temp: 1.99735 | loss: 1.17382| constrast_loss: 4.60753| div_loss: 0.87729| %_mask_idx: 0.36701| ppl: 78.53241| %_neg_is_pos: 0.03195| lr: 0.0| temp: 1.99735 | loss: 1.17468| constrast_loss: 4.61329| div_loss: 0.85432| %_mask_idx: 0.41792| ppl: 93.23468| %_neg_is_pos: 0.01568| lr: 0.0| temp: 1.99734 | loss: 1.17087| constrast_loss: 4.59475| div_loss: 0.88718| %_mask_idx: 0.39129| ppl: 72.20759| %_neg_is_pos: 0.04647| lr: 0.0| temp: 1.99734 | loss: 1.17383| constrast_loss: 4.60771| div_loss: 0.87632| %_mask_idx: 0.3963| ppl: 79.15289| %_neg_is_pos: 0.0315| lr: 0.0| temp: 1.99732 | loss: 1.17208| constrast_loss: 4.60168| div_loss: 0.86619| %_mask_idx: 
0.38675| ppl: 85.64134| %_neg_is_pos: 0.03378| lr: 0.0| temp: 1.99732 | loss: 1.17234| constrast_loss: 4.60081| div_loss: 0.88534| %_mask_idx: 0.36247| ppl: 73.37958| %_neg_is_pos: 0.04705| lr: 0.0| temp: 1.99731 | loss: 1.17264| constrast_loss: 4.60365| div_loss: 0.86922| %_mask_idx: 0.38315| ppl: 83.70039| %_neg_is_pos: 0.02853| lr: 0.0| temp: 1.99731
[2021-09-01 21:52:27,196] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0
[2021-09-01 21:52:27,196] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0
| loss: 1.17272| constrast_loss: 4.60349| div_loss: 0.87386| %_mask_idx: 0.41024| ppl: 80.72901| %_neg_is_pos: 0.02643| lr: 0.0| temp: 1.99729 | loss: 1.17544| constrast_loss: 4.61573| div_loss: 0.86045| %_mask_idx: 0.34195| ppl: 89.31332| %_neg_is_pos: 0.02685| lr: 0.0| temp: 1.99729 | loss: 1.17316| constrast_loss: 4.60583| div_loss: 0.86804| %_mask_idx: 0.40821| ppl: 84.457| %_neg_is_pos: 0.03169| lr: 0.0| temp: 1.99728 | loss: 1.17363| constrast_loss: 4.60756| div_loss: 0.86965| %_mask_idx: 0.39991| ppl: 83.4218| %_neg_is_pos: 0.02668| lr: 0.0| temp: 1.99728 | loss: 1.17498| constrast_loss: 4.6147| div_loss: 0.85223| %_mask_idx: 0.4057| ppl: 94.57499| %_neg_is_pos: 0.02108| lr: 0.0| temp: 1.99727 | loss: 1.1745| constrast_loss: 4.61173| div_loss: 0.8625| %_mask_idx: 0.3786| ppl: 88.001| %_neg_is_pos: 0.02949| lr: 0.0| temp: 1.99727 | loss: 1.17173| constrast_loss: 4.59907| div_loss: 0.87835| %_mask_idx: 0.44846| ppl: 77.85609| %_neg_is_pos: 0.0261| lr: 0.0| temp: 1.99726 | loss: 1.17391| constrast_loss: 4.60912| div_loss: 0.86534| %_mask_idx: 0.41761| ppl: 86.18432| %_neg_is_pos: 0.01814| lr: 0.0| temp: 1.99726 | loss: 1.17499| constrast_loss: 4.61451| div_loss: 0.85455| %_mask_idx: 0.35887| ppl: 93.08584| %_neg_is_pos: 0.03253| lr: 0.0| temp: 1.99724 | loss:
1.17393| constrast_loss: 4.60957| div_loss: 0.86168| %_mask_idx: 0.36216| ppl: 88.5226| %_neg_is_pos: 0.02741| lr: 0.0| temp: 1.99724 | loss: 1.17483| constrast_loss: 4.61315| div_loss: 0.86149| %_mask_idx: 0.38221| ppl: 88.64675| %_neg_is_pos: 0.0259| lr: 0.0| temp: 1.99723 | loss: 1.17359| constrast_loss: 4.60731| div_loss: 0.8703| %_mask_idx: 0.38675| ppl: 83.00935| %_neg_is_pos: 0.02401| lr: 0.0| temp: 1.99723 | loss: 1.17242| constrast_loss: 4.60257| div_loss: 0.87112| %_mask_idx: 0.40382| ppl: 82.48141| %_neg_is_pos: 0.03234| lr: 0.0| temp: 1.99722 | loss: 1.17365| constrast_loss: 4.60852| div_loss: 0.86095| %_mask_idx: 0.4292| ppl: 88.991| %_neg_is_pos: 0.01388| lr: 0.0| temp: 1.99722 | loss: 1.17386| constrast_loss: 4.60979| div_loss: 0.85656| %_mask_idx: 0.40648| ppl: 91.80464| %_neg_is_pos: 0.02797| lr: 0.0| temp: 1.99721 | loss: 1.17365| constrast_loss: 4.6086| div_loss: 0.85998| %_mask_idx: 0.38612| ppl: 89.60986| %_neg_is_pos: 0.0157| lr: 0.0| temp: 1.99721 | loss: 1.17274| constrast_loss: 4.60355| div_loss: 0.87409| %_mask_idx: 0.3656| ppl: 80.58328| %_neg_is_pos: 0.04098| lr: 0.0| temp: 1.99719| loss: 1.17232| constrast_loss: 4.6025| div_loss: 0.86781| %_mask_idx: 0.36341| ppl: 84.60079| %_neg_is_pos: 0.03688| lr: 0.0| temp: 1.99719 | loss: 1.17339| constrast_loss: 4.60771| div_loss: 0.85848| %_mask_idx: 0.43202| ppl: 90.57037| %_neg_is_pos: 0.01818| lr: 0.0| temp: 1.99718 | loss: 1.17386| constrast_loss: 4.60839| div_loss: 0.87044| %_mask_idx: 0.38549| ppl: 82.91849| %_neg_is_pos: 0.01856| lr: 0.0| temp: 1.99718 | loss: 1.17437| constrast_loss: 4.61107| div_loss: 0.86416| %_mask_idx: 0.33318| ppl: 86.93871| %_neg_is_pos: 0.01992| lr: 0.0| temp: 1.99717 | loss: 1.17423| constrast_loss: 4.61051| div_loss: 0.864| %_mask_idx: 0.36137| ppl: 87.04251| %_neg_is_pos: 0.03362| lr: 0.0| temp: 1.99717 | loss: 1.17495| constrast_loss: 4.614| div_loss: 0.85822| %_mask_idx: 0.47666| ppl: 90.73685| %_neg_is_pos: 0.01202| lr: 0.0| temp: 1.99716 | loss: 1.17148| 
constrast_loss: 4.59817| div_loss: 0.87737| %_mask_idx: 0.36623| ppl: 78.48402| %_neg_is_pos: 0.04518| lr: 0.0| temp: 1.99716 | loss: 1.17309| constrast_loss: 4.60524| div_loss: 0.87143| %_mask_idx: 0.39254| ppl: 82.28496| %_neg_is_pos: 0.03979| lr: 0.0| temp: 1.99714 | loss: 1.17218| constrast_loss: 4.60175| div_loss: 0.8696| %_mask_idx: 0.42826| ppl: 83.45312| %_neg_is_pos: 0.02362| lr: 0.0| temp: 1.99714 | loss: 1.17288| constrast_loss: 4.60493| div_loss: 0.86577| %_mask_idx: 0.37892| ppl: 85.90517| %_neg_is_pos: 0.03968| lr: 0.0| temp: 1.99713 | loss: 1.17482| constrast_loss: 4.61307| div_loss: 0.86198| %_mask_idx: 0.44001| ppl: 88.33298| %_neg_is_pos: 0.01194| lr: 0.0| temp: 1.99713 | loss: 1.17492| constrast_loss: 4.61487| div_loss: 0.8479| %_mask_idx: 0.39709| ppl: 97.34232| %_neg_is_pos: 0.02719| lr: 0.0| temp: 1.99711 | loss: 1.17318| constrast_loss: 4.6061| div_loss: 0.8662| %_mask_idx: 0.33913| ppl: 85.63287| %_neg_is_pos: 0.0363| lr: 0.0| temp: 1.99711 | loss: 1.17265| constrast_loss: 4.60499| div_loss: 0.85618| %_mask_idx: 0.42137| ppl: 92.04192| %_neg_is_pos: 0.03508| lr: 0.0| temp: 1.9971 | loss: 1.17388| constrast_loss: 4.61019| div_loss: 0.85313| %_mask_idx: 0.37813| ppl: 93.99931| %_neg_is_pos: 0.01878| lr: 0.0| temp: 1.9971 | loss: 1.17349| constrast_loss: 4.60672| div_loss: 0.87224| %_mask_idx: 0.36811| ppl: 81.7641| %_neg_is_pos: 0.03011| lr: 0.0| temp: 1.99709 | loss: 1.1755| constrast_loss: 4.61607| div_loss: 0.85917| %_mask_idx: 0.4433| ppl: 90.13169| %_neg_is_pos: 0.00908| lr: 0.0| temp: 1.99709 | loss: 1.17253| constrast_loss: 4.60323| div_loss: 0.86911| %_mask_idx: 0.38518| ppl: 83.77089| %_neg_is_pos: 0.02668| lr: 0.0| temp: 1.99708 | loss: 1.17306| constrast_loss: 4.60509| div_loss: 0.87133| %_mask_idx: 0.39395| ppl: 82.34695| %_neg_is_pos: 0.02991| lr: 0.0| temp: 1.99708 | loss: 1.17285| constrast_loss: 4.60518| div_loss: 0.8622| %_mask_idx: 0.39113| ppl: 88.19079| %_neg_is_pos: 0.03016| lr: 0.0| temp: 1.99706 | loss: 1.17345| 
constrast_loss: 4.60775| div_loss: 0.86045| %_mask_idx: 0.38769| ppl: 89.31387| %_neg_is_pos: 0.02995| lr: 0.0| temp: 1.99706 | loss: 1.17211| constrast_loss: 4.60213| div_loss: 0.86316| %_mask_idx: 0.43625| ppl: 87.57683| %_neg_is_pos: 0.02354| lr: 0.0| temp: 1.99705 | loss: 1.1738| constrast_loss: 4.60857| div_loss: 0.8663| %_mask_idx: 0.33239| ppl: 85.56623| %_neg_is_pos: 0.02944| lr: 0.0| temp: 1.99705 | loss: 1.17324| constrast_loss: 4.60624| div_loss: 0.86737| %_mask_idx: 0.37516| ppl: 84.88589| %_neg_is_pos: 0.03743| lr: 0.0| temp: 1.99704 | loss: 1.17301| constrast_loss: 4.60514| div_loss: 0.86916| %_mask_idx: 0.36826| ppl: 83.73772| %_neg_is_pos: 0.03157| lr: 0.0| temp: 1.99704 | loss: 1.16955| constrast_loss: 4.59075| div_loss: 0.87439| %_mask_idx: 0.33506| ppl: 80.39278| %_neg_is_pos: 0.04363| lr: 0.0| temp: 1.99703 | loss: 1.17428| constrast_loss: 4.6103| div_loss: 0.86827| %_mask_idx: 0.39019| ppl: 84.30785| %_neg_is_pos: 0.02205| lr: 0.0| temp: 1.99703 | loss: 1.17266| constrast_loss: 4.6043| div_loss: 0.86339| %_mask_idx: 0.37312| ppl: 87.42978| %_neg_is_pos: 0.02977| lr: 0.0| temp: 1.99701 | loss: 1.17373| constrast_loss: 4.60866| div_loss: 0.86271| %_mask_idx: 0.38127| ppl: 87.86702| %_neg_is_pos: 0.03345| lr: 0.0| temp: 1.99701 | loss: 1.17304| constrast_loss: 4.60517| div_loss: 0.86999| %_mask_idx: 0.39129| ppl: 83.20851| %_neg_is_pos: 0.04312| lr: 0.0| temp: 1.997 | loss: 1.17378| constrast_loss: 4.60977| div_loss: 0.85343| %_mask_idx: 0.38346| ppl: 93.80469| %_neg_is_pos: 0.02057| lr: 0.0| temp: 1.997 | loss: 1.17187| constrast_loss: 4.6003| div_loss: 0.87182| %_mask_idx: 0.35526| ppl: 82.03207| %_neg_is_pos: 0.03336| lr: 0.0| temp: 1.99699 | loss: 1.17411| constrast_loss: 4.61078| div_loss: 0.85664| %_mask_idx: 0.42935| ppl: 91.75343| %_neg_is_pos: 0.01944| lr: 0.0| temp: 1.99699 | loss: 1.17305| constrast_loss: 4.60633| div_loss: 0.85872| %_mask_idx: 0.38409| ppl: 90.42076| %_neg_is_pos: 0.01969| lr: 0.0| temp: 1.99698 | loss: 1.17423| 
constrast_loss: 4.61141| div_loss: 0.85506| %_mask_idx: 0.38001| ppl: 92.76102| %_neg_is_pos: 0.02634| lr: 0.0| temp: 1.99698 | loss: 1.17338| constrast_loss: 4.60784| div_loss: 0.85666| %_mask_idx: 0.37469| ppl: 91.73595| %_neg_is_pos: 0.02352| lr: 0.0| temp: 1.99696 | loss: 1.17394| constrast_loss: 4.60933| div_loss: 0.86433| %_mask_idx: 0.34101| ppl: 86.83121| %_neg_is_pos: 0.02325| lr: 0.0| temp: 1.99696 | loss: 1.17382| constrast_loss: 4.60878| div_loss: 0.86491| %_mask_idx: 0.42058| ppl: 86.45868| %_neg_is_pos: 0.02866| lr: 0.0| temp: 1.99695 | loss: 1.17438| constrast_loss: 4.61135| div_loss: 0.86178| %_mask_idx: 0.39865| ppl: 88.46236| %_neg_is_pos: 0.02016| lr: 0.0| temp: 1.99695 | loss: 1.1736| constrast_loss: 4.60827| div_loss: 0.86121| %_mask_idx: 0.39991| ppl: 88.82368| %_neg_is_pos: 0.03268| lr: 0.0| temp: 1.99693 | loss: 1.17208| constrast_loss: 4.60264| div_loss: 0.85673| %_mask_idx: 0.36654| ppl: 91.69582| %_neg_is_pos: 0.03682| lr: 0.0| temp: 1.99693 | loss: 1.17365| constrast_loss: 4.60914| div_loss: 0.85457| %_mask_idx: 0.41275| ppl: 93.07311| %_neg_is_pos: 0.02039| lr: 0.0| temp: 1.99692 | loss: 1.17492| constrast_loss: 4.61396| div_loss: 0.85719| %_mask_idx: 0.39317| ppl: 91.39796| %_neg_is_pos: 0.02764| lr: 0.0| temp: 1.99692 | loss: 1.17327| constrast_loss: 4.60736| div_loss: 0.85715| %_mask_idx: 0.33349| ppl: 91.42194| %_neg_is_pos: 0.02198| lr: 0.0| temp: 1.99691 | loss: 1.174| constrast_loss: 4.61048| div_loss: 0.85528| %_mask_idx: 0.39709| ppl: 92.61987| %_neg_is_pos: 0.02807| lr: 0.0| temp: 1.99691 | loss: 1.17384| constrast_loss: 4.60904| div_loss: 0.86326| %_mask_idx: 0.37343| ppl: 87.51552| %_neg_is_pos: 0.02673| lr: 0.0| temp: 1.9969 | loss: 1.17401| constrast_loss: 4.61005| div_loss: 0.8597| %_mask_idx: 0.36482| ppl: 89.79149| %_neg_is_pos: 0.02371| lr: 0.0| temp: 1.9969 | loss: 1.17401| constrast_loss: 4.61048| div_loss: 0.85551| %_mask_idx: 0.40868| ppl: 92.47407| %_neg_is_pos: 0.01405| lr: 0.0| temp: 1.99688 | loss: 1.17453| 
constrast_loss: 4.6132| div_loss: 0.84913| %_mask_idx: 0.41447| ppl: 96.5544| %_neg_is_pos: 0.0205| lr: 0.0| temp: 1.99688 | loss: 1.17346| constrast_loss: 4.60753| div_loss: 0.86311| %_mask_idx: 0.44251| ppl: 87.60994| %_neg_is_pos: 0.01582| lr: 0.0| temp: 1.99687 | loss: 1.17373| constrast_loss: 4.60877| div_loss: 0.86165| %_mask_idx: 0.43828| ppl: 88.5463| %_neg_is_pos: 0.01506| lr: 0.0| temp: 1.99687 | loss: 1.17285| constrast_loss: 4.60385| div_loss: 0.87555| %_mask_idx: 0.34179| ppl: 79.64562| %_neg_is_pos: 0.03301| lr: 0.0| temp: 1.99686 | loss: 1.17298| constrast_loss: 4.60588| div_loss: 0.86023| %_mask_idx: 0.37061| ppl: 89.44965| %_neg_is_pos: 0.02711| lr: 0.0| temp: 1.99686 | loss: 1.174| constrast_loss: 4.60961| div_loss: 0.86374| %_mask_idx: 0.38706| ppl: 87.20491| %_neg_is_pos: 0.02231| lr: 0.0| temp: 1.99685 | loss: 1.1734| constrast_loss: 4.60791| div_loss: 0.85688| %_mask_idx: 0.39583| ppl: 91.59467| %_neg_is_pos: 0.02292| lr: 0.0| temp: 1.99685 | loss: 1.17247| constrast_loss: 4.60231| div_loss: 0.87581| %_mask_idx: 0.3891| ppl: 79.48339| %_neg_is_pos: 0.03181| lr: 0.0| temp: 1.99683 | loss: 1.17389| constrast_loss: 4.60954| div_loss: 0.86003| %_mask_idx: 0.38487| ppl: 89.58195| %_neg_is_pos: 0.02214| lr: 0.0| temp: 1.99683 | loss: 1.17367| constrast_loss: 4.60926| div_loss: 0.85431| %_mask_idx: 0.35808| ppl: 93.24126| %_neg_is_pos: 0.03758| lr: 0.0| temp: 1.99682 | loss: 1.16994| constrast_loss: 4.59277| div_loss: 0.86968| %_mask_idx: 0.34539| ppl: 83.40181| %_neg_is_pos: 0.04676| lr: 0.0| temp: 1.99682 | loss: 1.17383| constrast_loss: 4.60874| div_loss: 0.86593| %_mask_idx: 0.42591| ppl: 85.80744| %_neg_is_pos: 0.01927| lr: 0.0| temp: 1.99681 | loss: 1.17257| constrast_loss: 4.60432| div_loss: 0.85957| %_mask_idx: 0.39098| ppl: 89.87488| %_neg_is_pos: 0.0295| lr: 0.0| temp: 1.99681 | loss: 1.17315| constrast_loss: 4.60561| div_loss: 0.87009| %_mask_idx: 0.36811| ppl: 83.14433| %_neg_is_pos: 0.03878| lr: 0.0| temp: 1.9968 | loss: 1.17363| 
constrast_loss: 4.60858| div_loss: 0.8595| %_mask_idx: 0.44815| ppl: 89.91898| %_neg_is_pos: 0.01971| lr: 0.0| temp: 1.9968 | loss: 1.17355| constrast_loss: 4.6087| div_loss: 0.85498| %_mask_idx: 0.35605| ppl: 92.80972| %_neg_is_pos: 0.02983| lr: 0.0| temp: 1.99678 | loss: 1.17339| constrast_loss: 4.60676| div_loss: 0.86801| %_mask_idx: 0.39709| ppl: 84.47449| %_neg_is_pos: 0.02007| lr: 0.0| temp: 1.99678 | loss: 1.17421| constrast_loss: 4.61151| div_loss: 0.85308| %_mask_idx: 0.45692| ppl: 94.02848| %_neg_is_pos: 0.01211| lr: 0.0| temp: 1.99677 | loss: 1.17308| constrast_loss: 4.60595| div_loss: 0.8638| %_mask_idx: 0.43233| ppl: 87.16731| %_neg_is_pos: 0.03177| lr: 0.0| temp: 1.99677 | loss: 1.17358| constrast_loss: 4.60815| div_loss: 0.86165| %_mask_idx: 0.36325| ppl: 88.54704| %_neg_is_pos: 0.02795| lr: 0.0| temp: 1.99675 | loss: 1.17428| constrast_loss: 4.61057| div_loss: 0.86551| %_mask_idx: 0.40006| ppl: 86.07499| %_neg_is_pos: 0.0209| lr: 0.0| temp: 1.99675 | loss: 1.17247| constrast_loss: 4.60338| div_loss: 0.86486| %_mask_idx: 0.39192| ppl: 86.48659| %_neg_is_pos: 0.02863| lr: 0.0| temp: 1.99674 | loss: 1.17178| constrast_loss: 4.59992| div_loss: 0.87185| %_mask_idx: 0.37672| ppl: 82.0178| %_neg_is_pos: 0.03067| lr: 0.0| temp: 1.99674 | loss: 1.1753| constrast_loss: 4.6153| div_loss: 0.85905| %_mask_idx: 0.37296| ppl: 90.20732| %_neg_is_pos: 0.02066| lr: 0.0| temp: 1.99673 | loss: 1.17384| constrast_loss: 4.60968| div_loss: 0.85697| %_mask_idx: 0.42325| ppl: 91.53948| %_neg_is_pos: 0.02484| lr: 0.0| temp: 1.99673 | loss: 1.17317| constrast_loss: 4.60708| div_loss: 0.85586| %_mask_idx: 0.3714| ppl: 92.25255| %_neg_is_pos: 0.02378| lr: 0.0| temp: 1.99672 | loss: 1.17408| constrast_loss: 4.60987| div_loss: 0.86464| %_mask_idx: 0.35495| ppl: 86.63321| %_neg_is_pos: 0.0274| lr: 0.0| temp: 1.99672 | loss: 1.17521| constrast_loss: 4.61491| div_loss: 0.85914| %_mask_idx: 0.38111| ppl: 90.14845| %_neg_is_pos: 0.02232| lr: 0.0| temp: 1.9967 | loss: 1.17482| 
constrast_loss: 4.61442| div_loss: 0.84855| %_mask_idx: 0.41134| ppl: 96.92554| %_neg_is_pos: 0.01201| lr: 0.0| temp: 1.9967 | loss: 1.17201| constrast_loss: 4.60153| div_loss: 0.86507| %_mask_idx: 0.36889| ppl: 86.35841| %_neg_is_pos: 0.03226| lr: 0.0| temp: 1.99669 | loss: 1.1734| constrast_loss: 4.60731| div_loss: 0.86304| %_mask_idx: 0.4021| ppl: 87.65526| %_neg_is_pos: 0.02693| lr: 0.0| temp: 1.99669 | loss: 1.17506| constrast_loss: 4.61498| div_loss: 0.85256| %_mask_idx: 0.38017| ppl: 94.36246| %_neg_is_pos: 0.02107| lr: 0.0| temp: 1.99668 | loss: 1.17479| constrast_loss: 4.6134| div_loss: 0.85773| %_mask_idx: 0.39881| ppl: 91.0556| %_neg_is_pos: 0.02002| lr: 0.0| temp: 1.99668 | loss: 1.17309| constrast_loss: 4.60678| div_loss: 0.85581| %_mask_idx: 0.44126| ppl: 92.28367| %_neg_is_pos: 0.01583| lr: 0.0| temp: 1.99667 | loss: 1.17399| constrast_loss: 4.61051| div_loss: 0.8547| %_mask_idx: 0.42544| ppl: 92.98979| %_neg_is_pos: 0.01167| lr: 0.0| temp: 1.99667 | loss: 1.17442| constrast_loss: 4.61176| div_loss: 0.85928| %_mask_idx: 0.40633| ppl: 90.05888| %_neg_is_pos: 0.014| lr: 0.0| temp: 1.99665 | loss: 1.17151| constrast_loss: 4.5992| div_loss: 0.86853| %_mask_idx: 0.3808| ppl: 84.1421| %_neg_is_pos: 0.03727| lr: 0.0| temp: 1.99665 | loss: 1.17392| constrast_loss: 4.60979| div_loss: 0.85902| %_mask_idx: 0.40194| ppl: 90.22869| %_neg_is_pos: 0.02683| lr: 0.0| temp: 1.99664 | loss: 1.17156| constrast_loss: 4.59892| div_loss: 0.87307| %_mask_idx: 0.34665| ppl: 81.2334| %_neg_is_pos: 0.04708| lr: 0.0| temp: 1.99664 | loss: 1.1738| constrast_loss: 4.60914| div_loss: 0.86048| %_mask_idx: 0.46444| ppl: 89.29523| %_neg_is_pos: 0.02023| lr: 0.0| temp: 1.99663 | loss: 1.17302| constrast_loss: 4.60526| div_loss: 0.86816| %_mask_idx: 0.38831| ppl: 84.37659| %_neg_is_pos: 0.02633| lr: 0.0| temp: 1.99663 | loss: 1.17323| constrast_loss: 4.60648| div_loss: 0.86436| %_mask_idx: 0.39458| ppl: 86.8104| %_neg_is_pos: 0.02592| lr: 0.0| temp: 1.99662 | loss: 1.17366| 
constrast_loss: 4.60892| div_loss: 0.85733| %_mask_idx: 0.35699| ppl: 91.31154| %_neg_is_pos: 0.0234| lr: 0.0| temp: 1.99662 | loss: 1.17406| constrast_loss: 4.61064| div_loss: 0.85619| %_mask_idx: 0.33647| ppl: 92.03906| %_neg_is_pos: 0.02908| lr: 0.0| temp: 1.9966 | loss: 1.17351| constrast_loss: 4.60727| div_loss: 0.86785| %_mask_idx: 0.38769| ppl: 84.57691| %_neg_is_pos: 0.02353| lr: 0.0| temp: 1.9966 | loss: 1.17326| constrast_loss: 4.60614| div_loss: 0.86898| %_mask_idx: 0.43499| ppl: 83.85095| %_neg_is_pos: 0.02178| lr: 0.0| temp: 1.99659 | loss: 1.17449| constrast_loss: 4.61322| div_loss: 0.84752| %_mask_idx: 0.39035| ppl: 97.58458| %_neg_is_pos: 0.02436| lr: 0.0| temp: 1.99659
[2021-09-01 22:01:42,604] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0
[2021-09-01 22:01:42,605] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0
| loss: 1.17302| constrast_loss: 4.60507| div_loss: 0.87012| %_mask_idx: 0.38549| ppl: 83.12012| %_neg_is_pos: 0.03074| lr: 0.0| temp: 1.99657 | loss: 1.17341| constrast_loss: 4.60774| div_loss: 0.85902| %_mask_idx: 0.41103| ppl: 90.22439| %_neg_is_pos: 0.03034| lr: 0.0| temp: 1.99657 | loss: 1.1732| constrast_loss: 4.60757| div_loss: 0.8524| %_mask_idx: 0.36717| ppl: 94.4659| %_neg_is_pos: 0.03009| lr: 0.0| temp: 1.99656 | loss: 1.17417| constrast_loss: 4.6112| div_loss: 0.8547| %_mask_idx: 0.38362| ppl: 92.99419| %_neg_is_pos: 0.02192| lr: 0.0| temp: 1.99656 | loss: 1.17363| constrast_loss: 4.60907| div_loss: 0.85455| %_mask_idx: 0.40492| ppl: 93.08783| %_neg_is_pos: 0.0328| lr: 0.0| temp: 1.99655 | loss: 1.17397| constrast_loss: 4.61108| div_loss: 0.84806| %_mask_idx: 0.40006| ppl: 97.24117| %_neg_is_pos: 0.01405| lr: 0.0| temp: 1.99655 | loss: 1.17264| constrast_loss: 4.60463| div_loss: 0.85923| %_mask_idx: 0.37108| ppl: 90.09319|
%_neg_is_pos: 0.02796| lr: 0.0| temp: 1.99654 | loss: 1.1748| constrast_loss: 4.61371| div_loss: 0.85486| %_mask_idx: 0.39991| ppl: 92.88969| %_neg_is_pos: 0.0222| lr: 0.0| temp: 1.99654 | loss: 1.17357| constrast_loss: 4.60795| div_loss: 0.86337| %_mask_idx: 0.3927| ppl: 87.44305| %_neg_is_pos: 0.02865| lr: 0.0| temp: 1.99652 | loss: 1.17481| constrast_loss: 4.61288| div_loss: 0.86369| %_mask_idx: 0.35636| ppl: 87.23528| %_neg_is_pos: 0.02405| lr: 0.0| temp: 1.99652 | loss: 1.17305| constrast_loss: 4.60641| div_loss: 0.85801| %_mask_idx: 0.44204| ppl: 90.87404| %_neg_is_pos: 0.01991| lr: 0.0| temp: 1.99651 | loss: 1.17359| constrast_loss: 4.60946| div_loss: 0.84919| %_mask_idx: 0.38503| ppl: 96.51875| %_neg_is_pos: 0.01493| lr: 0.0| temp: 1.99651 | loss: 1.17107| constrast_loss: 4.59684| div_loss: 0.87445| %_mask_idx: 0.37798| ppl: 80.35025| %_neg_is_pos: 0.04944| lr: 0.0| temp: 1.9965 | loss: 1.17439| constrast_loss: 4.61169| div_loss: 0.85859| %_mask_idx: 0.38064| ppl: 90.50395| %_neg_is_pos: 0.02207| lr: 0.0| temp: 1.9965 | loss: 1.17183| constrast_loss: 4.60083| div_loss: 0.86488| %_mask_idx: 0.3692| ppl: 86.48| %_neg_is_pos: 0.02648| lr: 0.0| temp: 1.99649 | loss: 1.17385| constrast_loss: 4.60975| div_loss: 0.85655| %_mask_idx: 0.40429| ppl: 91.80971| %_neg_is_pos: 0.01597| lr: 0.0| temp: 1.99649 | loss: 1.17475| constrast_loss: 4.61332| div_loss: 0.85693| %_mask_idx: 0.3833| ppl: 91.56357| %_neg_is_pos: 0.01828| lr: 0.0| temp: 1.99647 | loss: 1.17219| constrast_loss: 4.60175| div_loss: 0.87016| %_mask_idx: 0.34398| ppl: 83.09576| %_neg_is_pos: 0.0393| lr: 0.0| temp: 1.99647 | loss: 1.17494| constrast_loss: 4.61513| div_loss: 0.84628| %_mask_idx: 0.42794| ppl: 98.37845| %_neg_is_pos: 0.00863| lr: 0.0| temp: 1.99646 | loss: 1.17223| constrast_loss: 4.60259| div_loss: 0.86351| %_mask_idx: 0.38221| ppl: 87.35269| %_neg_is_pos: 0.03356| lr: 0.0| temp: 1.99646 | loss: 1.1746| constrast_loss: 4.61325| div_loss: 0.85133| %_mask_idx: 0.41181| ppl: 95.14841| 
%_neg_is_pos: 0.01201| lr: 0.0| temp: 1.99645 | loss: 1.17371| constrast_loss: 4.60862| div_loss: 0.86225| %_mask_idx: 0.3479| ppl: 88.16216| %_neg_is_pos: 0.02394| lr: 0.0| temp: 1.99645 | loss: 1.17528| constrast_loss: 4.61692| div_loss: 0.84204| %_mask_idx: 0.39286| ppl: 101.09465| %_neg_is_pos: 0.01608| lr: 0.0| temp: 1.99644 | loss: 1.17355| constrast_loss: 4.60872| div_loss: 0.85476| %_mask_idx: 0.38315| ppl: 92.95538| %_neg_is_pos: 0.02205| lr: 0.0| temp: 1.99644 | loss: 1.17227| constrast_loss: 4.6031| div_loss: 0.85989| %_mask_idx: 0.38878| ppl: 89.66879| %_neg_is_pos: 0.01922| lr: 0.0| temp: 1.99642| loss: 1.17331| constrast_loss: 4.60735| div_loss: 0.85904| %_mask_idx: 0.38941| ppl: 90.2142| %_neg_is_pos: 0.02006| lr: 0.0| temp: 1.99642 | loss: 1.17347| constrast_loss: 4.60812| div_loss: 0.85772| %_mask_idx: 0.36858| ppl: 91.05845| %_neg_is_pos: 0.03255| lr: 0.0| temp: 1.99641 | loss: 1.1741| constrast_loss: 4.61121| div_loss: 0.85175| %_mask_idx: 0.43828| ppl: 94.88307| %_neg_is_pos: 0.01408| lr: 0.0| temp: 1.99641 | loss: 1.17397| constrast_loss: 4.61131| div_loss: 0.84582| %_mask_idx: 0.41463| ppl: 98.6774| %_neg_is_pos: 0.01669| lr: 0.0| temp: 1.99639 | loss: 1.1725| constrast_loss: 4.60501| div_loss: 0.84982| %_mask_idx: 0.39568| ppl: 96.11305| %_neg_is_pos: 0.01977| lr: 0.0| temp: 1.99639 | loss: 1.17332| constrast_loss: 4.60771| div_loss: 0.85577| %_mask_idx: 0.3761| ppl: 92.3102| %_neg_is_pos: 0.02687| lr: 0.0| temp: 1.99638 | loss: 1.17289| constrast_loss: 4.6055| div_loss: 0.86071| %_mask_idx: 0.40962| ppl: 89.14761| %_neg_is_pos: 0.01624| lr: 0.0| temp: 1.99638 | loss: 1.17292| constrast_loss: 4.6055| div_loss: 0.86165| %_mask_idx: 0.33412| ppl: 88.54575| %_neg_is_pos: 0.02093| lr: 0.0| temp: 1.99637 | loss: 1.17385| constrast_loss: 4.60996| div_loss: 0.85423| %_mask_idx: 0.45269| ppl: 93.29004| %_neg_is_pos: 0.01256| lr: 0.0| temp: 1.99637 | loss: 1.17179| constrast_loss: 4.60115| div_loss: 0.86004| %_mask_idx: 0.35573| ppl: 89.57408| 
%_neg_is_pos: 0.02719| lr: 0.0| temp: 1.99636 | loss: 1.17386| constrast_loss: 4.61102| div_loss: 0.84426| %_mask_idx: 0.42685| ppl: 99.67081| %_neg_is_pos: 0.01265| lr: 0.0| temp: 1.99636 | loss: 1.17215| constrast_loss: 4.60232| div_loss: 0.86281| %_mask_idx: 0.4256| ppl: 87.80103| %_neg_is_pos: 0.01976| lr: 0.0| temp: 1.99634 | loss: 1.17374| constrast_loss: 4.60954| div_loss: 0.85421| %_mask_idx: 0.36513| ppl: 93.30497| %_neg_is_pos: 0.02083| lr: 0.0| temp: 1.99634 | loss: 1.17268| constrast_loss: 4.6058| div_loss: 0.84929| %_mask_idx: 0.40006| ppl: 96.45306| %_neg_is_pos: 0.02296| lr: 0.0| temp: 1.99633 | loss: 1.17309| constrast_loss: 4.60672| div_loss: 0.85635| %_mask_idx: 0.37766| ppl: 91.93733| %_neg_is_pos: 0.02333| lr: 0.0| temp: 1.99633 | loss: 1.17402| constrast_loss: 4.61114| div_loss: 0.84935| %_mask_idx: 0.38863| ppl: 96.41645| %_neg_is_pos: 0.02067| lr: 0.0| temp: 1.99632 | loss: 1.17278| constrast_loss: 4.60508| div_loss: 0.86036| %_mask_idx: 0.36654| ppl: 89.3708| %_neg_is_pos: 0.02951| lr: 0.0| temp: 1.99632 | loss: 1.17278| constrast_loss: 4.60518| div_loss: 0.85938| %_mask_idx: 0.38816| ppl: 89.99402| %_neg_is_pos: 0.02927| lr: 0.0| temp: 1.99631 | loss: 1.1739| constrast_loss: 4.61078| div_loss: 0.84827| %_mask_idx: 0.41729| ppl: 97.10664| %_neg_is_pos: 0.01379| lr: 0.0| temp: 1.99631 | loss: 1.17419| constrast_loss: 4.61187| div_loss: 0.84899| %_mask_idx: 0.38722| ppl: 96.64659| %_neg_is_pos: 0.0154| lr: 0.0| temp: 1.99629 | loss: 1.17413| constrast_loss: 4.61139| div_loss: 0.85136| %_mask_idx: 0.35385| ppl: 95.1322| %_neg_is_pos: 0.01641| lr: 0.0| temp: 1.99629 | loss: 1.17401| constrast_loss: 4.61042| div_loss: 0.85633| %_mask_idx: 0.41698| ppl: 91.94647| %_neg_is_pos: 0.01194| lr: 0.0| temp: 1.99628 | loss: 1.17341| constrast_loss: 4.60881| div_loss: 0.84838| %_mask_idx: 0.42951| ppl: 97.0392| %_neg_is_pos: 0.01806| lr: 0.0| temp: 1.99628 | loss: 1.17347| constrast_loss: 4.60847| div_loss: 0.8543| %_mask_idx: 0.41745| ppl: 93.24696| 
%_neg_is_pos: 0.01863| lr: 0.0| temp: 1.99627 | loss: 1.17413| constrast_loss: 4.61111| div_loss: 0.85433| %_mask_idx: 0.39286| ppl: 93.22746| %_neg_is_pos: 0.01616| lr: 0.0| temp: 1.99627 | loss: 1.17378| constrast_loss: 4.61081| div_loss: 0.84321| %_mask_idx: 0.39035| ppl: 100.34876| %_neg_is_pos: 0.01667| lr: 0.0| temp: 1.99626 | loss: 1.17332| constrast_loss: 4.60848| div_loss: 0.84804| %_mask_idx: 0.38565| ppl: 97.25431| %_neg_is_pos: 0.02341| lr: 0.0| temp: 1.99626 | loss: 1.17345| constrast_loss: 4.60923| div_loss: 0.8459| %_mask_idx: 0.36012| ppl: 98.6209| %_neg_is_pos: 0.01969| lr: 0.0| temp: 1.99624 | loss: 1.1732| constrast_loss: 4.60742| div_loss: 0.85377| %_mask_idx: 0.37437| ppl: 93.5845| %_neg_is_pos: 0.02942| lr: 0.0| temp: 1.99624 | loss: 1.17404| constrast_loss: 4.61025| div_loss: 0.85927| %_mask_idx: 0.38471| ppl: 90.06765| %_neg_is_pos: 0.02444| lr: 0.0| temp: 1.99623 | loss: 1.17375| constrast_loss: 4.61055| div_loss: 0.84458| %_mask_idx: 0.39113| ppl: 99.46645| %_neg_is_pos: 0.01529| lr: 0.0| temp: 1.99623 | loss: 1.17385| constrast_loss: 4.6099| div_loss: 0.85481| %_mask_idx: 0.33537| ppl: 92.91895| %_neg_is_pos: 0.01896| lr: 0.0| temp: 1.99621 | loss: 1.17104| constrast_loss: 4.59844| div_loss: 0.85719| %_mask_idx: 0.3938| ppl: 91.39672| %_neg_is_pos: 0.0413| lr: 0.0| temp: 1.99621 | loss: 1.17416| constrast_loss: 4.61178| div_loss: 0.84861| %_mask_idx: 0.4068| ppl: 96.88729| %_neg_is_pos: 0.01669| lr: 0.0| temp: 1.9962 | loss: 1.17342| constrast_loss: 4.60789| div_loss: 0.85798| %_mask_idx: 0.43515| ppl: 90.89264| %_neg_is_pos: 0.01033| lr: 0.0| temp: 1.9962 | loss: 1.17352| constrast_loss: 4.60934| div_loss: 0.84745| %_mask_idx: 0.40367| ppl: 97.63309| %_neg_is_pos: 0.01347| lr: 0.0| temp: 1.99619 | loss: 1.17252| constrast_loss: 4.60437| div_loss: 0.85729| %_mask_idx: 0.36341| ppl: 91.33614| %_neg_is_pos: 0.02154| lr: 0.0| temp: 1.99619 | loss: 1.17408| constrast_loss: 4.61191| div_loss: 0.84425| %_mask_idx: 0.40226| ppl: 99.68012| 
%_neg_is_pos: 0.01513| lr: 0.0| temp: 1.99618 | loss: 1.17271| constrast_loss: 4.60552| div_loss: 0.85335| %_mask_idx: 0.33788| ppl: 93.85446| %_neg_is_pos: 0.02681| lr: 0.0| temp: 1.99618 | loss: 1.17334| constrast_loss: 4.60777| div_loss: 0.85609| %_mask_idx: 0.3703| ppl: 92.104| %_neg_is_pos: 0.03179| lr: 0.0| temp: 1.99616 | loss: 1.17246| constrast_loss: 4.60497| div_loss: 0.84881| %_mask_idx: 0.37657| ppl: 96.76393| %_neg_is_pos: 0.02845| lr: 0.0| temp: 1.99616 | loss: 1.17404| constrast_loss: 4.61172| div_loss: 0.8445| %_mask_idx: 0.39489| ppl: 99.52027| %_neg_is_pos: 0.01488| lr: 0.0| temp: 1.99615 | loss: 1.17347| constrast_loss: 4.60942| div_loss: 0.84476| %_mask_idx: 0.37218| ppl: 99.35583| %_neg_is_pos: 0.02729| lr: 0.0| temp: 1.99615 | loss: 1.1734| constrast_loss: 4.60705| div_loss: 0.86546| %_mask_idx: 0.39677| ppl: 86.10349| %_neg_is_pos: 0.02778| lr: 0.0| temp: 1.99614 | loss: 1.17388| constrast_loss: 4.61017| div_loss: 0.85354| %_mask_idx: 0.42591| ppl: 93.7373| %_neg_is_pos: 0.01397| lr: 0.0| temp: 1.99614 | loss: 1.17413| constrast_loss: 4.61072| div_loss: 0.85798| %_mask_idx: 0.39881| ppl: 90.89023| %_neg_is_pos: 0.01497| lr: 0.0| temp: 1.99613 | loss: 1.17287| constrast_loss: 4.60598| div_loss: 0.85485| %_mask_idx: 0.401| ppl: 92.89719| %_neg_is_pos: 0.02173| lr: 0.0| temp: 1.99613 | loss: 1.17247| constrast_loss: 4.60374| div_loss: 0.8615| %_mask_idx: 0.33584| ppl: 88.63789| %_neg_is_pos: 0.03357| lr: 0.0| temp: 1.99611 | loss: 1.17294| constrast_loss: 4.60599| div_loss: 0.8577| %_mask_idx: 0.45927| ppl: 91.07379| %_neg_is_pos: 0.0175| lr: 0.0| temp: 1.99611 | loss: 1.17248| constrast_loss: 4.604| div_loss: 0.85942| %_mask_idx: 0.41228| ppl: 89.97253| %_neg_is_pos: 0.01901| lr: 0.0| temp: 1.9961 | loss: 1.17362| constrast_loss: 4.60888| div_loss: 0.85593| %_mask_idx: 0.3974| ppl: 92.20567| %_neg_is_pos: 0.01564| lr: 0.0| temp: 1.9961 | loss: 1.17273| constrast_loss: 4.60491| div_loss: 0.86003| %_mask_idx: 0.41275| ppl: 89.58054| %_neg_is_pos: 
0.02462| lr: 0.0| temp: 1.99609 | loss: 1.17362| constrast_loss: 4.60949| div_loss: 0.85003| %_mask_idx: 0.36967| ppl: 95.98249| %_neg_is_pos: 0.02555| lr: 0.0| temp: 1.99609 | loss: 1.17434| constrast_loss: 4.61226| div_loss: 0.85079| %_mask_idx: 0.36717| ppl: 95.49339| %_neg_is_pos: 0.02095| lr: 0.0| temp: 1.99608 | loss: 1.17425| constrast_loss: 4.611| div_loss: 0.85976| %_mask_idx: 0.32989| ppl: 89.75613| %_neg_is_pos: 0.02546| lr: 0.0| temp: 1.99608 | loss: 1.17289| constrast_loss: 4.60674| div_loss: 0.84813| %_mask_idx: 0.41259| ppl: 97.19953| %_neg_is_pos: 0.02026| lr: 0.0| temp: 1.99606 | loss: 1.17293| constrast_loss: 4.60636| div_loss: 0.85363| %_mask_idx: 0.37688| ppl: 93.67563| %_neg_is_pos: 0.02214| lr: 0.0| temp: 1.99606 | loss: 1.17449| constrast_loss: 4.61228| div_loss: 0.85699| %_mask_idx: 0.43515| ppl: 91.52747| %_neg_is_pos: 0.01294| lr: 0.0| temp: 1.99605 | loss: 1.17386| constrast_loss: 4.60957| div_loss: 0.85857| %_mask_idx: 0.38142| ppl: 90.51723| %_neg_is_pos: 0.01895| lr: 0.0| temp: 1.99605 | loss: 1.17315| constrast_loss: 4.60761| div_loss: 0.8499| %_mask_idx: 0.39129| ppl: 96.06514| %_neg_is_pos: 0.02469| lr: 0.0| temp: 1.99603 | loss: 1.17379| constrast_loss: 4.61037| div_loss: 0.84784| %_mask_idx: 0.3114| ppl: 97.38416| %_neg_is_pos: 0.03022| lr: 0.0| temp: 1.99603 | loss: 1.17383| constrast_loss: 4.61025| div_loss: 0.85091| %_mask_idx: 0.33584| ppl: 95.41696| %_neg_is_pos: 0.01537| lr: 0.0| temp: 1.99602 | loss: 1.17386| constrast_loss: 4.61062| div_loss: 0.84834| %_mask_idx: 0.38831| ppl: 97.06057| %_neg_is_pos: 0.01421| lr: 0.0| temp: 1.99602 | loss: 1.17289| constrast_loss: 4.60584| div_loss: 0.85732| %_mask_idx: 0.42982| ppl: 91.31441| %_neg_is_pos: 0.0199| lr: 0.0| temp: 1.99601 | loss: 1.1713| constrast_loss: 4.59932| div_loss: 0.85876| %_mask_idx: 0.3656| ppl: 90.39194| %_neg_is_pos: 0.03356| lr: 0.0| temp: 1.99601 | loss: 1.17327| constrast_loss: 4.60802| div_loss: 0.85048| %_mask_idx: 0.3396| ppl: 95.69153| %_neg_is_pos: 
0.02047| lr: 0.0| temp: 1.996 | loss: 1.17362| constrast_loss: 4.60992| div_loss: 0.84562| %_mask_idx: 0.39944| ppl: 98.80222| %_neg_is_pos: 0.01664| lr: 0.0| temp: 1.996 | loss: 1.17335| constrast_loss: 4.6087| div_loss: 0.84721| %_mask_idx: 0.39803| ppl: 97.78722| %_neg_is_pos: 0.02007| lr: 0.0| temp: 1.99598 | loss: 1.17379| constrast_loss: 4.61059| div_loss: 0.84564| %_mask_idx: 0.3891| ppl: 98.79044| %_neg_is_pos: 0.01497| lr: 0.0| temp: 1.99598 | loss: 1.17286| constrast_loss: 4.6052| div_loss: 0.8622| %_mask_idx: 0.4162| ppl: 88.18974| %_neg_is_pos: 0.02236| lr: 0.0| temp: 1.99597 | loss: 1.17322| constrast_loss: 4.60639| div_loss: 0.86495| %_mask_idx: 0.40335| ppl: 86.43118| %_neg_is_pos: 0.02156| lr: 0.0| temp: 1.99597 | loss: 1.17342| constrast_loss: 4.60844| div_loss: 0.85241| %_mask_idx: 0.4375| ppl: 94.46053| %_neg_is_pos: 0.01117| lr: 0.0| temp: 1.99596 | loss: 1.1741| constrast_loss: 4.61206| div_loss: 0.84342| %_mask_idx: 0.44455| ppl: 100.21336| %_neg_is_pos: 0.00896| lr: 0.0| temp: 1.99596 | loss: 1.17288| constrast_loss: 4.60669| div_loss: 0.84825| %_mask_idx: 0.43703| ppl: 97.11685| %_neg_is_pos: 0.01981| lr: 0.0| temp: 1.99595 | loss: 1.17319| constrast_loss: 4.60598| div_loss: 0.86784| %_mask_idx: 0.36466| ppl: 84.58409| %_neg_is_pos: 0.02323| lr: 0.0| temp: 1.99595 | loss: 1.17383| constrast_loss: 4.61| div_loss: 0.85342| %_mask_idx: 0.35934| ppl: 93.8091| %_neg_is_pos: 0.01559| lr: 0.0| temp: 1.99593 | loss: 1.17333| constrast_loss: 4.60855| div_loss: 0.84782| %_mask_idx: 0.39489| ppl: 97.39225| %_neg_is_pos: 0.01714| lr: 0.0| temp: 1.99593 | loss: 1.17266| constrast_loss: 4.60498| div_loss: 0.85656| %_mask_idx: 0.37798| ppl: 91.79842| %_neg_is_pos: 0.02391| lr: 0.0| temp: 1.99592 | loss: 1.17414| constrast_loss: 4.61222| div_loss: 0.84365| %_mask_idx: 0.41886| ppl: 100.06503| %_neg_is_pos: 0.01321| lr: 0.0| temp: 1.99592 | loss: 1.17427| constrast_loss: 4.61216| div_loss: 0.84907| %_mask_idx: 0.40711| ppl: 96.59473| %_neg_is_pos: 0.02036| 
lr: 0.0| temp: 1.99591 | loss: 1.17355| constrast_loss: 4.60923| div_loss: 0.84987| %_mask_idx: 0.35589| ppl: 96.08032| %_neg_is_pos: 0.01787| lr: 0.0| temp: 1.99591 | loss: 1.17353| constrast_loss: 4.60773| div_loss: 0.86387| %_mask_idx: 0.36278| ppl: 87.12532| %_neg_is_pos: 0.02151| lr: 0.0| temp: 1.9959 | loss: 1.17109| constrast_loss: 4.59832| div_loss: 0.86048| %_mask_idx: 0.31814| ppl: 89.2938| %_neg_is_pos: 0.03117| lr: 0.0| temp: 1.9959 | loss: 1.17327| constrast_loss: 4.6077| div_loss: 0.85391| %_mask_idx: 0.41823| ppl: 93.4973| %_neg_is_pos: 0.01467| lr: 0.0| temp: 1.99588 | loss: 1.17397| constrast_loss: 4.61054| div_loss: 0.85333| %_mask_idx: 0.42293| ppl: 93.8692| %_neg_is_pos: 0.01827| lr: 0.0| temp: 1.99588 | loss: 1.17246| constrast_loss: 4.60387| div_loss: 0.85971| %_mask_idx: 0.40335| ppl: 89.78291| %_neg_is_pos: 0.02175| lr: 0.0| temp: 1.99587 | loss: 1.17421| constrast_loss: 4.61293| div_loss: 0.83911| %_mask_idx: 0.40993| ppl: 102.9697| %_neg_is_pos: 0.01476| lr: 0.0| temp: 1.99587
[2021-09-01 22:10:56,829] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
[2021-09-01 22:10:56,829] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
| loss: 1.17339| constrast_loss: 4.60765| div_loss: 0.85902| %_mask_idx: 0.42967| ppl: 90.22466| %_neg_is_pos: 0.01831| lr: 0.0| temp: 1.99585 | loss: 1.1725| constrast_loss: 4.60356| div_loss: 0.8644| %_mask_idx: 0.42152| ppl: 86.78316| %_neg_is_pos: 0.02476| lr: 0.0| temp: 1.99585 | loss: 1.17372| constrast_loss: 4.60973| div_loss: 0.85142| %_mask_idx: 0.45113| ppl: 95.09105| %_neg_is_pos: 0.01546| lr: 0.0| temp: 1.99584 | loss: 1.17397| constrast_loss: 4.61138| div_loss: 0.84514| %_mask_idx: 0.39693| ppl: 99.11339| %_neg_is_pos: 0.0139| lr: 0.0| temp: 1.99584 | loss: 1.17413| constrast_loss: 4.61071| div_loss: 0.85794| %_mask_idx: 0.35808| ppl: 90.91729| %_neg_is_pos: 0.01958| lr: 0.0| temp: 1.99583 | loss: 1.17468| constrast_loss: 4.61395| div_loss: 0.84771| %_mask_idx: 0.40053| ppl: 97.46489| %_neg_is_pos: 0.01056| lr: 0.0| temp: 1.99583 | loss: 1.17337| constrast_loss: 4.60892| div_loss: 0.84548| %_mask_idx: 0.3938| ppl: 98.89574| %_neg_is_pos: 0.01667| lr: 0.0| temp: 1.99582 | loss: 1.17333| constrast_loss: 4.60706| div_loss: 0.86256| %_mask_idx: 0.39098| ppl: 87.96104| %_neg_is_pos: 0.03004| lr: 0.0| temp: 1.99582 | loss: 1.1737| constrast_loss: 4.60987| div_loss: 0.84927| %_mask_idx: 0.38831| ppl: 96.46516| %_neg_is_pos: 0.01348| lr: 0.0| temp: 1.9958 | loss: 1.17327| constrast_loss: 4.60717| div_loss: 0.85929| %_mask_idx: 0.37594| ppl: 90.05485| %_neg_is_pos: 0.02002| lr: 0.0| temp: 1.9958 | loss: 1.17338| constrast_loss: 4.60807| div_loss: 0.85463| %_mask_idx: 0.38737| ppl: 93.03891| %_neg_is_pos: 0.01546| lr: 0.0| temp: 1.99579 | loss: 1.17341| constrast_loss: 4.60914| div_loss: 0.84507| %_mask_idx: 0.38017| ppl: 99.15746| %_neg_is_pos: 0.01538| lr: 0.0| temp: 1.99579 | loss: 1.17308| constrast_loss: 4.607| div_loss: 0.85297| %_mask_idx: 0.3786| ppl: 94.09992| %_neg_is_pos: 0.02636| lr: 0.0| temp: 1.99578 | loss: 1.17378| constrast_loss: 4.61145| div_loss: 0.83677| %_mask_idx: 0.42466| ppl: 104.46962| 
%_neg_is_pos: 0.00902| lr: 0.0| temp: 1.99578 | loss: 1.17347| constrast_loss: 4.60965| div_loss: 0.84231| %_mask_idx: 0.43766| ppl: 100.92311| %_neg_is_pos: 0.01108| lr: 0.0| temp: 1.99577 | loss: 1.1743| constrast_loss: 4.61358| div_loss: 0.83599| %_mask_idx: 0.37813| ppl: 104.96521| %_neg_is_pos: 0.01491| lr: 0.0| temp: 1.99577 | loss: 1.17394| constrast_loss: 4.61251| div_loss: 0.83231| %_mask_idx: 0.40695| ppl: 107.32199| %_neg_is_pos: 0.01337| lr: 0.0| temp: 1.99575 | loss: 1.17366| constrast_loss: 4.61178| div_loss: 0.82874| %_mask_idx: 0.42043| ppl: 109.6073| %_neg_is_pos: 0.01553| lr: 0.0| temp: 1.99575 | loss: 1.1738| constrast_loss: 4.61038| div_loss: 0.84808| %_mask_idx: 0.37171| ppl: 97.2282| %_neg_is_pos: 0.01828| lr: 0.0| temp: 1.99574 | loss: 1.1718| constrast_loss: 4.60294| div_loss: 0.84244| %_mask_idx: 0.3916| ppl: 100.83849| %_neg_is_pos: 0.02867| lr: 0.0| temp: 1.99574 | loss: 1.17444| constrast_loss: 4.61412| div_loss: 0.83633| %_mask_idx: 0.39395| ppl: 104.74844| %_neg_is_pos: 0.01516| lr: 0.0| temp: 1.99573 | loss: 1.17388| constrast_loss: 4.61052| div_loss: 0.84992| %_mask_idx: 0.46115| ppl: 96.04907| %_neg_is_pos: 0.01902| lr: 0.0| temp: 1.99573 | loss: 1.17275| constrast_loss: 4.60625| div_loss: 0.84747| %_mask_idx: 0.44048| ppl: 97.618| %_neg_is_pos: 0.01931| lr: 0.0| temp: 1.99572 | loss: 1.17217| constrast_loss: 4.60439| div_loss: 0.84273| %_mask_idx: 0.38001| ppl: 100.65369| %_neg_is_pos: 0.033| lr: 0.0| temp: 1.99572 | loss: 1.17207| constrast_loss: 4.6021| div_loss: 0.86183| %_mask_idx: 0.36811| ppl: 88.43013| %_neg_is_pos: 0.03313| lr: 0.0| temp: 1.9957 | loss: 1.17393| constrast_loss: 4.61181| div_loss: 0.839| %_mask_idx: 0.31908| ppl: 103.03878| %_neg_is_pos: 0.01933| lr: 0.0| temp: 1.9957 | loss: 1.17369| constrast_loss: 4.61082| div_loss: 0.83948| %_mask_idx: 0.39724| ppl: 102.73003| %_neg_is_pos: 0.01486| lr: 0.0| temp: 1.99569 | loss: 1.172| constrast_loss: 4.60362| div_loss: 0.84367| %_mask_idx: 0.37923| ppl: 100.05403| 
%_neg_is_pos: 0.03118| lr: 0.0| temp: 1.99569 | loss: 1.17266| constrast_loss: 4.60574| div_loss: 0.84885| %_mask_idx: 0.33036| ppl: 96.73669| %_neg_is_pos: 0.0286| lr: 0.0| temp: 1.99567 | loss: 1.17327| constrast_loss: 4.60943| div_loss: 0.83662| %_mask_idx: 0.42591| ppl: 104.56389| %_neg_is_pos: 0.01306| lr: 0.0| temp: 1.99567 | loss: 1.17408| constrast_loss: 4.61361| div_loss: 0.82729| %_mask_idx: 0.37265| ppl: 110.53484| %_neg_is_pos: 0.01515| lr: 0.0| temp: 1.99566 | loss: 1.17425| constrast_loss: 4.61436| div_loss: 0.82658| %_mask_idx: 0.40695| ppl: 110.98778| %_neg_is_pos: 0.01334| lr: 0.0| temp: 1.99566 | loss: 1.17283| constrast_loss: 4.60764| div_loss: 0.83668| %_mask_idx: 0.40774| ppl: 104.52233| %_neg_is_pos: 0.02297| lr: 0.0| temp: 1.99565 | loss: 1.17204| constrast_loss: 4.60301| div_loss: 0.85144| %_mask_idx: 0.42998| ppl: 95.08086| %_neg_is_pos: 0.02703| lr: 0.0| temp: 1.99565 | loss: 1.17348| constrast_loss: 4.61032| div_loss: 0.83595| %_mask_idx: 0.36858| ppl: 104.99092| %_neg_is_pos: 0.01751| lr: 0.0| temp: 1.99564 | loss: 1.1721| constrast_loss: 4.60418| div_loss: 0.84238| %_mask_idx: 0.40038| ppl: 100.87669| %_neg_is_pos: 0.02357| lr: 0.0| temp: 1.99564 | loss: 1.17312| constrast_loss: 4.60848| div_loss: 0.83983| %_mask_idx: 0.34492| ppl: 102.50981| %_neg_is_pos: 0.02184| lr: 0.0| temp: 1.99562 | loss: 1.17278| constrast_loss: 4.60704| div_loss: 0.84094| %_mask_idx: 0.39787| ppl: 101.79721| %_neg_is_pos: 0.02703| lr: 0.0| temp: 1.99562 | loss: 1.17327| constrast_loss: 4.60902| div_loss: 0.84063| %_mask_idx: 0.38283| ppl: 101.99976| %_neg_is_pos: 0.02485| lr: 0.0| temp: 1.99561 | loss: 1.17261| constrast_loss: 4.60719| div_loss: 0.83265| %_mask_idx: 0.35432| ppl: 107.10559| %_neg_is_pos: 0.02224| lr: 0.0| temp: 1.99561 | loss: 1.1725| constrast_loss: 4.60676| div_loss: 0.83252| %_mask_idx: 0.33662| ppl: 107.18684| %_neg_is_pos: 0.02994| lr: 0.0| temp: 1.9956 | loss: 1.17327| constrast_loss: 4.60987| div_loss: 0.83196| %_mask_idx: 0.36372| ppl: 
107.54797| %_neg_is_pos: 0.02223| lr: 0.0| temp: 1.9956 | loss: 1.17338| constrast_loss: 4.60858| div_loss: 0.84927| %_mask_idx: 0.36059| ppl: 96.46535| %_neg_is_pos: 0.02565| lr: 0.0| temp: 1.99559 | loss: 1.17299| constrast_loss: 4.6087| div_loss: 0.83252| %_mask_idx: 0.41839| ppl: 107.18936| %_neg_is_pos: 0.01997| lr: 0.0| temp: 1.99559 | loss: 1.1738| constrast_loss: 4.61138| div_loss: 0.83822| %_mask_idx: 0.42841| ppl: 103.54063| %_neg_is_pos: 0.01214| lr: 0.0| temp: 1.99557 | loss: 1.17298| constrast_loss: 4.60781| div_loss: 0.84106| %_mask_idx: 0.42841| ppl: 101.71871| %_neg_is_pos: 0.01447| lr: 0.0| temp: 1.99557 | loss: 1.17305| constrast_loss: 4.60841| div_loss: 0.83779| %_mask_idx: 0.39442| ppl: 103.81168| %_neg_is_pos: 0.01341| lr: 0.0| temp: 1.99556 | loss: 1.17039| constrast_loss: 4.5967| div_loss: 0.84851| %_mask_idx: 0.34305| ppl: 96.95108| %_neg_is_pos: 0.04845| lr: 0.0| temp: 1.99556 | loss: 1.17238| constrast_loss: 4.60463| div_loss: 0.84874| %_mask_idx: 0.38142| ppl: 96.80466| %_neg_is_pos: 0.03306| lr: 0.0| temp: 1.99555 | loss: 1.17269| constrast_loss: 4.60727| div_loss: 0.83483| %_mask_idx: 0.42528| ppl: 105.71049| %_neg_is_pos: 0.0191| lr: 0.0| temp: 1.99555 | loss: 1.17289| constrast_loss: 4.60893| div_loss: 0.82651| %_mask_idx: 0.40445| ppl: 111.03108| %_neg_is_pos: 0.01705| lr: 0.0| temp: 1.99554 | loss: 1.17285| constrast_loss: 4.60778| div_loss: 0.83616| %_mask_idx: 0.37876| ppl: 104.85551| %_neg_is_pos: 0.02218| lr: 0.0| temp: 1.99554 | loss: 1.17355| constrast_loss: 4.61154| div_loss: 0.8267| %_mask_idx: 0.4021| ppl: 110.91149| %_neg_is_pos: 0.00683| lr: 0.0| temp: 1.99553 | loss: 1.17337| constrast_loss: 4.60937| div_loss: 0.84115| %_mask_idx: 0.42199| ppl: 101.66151| %_neg_is_pos: 0.01323| lr: 0.0| temp: 1.99553 | loss: 1.17259| constrast_loss: 4.60421| div_loss: 0.86162| %_mask_idx: 0.37766| ppl: 88.56598| %_neg_is_pos: 0.02664| lr: 0.0| temp: 1.99551 | loss: 1.17323| constrast_loss: 4.60929| div_loss: 0.83638| %_mask_idx: 0.41103| 
ppl: 104.71892| %_neg_is_pos: 0.02067| lr: 0.0| temp: 1.99551 | loss: 1.17329| constrast_loss: 4.60919| div_loss: 0.83962| %_mask_idx: 0.34978| ppl: 102.64624| %_neg_is_pos: 0.0246| lr: 0.0| temp: 1.9955 | loss: 1.1744| constrast_loss: 4.61438| div_loss: 0.8321| %_mask_idx: 0.39019| ppl: 107.45406| %_neg_is_pos: 0.00877| lr: 0.0| temp: 1.9955 | loss: 1.1723| constrast_loss: 4.60471| div_loss: 0.84507| %_mask_idx: 0.3714| ppl: 99.15645| %_neg_is_pos: 0.02087| lr: 0.0| temp: 1.99549 | loss: 1.17422| constrast_loss: 4.61379| div_loss: 0.83095| %_mask_idx: 0.37437| ppl: 108.19443| %_neg_is_pos: 0.01545| lr: 0.0| temp: 1.99549 | loss: 1.17269| constrast_loss: 4.60686| div_loss: 0.839| %_mask_idx: 0.38863| ppl: 103.03847| %_neg_is_pos: 0.02141| lr: 0.0| temp: 1.99548 | loss: 1.17355| constrast_loss: 4.60931| div_loss: 0.84901| %_mask_idx: 0.34461| ppl: 96.63587| %_neg_is_pos: 0.01937| lr: 0.0| temp: 1.99548 | loss: 1.1708| constrast_loss: 4.59959| div_loss: 0.83594| %_mask_idx: 0.37296| ppl: 104.99608| %_neg_is_pos: 0.04979| lr: 0.0| temp: 1.99547 | loss: 1.17325| constrast_loss: 4.6096| div_loss: 0.83404| %_mask_idx: 0.33803| ppl: 106.21356| %_neg_is_pos: 0.01596| lr: 0.0| temp: 1.99547 | loss: 1.17157| constrast_loss: 4.60096| div_loss: 0.85318| %_mask_idx: 0.35573| ppl: 93.96725| %_neg_is_pos: 0.0345| lr: 0.0| temp: 1.99545 | loss: 1.17238| constrast_loss: 4.60552| div_loss: 0.83996| %_mask_idx: 0.37657| ppl: 102.42863| %_neg_is_pos: 0.02908| lr: 0.0| temp: 1.99545 | loss: 1.17414| constrast_loss: 4.61343| div_loss: 0.83118| %_mask_idx: 0.42873| ppl: 108.04202| %_neg_is_pos: 0.00872| lr: 0.0| temp: 1.99544 | loss: 1.17379| constrast_loss: 4.61188| div_loss: 0.83271| %_mask_idx: 0.39019| ppl: 107.06313| %_neg_is_pos: 0.01151| lr: 0.0| temp: 1.99544 | loss: 1.17308| constrast_loss: 4.60821| div_loss: 0.84121| %_mask_idx: 0.37281| ppl: 101.62399| %_neg_is_pos: 0.01264| lr: 0.0| temp: 1.99543 | loss: 1.17041| constrast_loss: 4.59628| div_loss: 0.85373| %_mask_idx: 0.336| 
ppl: 93.61237| %_neg_is_pos: 0.04727| lr: 0.0| temp: 1.99543 | loss: 1.17328| constrast_loss: 4.60963| div_loss: 0.83502| %_mask_idx: 0.37798| ppl: 105.58481| %_neg_is_pos: 0.0155| lr: 0.0| temp: 1.99542 | loss: 1.17403| constrast_loss: 4.61258| div_loss: 0.83531| %_mask_idx: 0.37923| ppl: 105.40019| %_neg_is_pos: 0.01462| lr: 0.0| temp: 1.99542 | loss: 1.1732| constrast_loss: 4.60861| div_loss: 0.84186| %_mask_idx: 0.39803| ppl: 101.20862| %_neg_is_pos: 0.02554| lr: 0.0| temp: 1.9954 | loss: 1.17299| constrast_loss: 4.60825| div_loss: 0.83688| %_mask_idx: 0.38596| ppl: 104.39439| %_neg_is_pos: 0.01765| lr: 0.0| temp: 1.9954 | loss: 1.17303| constrast_loss: 4.60882| div_loss: 0.8331| %_mask_idx: 0.35526| ppl: 106.81596| %_neg_is_pos: 0.02101| lr: 0.0| temp: 1.99539 | loss: 1.1715| constrast_loss: 4.6014| div_loss: 0.84582| %_mask_idx: 0.39004| ppl: 98.67596| %_neg_is_pos: 0.01862| lr: 0.0| temp: 1.99539 | loss: 1.17364| constrast_loss: 4.6108| div_loss: 0.83771| %_mask_idx: 0.40602| ppl: 103.86487| %_neg_is_pos: 0.01143| lr: 0.0| temp: 1.99538 | loss: 1.17154| constrast_loss: 4.60251| div_loss: 0.83643| %_mask_idx: 0.36451| ppl: 104.68288| %_neg_is_pos: 0.03323| lr: 0.0| temp: 1.99538 | loss: 1.173| constrast_loss: 4.60734| div_loss: 0.84657| %_mask_idx: 0.37516| ppl: 98.1945| %_neg_is_pos: 0.02231| lr: 0.0| temp: 1.99537 | loss: 1.17293| constrast_loss: 4.60842| div_loss: 0.83303| %_mask_idx: 0.38064| ppl: 106.86074| %_neg_is_pos: 0.02136| lr: 0.0| temp: 1.99537 | loss: 1.17133| constrast_loss: 4.60042| div_loss: 0.8491| %_mask_idx: 0.39333| ppl: 96.579| %_neg_is_pos: 0.04202| lr: 0.0| temp: 1.99535 | loss: 1.17229| constrast_loss: 4.60514| div_loss: 0.84016| %_mask_idx: 0.3714| ppl: 102.29668| %_neg_is_pos: 0.02951| lr: 0.0| temp: 1.99535 | loss: 1.17367| constrast_loss: 4.61156| div_loss: 0.83127| %_mask_idx: 0.46241| ppl: 107.98967| %_neg_is_pos: 0.00849| lr: 0.0| temp: 1.99534 | loss: 1.17344| constrast_loss: 4.61002| div_loss: 0.83758| %_mask_idx: 0.38001| 
ppl: 103.951| %_neg_is_pos: 0.02179| lr: 0.0| temp: 1.99534 | loss: 1.17375| constrast_loss: 4.6112| div_loss: 0.83787| %_mask_idx: 0.45128| ppl: 103.76183| %_neg_is_pos: 0.01367| lr: 0.0| temp: 1.99532 | loss: 1.17258| constrast_loss: 4.60542| div_loss: 0.84902| %_mask_idx: 0.3963| ppl: 96.62972| %_neg_is_pos: 0.02405| lr: 0.0| temp: 1.99532 | loss: 1.1742| constrast_loss: 4.61396| div_loss: 0.82847| %_mask_idx: 0.41635| ppl: 109.78009| %_neg_is_pos: 0.01143| lr: 0.0| temp: 1.99531 | loss: 1.1737| constrast_loss: 4.61184| div_loss: 0.82977| %_mask_idx: 0.43233| ppl: 108.94964| %_neg_is_pos: 0.01048| lr: 0.0| temp: 1.99531 | loss: 1.17399| constrast_loss: 4.61233| div_loss: 0.83644| %_mask_idx: 0.36685| ppl: 104.67923| %_neg_is_pos: 0.0173| lr: 0.0| temp: 1.9953 | loss: 1.17192| constrast_loss: 4.60462| div_loss: 0.83056| %_mask_idx: 0.3808| ppl: 108.44293| %_neg_is_pos: 0.02202| lr: 0.0| temp: 1.9953 | loss: 1.17117| constrast_loss: 4.59929| div_loss: 0.85388| %_mask_idx: 0.38816| ppl: 93.51698| %_neg_is_pos: 0.03393| lr: 0.0| temp: 1.99529 | loss: 1.17321| constrast_loss: 4.6096| div_loss: 0.8324| %_mask_idx: 0.41902| ppl: 107.26263| %_neg_is_pos: 0.01761| lr: 0.0| temp: 1.99529 | loss: 1.17276| constrast_loss: 4.6074| div_loss: 0.83641| %_mask_idx: 0.35996| ppl: 104.69887| %_neg_is_pos: 0.02351| lr: 0.0| temp: 1.99527 | loss: 1.17275| constrast_loss: 4.60681| div_loss: 0.84194| %_mask_idx: 0.40915| ppl: 101.16063| %_neg_is_pos: 0.02886| lr: 0.0| temp: 1.99527 | loss: 1.17242| constrast_loss: 4.60573| div_loss: 0.83954| %_mask_idx: 0.35981| ppl: 102.69293| %_neg_is_pos: 0.03251| lr: 0.0| temp: 1.99526 | loss: 1.1728| constrast_loss: 4.6073| div_loss: 0.839| %_mask_idx: 0.31579| ppl: 103.04232| %_neg_is_pos: 0.03873| lr: 0.0| temp: 1.99526 | loss: 1.17219| constrast_loss: 4.60485| div_loss: 0.83917| %_mask_idx: 0.39552| ppl: 102.93028| %_neg_is_pos: 0.02727| lr: 0.0| temp: 1.99525 | loss: 1.17223| constrast_loss: 4.60564| div_loss: 0.83281| %_mask_idx: 0.38628| 
ppl: 107.00418| %_neg_is_pos: 0.02387| lr: 0.0| temp: 1.99525 | loss: 1.17214| constrast_loss: 4.60562| div_loss: 0.82926| %_mask_idx: 0.35949| ppl: 109.27148| %_neg_is_pos: 0.02954| lr: 0.0| temp: 1.99524 | loss: 1.1719| constrast_loss: 4.60271| div_loss: 0.84892| %_mask_idx: 0.35354| ppl: 96.68909| %_neg_is_pos: 0.02975| lr: 0.0| temp: 1.99524 | loss: 1.17296| constrast_loss: 4.60944| div_loss: 0.82383| %_mask_idx: 0.35981| ppl: 112.74672| %_neg_is_pos: 0.01889| lr: 0.0| temp: 1.99522 | loss: 1.17346| constrast_loss: 4.61019| div_loss: 0.83645| %_mask_idx: 0.40883| ppl: 104.67516| %_neg_is_pos: 0.02129| lr: 0.0| temp: 1.99522 | loss: 1.17385| constrast_loss: 4.61197| div_loss: 0.83432| %_mask_idx: 0.40962| ppl: 106.03273| %_neg_is_pos: 0.01371| lr: 0.0| temp: 1.99521 | loss: 1.17228| constrast_loss: 4.60563| div_loss: 0.83493| %_mask_idx: 0.37343| ppl: 105.6468| %_neg_is_pos: 0.02644| lr: 0.0| temp: 1.99521 | loss: 1.17273| constrast_loss: 4.6068| div_loss: 0.84105| %_mask_idx: 0.34837| ppl: 101.73078| %_neg_is_pos: 0.02458| lr: 0.0| temp: 1.9952 | loss: 1.17304| constrast_loss: 4.60731| div_loss: 0.84847| %_mask_idx: 0.37453| ppl: 96.98189| %_neg_is_pos: 0.01668| lr: 0.0| temp: 1.9952 | loss: 1.17093| constrast_loss: 4.5987| div_loss: 0.85034| %_mask_idx: 0.31234| ppl: 95.78221| %_neg_is_pos: 0.04969| lr: 0.0| temp: 1.99519 | loss: 1.17333| constrast_loss: 4.60667| div_loss: 0.86634| %_mask_idx: 0.34023| ppl: 85.54257| %_neg_is_pos: 0.01963| lr: 0.0| temp: 1.99519 | loss: 1.17357| constrast_loss: 4.61072| div_loss: 0.83557| %_mask_idx: 0.34085| ppl: 105.23551| %_neg_is_pos: 0.01538| lr: 0.0| temp: 1.99517 | loss: 1.17396| constrast_loss: 4.61284| div_loss: 0.83001| %_mask_idx: 0.37719| ppl: 108.79247| %_neg_is_pos: 0.01851| lr: 0.0| temp: 1.99517 | loss: 1.17393| constrast_loss: 4.61174| div_loss: 0.83966| %_mask_idx: 0.40382| ppl: 102.6196| %_neg_is_pos: 0.01264| lr: 0.0| temp: 1.99516 | loss: 1.17349| constrast_loss: 4.61061| div_loss: 0.83334| %_mask_idx: 
0.35855| ppl: 106.66426| %_neg_is_pos: 0.01863| lr: 0.0| temp: 1.99516
[2021-09-01 22:20:14,458] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
[2021-09-01 22:20:14,458] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
| loss: 1.17339| constrast_loss: 4.60988| div_loss: 0.83692| %_mask_idx: 0.39646| ppl: 104.37006| %_neg_is_pos: 0.01835| lr: 0.0| temp: 1.99514 | loss: 1.17362| constrast_loss: 4.60972| div_loss: 0.8476| %_mask_idx: 0.38346| ppl: 97.53372| %_neg_is_pos: 0.01602| lr: 0.0| temp: 1.99514 | loss: 1.1729| constrast_loss: 4.60855| div_loss: 0.83064| %_mask_idx: 0.38377| ppl: 108.39294| %_neg_is_pos: 0.02372| lr: 0.0| temp: 1.99513 | loss: 1.17369| constrast_loss: 4.61233| div_loss: 0.82443| %_mask_idx: 0.43202| ppl: 112.36232| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.99513 | loss: 1.17169| constrast_loss: 4.60144| div_loss: 0.85342| %_mask_idx: 0.36607| ppl: 93.8134| %_neg_is_pos: 0.03322| lr: 0.0| temp: 1.99512 | loss: 1.17306| constrast_loss: 4.60798| div_loss: 0.84259| %_mask_idx: 0.39348| ppl: 100.74487| %_neg_is_pos: 0.01917| lr: 0.0| temp: 1.99512 | loss: 1.17359| constrast_loss: 4.61119| div_loss: 0.83174| %_mask_idx: 0.34727| ppl: 107.68352| %_neg_is_pos: 0.01615| lr: 0.0| temp: 1.99511 | loss: 1.17263| constrast_loss: 4.60779| div_loss: 0.82745| %_mask_idx: 0.36795| ppl: 110.4325| %_neg_is_pos: 0.02121| lr: 0.0| temp: 1.99511 | loss: 1.1725| constrast_loss: 4.60595| div_loss: 0.84052| %_mask_idx: 0.35213| ppl: 102.06575| %_neg_is_pos: 0.03859| lr: 0.0| temp: 1.99509 | loss: 1.17275| constrast_loss: 4.6075| div_loss: 0.83496| %_mask_idx: 0.39897| ppl: 105.62267| %_neg_is_pos: 0.02527| lr: 0.0| temp: 1.99509 | loss: 1.17164| constrast_loss: 4.60177| div_loss: 0.84802| %_mask_idx: 0.33584| ppl: 97.26968| %_neg_is_pos: 0.02756| lr: 0.0| temp: 1.99508 | 
loss: 1.17328| constrast_loss: 4.61079| div_loss: 0.82345| %_mask_idx: 0.42231| ppl: 112.99452| %_neg_is_pos: 0.02067| lr: 0.0| temp: 1.99508 | loss: 1.17379| constrast_loss: 4.61337| div_loss: 0.81786| %_mask_idx: 0.42199| ppl: 116.57175| %_neg_is_pos: 0.00898| lr: 0.0| temp: 1.99507 | loss: 1.17259| constrast_loss: 4.60832| div_loss: 0.82059| %_mask_idx: 0.39536| ppl: 114.82531| %_neg_is_pos: 0.02301| lr: 0.0| temp: 1.99507 | loss: 1.17376| constrast_loss: 4.61225| div_loss: 0.82806| %_mask_idx: 0.36638| ppl: 110.04134| %_neg_is_pos: 0.01773| lr: 0.0| temp: 1.99506 | loss: 1.17296| constrast_loss: 4.60861| div_loss: 0.83222| %_mask_idx: 0.34884| ppl: 107.37704| %_neg_is_pos: 0.02634| lr: 0.0| temp: 1.99506 | loss: 1.17337| constrast_loss: 4.61103| div_loss: 0.82432| %_mask_idx: 0.41291| ppl: 112.43228| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.99504| loss: 1.17223| constrast_loss: 4.6049| div_loss: 0.8403| %_mask_idx: 0.35401| ppl: 102.21048| %_neg_is_pos: 0.02979| lr: 0.0| temp: 1.99504 | loss: 1.17343| constrast_loss: 4.61094| div_loss: 0.82764| %_mask_idx: 0.39677| ppl: 110.30914| %_neg_is_pos: 0.00749| lr: 0.0| temp: 1.99503 | loss: 1.17405| constrast_loss: 4.61313| div_loss: 0.8307| %_mask_idx: 0.42419| ppl: 108.34998| %_neg_is_pos: 0.00612| lr: 0.0| temp: 1.99503 | loss: 1.17309| constrast_loss: 4.60907| div_loss: 0.83307| %_mask_idx: 0.41056| ppl: 106.83762| %_neg_is_pos: 0.01554| lr: 0.0| temp: 1.99502 | loss: 1.17342| constrast_loss: 4.61068| div_loss: 0.83014| %_mask_idx: 0.4068| ppl: 108.70798| %_neg_is_pos: 0.01197| lr: 0.0| temp: 1.99502 | loss: 1.17168| constrast_loss: 4.60316| div_loss: 0.83582| %_mask_idx: 0.34633| ppl: 105.07773| %_neg_is_pos: 0.03418| lr: 0.0| temp: 1.99501 | loss: 1.17238| constrast_loss: 4.60812| div_loss: 0.81417| %_mask_idx: 0.38001| ppl: 118.93082| %_neg_is_pos: 0.02561| lr: 0.0| temp: 1.99501 | loss: 1.17264| constrast_loss: 4.60792| div_loss: 0.82635| %_mask_idx: 0.47259| ppl: 111.13655| %_neg_is_pos: 0.01436| lr: 0.0| 
temp: 1.99499 | loss: 1.17292| constrast_loss: 4.60955| div_loss: 0.82143| %_mask_idx: 0.36826| ppl: 114.28198| %_neg_is_pos: 0.01143| lr: 0.0| temp: 1.99499 | loss: 1.17376| constrast_loss: 4.61341| div_loss: 0.81619| %_mask_idx: 0.42528| ppl: 117.63704| %_neg_is_pos: 0.00696| lr: 0.0| temp: 1.99498 | loss: 1.17359| constrast_loss: 4.61341| div_loss: 0.80943| %_mask_idx: 0.38048| ppl: 121.96202| %_neg_is_pos: 0.01091| lr: 0.0| temp: 1.99498 | loss: 1.17402| constrast_loss: 4.61306| div_loss: 0.83012| %_mask_idx: 0.40492| ppl: 108.72453| %_neg_is_pos: 0.00708| lr: 0.0| temp: 1.99496 | loss: 1.17301| constrast_loss: 4.60905| div_loss: 0.82988| %_mask_idx: 0.3833| ppl: 108.877| %_neg_is_pos: 0.01224| lr: 0.0| temp: 1.99496 | loss: 1.1732| constrast_loss: 4.61117| div_loss: 0.81654| %_mask_idx: 0.40022| ppl: 117.41586| %_neg_is_pos: 0.00978| lr: 0.0| temp: 1.99495 | loss: 1.17364| constrast_loss: 4.61288| div_loss: 0.81689| %_mask_idx: 0.42732| ppl: 117.18794| %_neg_is_pos: 0.00788| lr: 0.0| temp: 1.99495 | loss: 1.1728| constrast_loss: 4.60799| div_loss: 0.83193| %_mask_idx: 0.37688| ppl: 107.56303| %_neg_is_pos: 0.01462| lr: 0.0| temp: 1.99494 | loss: 1.1729| constrast_loss: 4.60977| div_loss: 0.81833| %_mask_idx: 0.41228| ppl: 116.26599| %_neg_is_pos: 0.01156| lr: 0.0| temp: 1.99494 | loss: 1.1742| constrast_loss: 4.61528| div_loss: 0.81539| %_mask_idx: 0.42685| ppl: 118.15167| %_neg_is_pos: 0.01366| lr: 0.0| temp: 1.99493 | loss: 1.17288| constrast_loss: 4.60956| div_loss: 0.81946| %_mask_idx: 0.35558| ppl: 115.5424| %_neg_is_pos: 0.01679| lr: 0.0| temp: 1.99493 | loss: 1.17295| constrast_loss: 4.60969| div_loss: 0.82125| %_mask_idx: 0.39348| ppl: 114.40001| %_neg_is_pos: 0.0135| lr: 0.0| temp: 1.99491 | loss: 1.1725| constrast_loss: 4.6073| div_loss: 0.82684| %_mask_idx: 0.39066| ppl: 110.81963| %_neg_is_pos: 0.01426| lr: 0.0| temp: 1.99491 | loss: 1.17244| constrast_loss: 4.60664| div_loss: 0.83133| %_mask_idx: 0.36654| ppl: 107.94779| %_neg_is_pos: 0.01951| lr: 
0.0| temp: 1.9949 | loss: 1.17323| constrast_loss: 4.61094| div_loss: 0.8198| %_mask_idx: 0.35573| ppl: 115.32529| %_neg_is_pos: 0.01555| lr: 0.0| temp: 1.9949 | loss: 1.17265| constrast_loss: 4.60888| div_loss: 0.81721| %_mask_idx: 0.36764| ppl: 116.98454| %_neg_is_pos: 0.01417| lr: 0.0| temp: 1.99489 | loss: 1.17203| constrast_loss: 4.60625| div_loss: 0.81858| %_mask_idx: 0.36513| ppl: 116.10846| %_neg_is_pos: 0.0194| lr: 0.0| temp: 1.99489 | loss: 1.17385| constrast_loss: 4.6128| div_loss: 0.82613| %_mask_idx: 0.37907| ppl: 111.27826| %_neg_is_pos: 0.01823| lr: 0.0| temp: 1.99488 | loss: 1.17243| constrast_loss: 4.60734| div_loss: 0.82367| %_mask_idx: 0.32409| ppl: 112.8493| %_neg_is_pos: 0.01683| lr: 0.0| temp: 1.99488 | loss: 1.17337| constrast_loss: 4.61208| div_loss: 0.8139| %_mask_idx: 0.43625| ppl: 119.1028| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.99486 | loss: 1.17398| constrast_loss: 4.61375| div_loss: 0.82168| %_mask_idx: 0.40852| ppl: 114.12546| %_neg_is_pos: 0.00549| lr: 0.0| temp: 1.99486 | loss: 1.17368| constrast_loss: 4.61284| div_loss: 0.81897| %_mask_idx: 0.37876| ppl: 115.85638| %_neg_is_pos: 0.00651| lr: 0.0| temp: 1.99485 | loss: 1.17299| constrast_loss: 4.61115| div_loss: 0.80796| %_mask_idx: 0.40883| ppl: 122.90398| %_neg_is_pos: 0.01258| lr: 0.0| temp: 1.99485 | loss: 1.17314| constrast_loss: 4.61183| div_loss: 0.80716| %_mask_idx: 0.3985| ppl: 123.41994| %_neg_is_pos: 0.00963| lr: 0.0| temp: 1.99484 | loss: 1.17239| constrast_loss: 4.60672| div_loss: 0.82824| %_mask_idx: 0.35025| ppl: 109.92535| %_neg_is_pos: 0.01935| lr: 0.0| temp: 1.99484 | loss: 1.17284| constrast_loss: 4.60904| div_loss: 0.82302| %_mask_idx: 0.41275| ppl: 113.26919| %_neg_is_pos: 0.01775| lr: 0.0| temp: 1.99483 | loss: 1.17162| constrast_loss: 4.605| div_loss: 0.81486| %_mask_idx: 0.37484| ppl: 118.49032| %_neg_is_pos: 0.02473| lr: 0.0| temp: 1.99483 | loss: 1.17285| constrast_loss: 4.60801| div_loss: 0.834| %_mask_idx: 0.38596| ppl: 106.2398| %_neg_is_pos: 0.01119| 
lr: 0.0| temp: 1.99481 | loss: 1.17197| constrast_loss: 4.60591| div_loss: 0.81978| %_mask_idx: 0.40742| ppl: 115.33971| %_neg_is_pos: 0.01586| lr: 0.0| temp: 1.99481 | loss: 1.17331| constrast_loss: 4.61134| div_loss: 0.81906| %_mask_idx: 0.40147| ppl: 115.80348| %_neg_is_pos: 0.01164| lr: 0.0| temp: 1.9948 | loss: 1.17425| constrast_loss: 4.61485| div_loss: 0.82168| %_mask_idx: 0.4093| ppl: 114.12483| %_neg_is_pos: 0.01556| lr: 0.0| temp: 1.9948 | loss: 1.17342| constrast_loss: 4.61142| div_loss: 0.8224| %_mask_idx: 0.37657| ppl: 113.66348| %_neg_is_pos: 0.01897| lr: 0.0| temp: 1.99478 | loss: 1.17329| constrast_loss: 4.61165| div_loss: 0.81512| %_mask_idx: 0.42826| ppl: 118.32315| %_neg_is_pos: 0.00969| lr: 0.0| temp: 1.99478 | loss: 1.1737| constrast_loss: 4.61148| div_loss: 0.83307| %_mask_idx: 0.35448| ppl: 106.83784| %_neg_is_pos: 0.01686| lr: 0.0| temp: 1.99477 | loss: 1.17435| constrast_loss: 4.61539| div_loss: 0.81989| %_mask_idx: 0.37735| ppl: 115.27167| %_neg_is_pos: 0.01182| lr: 0.0| temp: 1.99477 | loss: 1.1719| constrast_loss: 4.60422| div_loss: 0.83389| %_mask_idx: 0.34038| ppl: 106.3087| %_neg_is_pos: 0.02656| lr: 0.0| temp: 1.99476 | loss: 1.17284| constrast_loss: 4.60925| div_loss: 0.82111| %_mask_idx: 0.40038| ppl: 114.48898| %_neg_is_pos: 0.0109| lr: 0.0| temp: 1.99476 | loss: 1.17325| constrast_loss: 4.61166| div_loss: 0.81334| %_mask_idx: 0.3844| ppl: 119.46198| %_neg_is_pos: 0.00725| lr: 0.0| temp: 1.99475 | loss: 1.17245| constrast_loss: 4.60681| div_loss: 0.8297| %_mask_idx: 0.39317| ppl: 108.99372| %_neg_is_pos: 0.01893| lr: 0.0| temp: 1.99475 | loss: 1.17313| constrast_loss: 4.61071| div_loss: 0.81819| %_mask_idx: 0.38456| ppl: 116.35582| %_neg_is_pos: 0.01061| lr: 0.0| temp: 1.99473 | loss: 1.17341| constrast_loss: 4.61192| div_loss: 0.81708| %_mask_idx: 0.3891| ppl: 117.07169| %_neg_is_pos: 0.01307| lr: 0.0| temp: 1.99473 | loss: 1.17303| constrast_loss: 4.60921| div_loss: 0.82904| %_mask_idx: 0.39082| ppl: 109.41528| %_neg_is_pos: 
0.01176| lr: 0.0| temp: 1.99472 | loss: 1.1736| constrast_loss: 4.61172| div_loss: 0.82695| %_mask_idx: 0.33521| ppl: 110.75502| %_neg_is_pos: 0.0153| lr: 0.0| temp: 1.99472 | loss: 1.17316| constrast_loss: 4.61145| div_loss: 0.81177| %_mask_idx: 0.40461| ppl: 120.46867| %_neg_is_pos: 0.01649| lr: 0.0| temp: 1.99471 | loss: 1.17361| constrast_loss: 4.61215| div_loss: 0.82292| %_mask_idx: 0.40742| ppl: 113.33112| %_neg_is_pos: 0.01104| lr: 0.0| temp: 1.99471 | loss: 1.17284| constrast_loss: 4.6096| div_loss: 0.81752| %_mask_idx: 0.36482| ppl: 116.79021| %_neg_is_pos: 0.02081| lr: 0.0| temp: 1.9947 | loss: 1.17249| constrast_loss: 4.60798| div_loss: 0.81996| %_mask_idx: 0.31234| ppl: 115.22484| %_neg_is_pos: 0.01827| lr: 0.0| temp: 1.9947 | loss: 1.17313| constrast_loss: 4.61009| div_loss: 0.82442| %_mask_idx: 0.41526| ppl: 112.36806| %_neg_is_pos: 0.00736| lr: 0.0| temp: 1.99468 | loss: 1.17363| constrast_loss: 4.61205| div_loss: 0.82461| %_mask_idx: 0.36936| ppl: 112.24924| %_neg_is_pos: 0.01753| lr: 0.0| temp: 1.99468 | loss: 1.17311| constrast_loss: 4.61142| div_loss: 0.81024| %_mask_idx: 0.41964| ppl: 121.44897| %_neg_is_pos: 0.00826| lr: 0.0| temp: 1.99467 | loss: 1.17351| constrast_loss: 4.61315| div_loss: 0.80872| %_mask_idx: 0.43249| ppl: 122.41959| %_neg_is_pos: 0.00556| lr: 0.0| temp: 1.99467 | loss: 1.17328| constrast_loss: 4.61137| div_loss: 0.81737| %_mask_idx: 0.36544| ppl: 116.88446| %_neg_is_pos: 0.01065| lr: 0.0| temp: 1.99466 | loss: 1.17305| constrast_loss: 4.60957| div_loss: 0.82637| %_mask_idx: 0.42105| ppl: 111.12131| %_neg_is_pos: 0.01306| lr: 0.0| temp: 1.99466 | loss: 1.17324| constrast_loss: 4.60902| div_loss: 0.83946| %_mask_idx: 0.41526| ppl: 102.74832| %_neg_is_pos: 0.00873| lr: 0.0| temp: 1.99465 | loss: 1.17185| constrast_loss: 4.60442| div_loss: 0.82966| %_mask_idx: 0.38581| ppl: 109.01848| %_neg_is_pos: 0.01777| lr: 0.0| temp: 1.99465 | loss: 1.17256| constrast_loss: 4.60787| div_loss: 0.82366| %_mask_idx: 0.37484| ppl: 112.85683| 
%_neg_is_pos: 0.02248| lr: 0.0| temp: 1.99463 | loss: 1.17343| constrast_loss: 4.61297| div_loss: 0.80765| %_mask_idx: 0.38346| ppl: 123.10704| %_neg_is_pos: 0.00649| lr: 0.0| temp: 1.99463 | loss: 1.17277| constrast_loss: 4.60906| div_loss: 0.82045| %_mask_idx: 0.39865| ppl: 114.91338| %_neg_is_pos: 0.01388| lr: 0.0| temp: 1.99462 | loss: 1.17333| constrast_loss: 4.61032| div_loss: 0.83017| %_mask_idx: 0.38941| ppl: 108.69312| %_neg_is_pos: 0.01819| lr: 0.0| temp: 1.99462 | loss: 1.17352| constrast_loss: 4.61177| div_loss: 0.82297| %_mask_idx: 0.36466| ppl: 113.30017| %_neg_is_pos: 0.02516| lr: 0.0| temp: 1.9946 | loss: 1.17331| constrast_loss: 4.61089| div_loss: 0.82356| %_mask_idx: 0.38612| ppl: 112.92336| %_neg_is_pos: 0.01414| lr: 0.0| temp: 1.9946 | loss: 1.17351| constrast_loss: 4.61202| div_loss: 0.82007| %_mask_idx: 0.38189| ppl: 115.15575| %_neg_is_pos: 0.01182| lr: 0.0| temp: 1.99459 | loss: 1.17325| constrast_loss: 4.61072| div_loss: 0.82295| %_mask_idx: 0.4115| ppl: 113.30945| %_neg_is_pos: 0.01216| lr: 0.0| temp: 1.99459 | loss: 1.17309| constrast_loss: 4.61083| div_loss: 0.81521| %_mask_idx: 0.40946| ppl: 118.26865| %_neg_is_pos: 0.01066| lr: 0.0| temp: 1.99458 | loss: 1.17377| constrast_loss: 4.61303| div_loss: 0.82049| %_mask_idx: 0.43217| ppl: 114.88612| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.99458 | loss: 1.17347| constrast_loss: 4.61281| div_loss: 0.8108| %_mask_idx: 0.43233| ppl: 121.08566| %_neg_is_pos: 0.0149| lr: 0.0| temp: 1.99457 | loss: 1.17318| constrast_loss: 4.61133| div_loss: 0.81375| %_mask_idx: 0.38252| ppl: 119.19829| %_neg_is_pos: 0.01649| lr: 0.0| temp: 1.99457 | loss: 1.17138| constrast_loss: 4.6003| div_loss: 0.85237| %_mask_idx: 0.37719| ppl: 94.48322| %_neg_is_pos: 0.03247| lr: 0.0| temp: 1.99455 | loss: 1.17222| constrast_loss: 4.6072| div_loss: 0.81669| %_mask_idx: 0.38033| ppl: 117.3213| %_neg_is_pos: 0.02248| lr: 0.0| temp: 1.99455 | loss: 1.17379| constrast_loss: 4.61361| div_loss: 0.81566| %_mask_idx: 0.45238| ppl: 
117.98076| %_neg_is_pos: 0.00565| lr: 0.0| temp: 1.99454 | loss: 1.17323| constrast_loss: 4.61164| div_loss: 0.81264| %_mask_idx: 0.37061| ppl: 119.90964| %_neg_is_pos: 0.00683| lr: 0.0| temp: 1.99454 | loss: 1.17414| constrast_loss: 4.61503| div_loss: 0.81518| %_mask_idx: 0.37892| ppl: 118.28282| %_neg_is_pos: 0.00555| lr: 0.0| temp: 1.99453 | loss: 1.17366| constrast_loss: 4.61273| div_loss: 0.81927| %_mask_idx: 0.42262| ppl: 115.66787| %_neg_is_pos: 0.00977| lr: 0.0| temp: 1.99453 | loss: 1.17349| constrast_loss: 4.60992| div_loss: 0.84026| %_mask_idx: 0.32675| ppl: 102.23659| %_neg_is_pos: 0.01505| lr: 0.0| temp: 1.99452 | loss: 1.17354| constrast_loss: 4.61022| div_loss: 0.83936| %_mask_idx: 0.41823| ppl: 102.80874| %_neg_is_pos: 0.00762| lr: 0.0| temp: 1.99452 | loss: 1.17257| constrast_loss: 4.60757| div_loss: 0.82709| %_mask_idx: 0.37876| ppl: 110.66457| %_neg_is_pos: 0.02134| lr: 0.0| temp: 1.9945 | loss: 1.17299| constrast_loss: 4.60971| div_loss: 0.82269| %_mask_idx: 0.41855| ppl: 113.48102| %_neg_is_pos: 0.01342| lr: 0.0| temp: 1.9945 | loss: 1.17363| constrast_loss: 4.6121| div_loss: 0.82412| %_mask_idx: 0.32848| ppl: 112.56282| %_neg_is_pos: 0.01176| lr: 0.0| temp: 1.99449 | loss: 1.17246| constrast_loss: 4.60726| div_loss: 0.82589| %_mask_idx: 0.35526| ppl: 111.4317| %_neg_is_pos: 0.02121| lr: 0.0| temp: 1.99449 | loss: 1.17176| constrast_loss: 4.60354| div_loss: 0.8348| %_mask_idx: 0.38769| ppl: 105.72611| %_neg_is_pos: 0.0179| lr: 0.0| temp: 1.99448 | loss: 1.17344| constrast_loss: 4.61157| div_loss: 0.82207| %_mask_idx: 0.41306| ppl: 113.87236| %_neg_is_pos: 0.01456| lr: 0.0| temp: 1.99448 | loss: 1.17252| constrast_loss: 4.60783| div_loss: 0.8223| %_mask_idx: 0.37986| ppl: 113.72765| %_neg_is_pos: 0.01718| lr: 0.0| temp: 1.99447 | loss: 1.17304| constrast_loss: 4.60881| div_loss: 0.83356| %_mask_idx: 0.41056| ppl: 106.52071| %_neg_is_pos: 0.01533| lr: 0.0| temp: 1.99447 | loss: 1.17346| constrast_loss: 4.61116| div_loss: 0.8268| %_mask_idx: 
0.40899| ppl: 110.84592| %_neg_is_pos: 0.00705| lr: 0.0| temp: 1.99445 | loss: 1.17396| constrast_loss: 4.61317| div_loss: 0.82677| %_mask_idx: 0.36482| ppl: 110.8646| %_neg_is_pos: 0.01745| lr: 0.0| temp: 1.99445 | loss: 1.17291| constrast_loss: 4.60863| div_loss: 0.82997| %_mask_idx: 0.3609| ppl: 108.8175| %_neg_is_pos: 0.01945| lr: 0.0| temp: 1.99444 | loss: 1.17292| constrast_loss: 4.60913| div_loss: 0.82563| %_mask_idx: 0.36325| ppl: 111.59637| %_neg_is_pos: 0.01556| lr: 0.0| temp: 1.99444 [2021-09-01 22:29:29,098] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 [2021-09-01 22:29:29,098] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0 | loss: 1.17301| constrast_loss: 4.60991| div_loss: 0.82138| %_mask_idx: 0.41259| ppl: 114.31772| %_neg_is_pos: 0.01727| lr: 0.0| temp: 1.99442 | loss: 1.17218| constrast_loss: 4.60611| div_loss: 0.82604| %_mask_idx: 0.33506| ppl: 111.33641| %_neg_is_pos: 0.03191| lr: 0.0| temp: 1.99442 | loss: 1.17316| constrast_loss: 4.61254| div_loss: 0.80097| %_mask_idx: 0.42575| ppl: 127.37837| %_neg_is_pos: 0.00663| lr: 0.0| temp: 1.99441 | loss: 1.17356| constrast_loss: 4.61138| div_loss: 0.8285| %_mask_idx: 0.40147| ppl: 109.76111| %_neg_is_pos: 0.0091| lr: 0.0| temp: 1.99441 | loss: 1.17336| constrast_loss: 4.61248| div_loss: 0.80957| %_mask_idx: 0.36576| ppl: 121.87256| %_neg_is_pos: 0.00813| lr: 0.0| temp: 1.9944 | loss: 1.17216| constrast_loss: 4.607| div_loss: 0.81635| %_mask_idx: 0.40602| ppl: 117.53893| %_neg_is_pos: 0.02549| lr: 0.0| temp: 1.9944 | loss: 1.17191| constrast_loss: 4.60496| div_loss: 0.82663| %_mask_idx: 0.36936| ppl: 110.95541| %_neg_is_pos: 0.02368| lr: 0.0| temp: 1.99439 | loss: 1.17347| constrast_loss: 4.61241| div_loss: 0.81454| %_mask_idx: 0.40335| ppl: 118.69562| %_neg_is_pos: 0.00818| lr: 0.0| temp: 1.99439 | 
loss: 1.17351| constrast_loss: 4.61057| div_loss: 0.83453| %_mask_idx: 0.37046| ppl: 105.90154| %_neg_is_pos: 0.00928| lr: 0.0| temp: 1.99437 | loss: 1.17344| constrast_loss: 4.61242| div_loss: 0.81357| %_mask_idx: 0.3349| ppl: 119.31528| %_neg_is_pos: 0.01276| lr: 0.0| temp: 1.99437 | loss: 1.17292| constrast_loss: 4.60952| div_loss: 0.82178| %_mask_idx: 0.42356| ppl: 114.05769| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.99436 | loss: 1.17382| constrast_loss: 4.61431| div_loss: 0.80978| %_mask_idx: 0.39912| ppl: 121.74312| %_neg_is_pos: 0.0075| lr: 0.0| temp: 1.99436 | loss: 1.17333| constrast_loss: 4.6125| div_loss: 0.80829| %_mask_idx: 0.34367| ppl: 122.69366| %_neg_is_pos: 0.00647| lr: 0.0| temp: 1.99435 | loss: 1.17339| constrast_loss: 4.61348| div_loss: 0.80075| %_mask_idx: 0.43202| ppl: 127.51851| %_neg_is_pos: 0.00674| lr: 0.0| temp: 1.99435 | loss: 1.17288| constrast_loss: 4.60985| div_loss: 0.81671| %_mask_idx: 0.39286| ppl: 117.3087| %_neg_is_pos: 0.01469| lr: 0.0| temp: 1.99434 | loss: 1.17289| constrast_loss: 4.60956| div_loss: 0.81981| %_mask_idx: 0.34962| ppl: 115.32402| %_neg_is_pos: 0.01379| lr: 0.0| temp: 1.99434 | loss: 1.17297| constrast_loss: 4.60961| div_loss: 0.82283| %_mask_idx: 0.38847| ppl: 113.38948| %_neg_is_pos: 0.01382| lr: 0.0| temp: 1.99432 | loss: 1.17299| constrast_loss: 4.61115| div_loss: 0.80817| %_mask_idx: 0.38549| ppl: 122.76955| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.99432 | loss: 1.17367| constrast_loss: 4.61332| div_loss: 0.81354| %_mask_idx: 0.38722| ppl: 119.33721| %_neg_is_pos: 0.0079| lr: 0.0| temp: 1.99431 | loss: 1.17355| constrast_loss: 4.6125| div_loss: 0.81712| %_mask_idx: 0.37155| ppl: 117.04227| %_neg_is_pos: 0.00757| lr: 0.0| temp: 1.99431 | loss: 1.1729| constrast_loss: 4.61002| div_loss: 0.81559| %_mask_idx: 0.33443| ppl: 118.02385| %_neg_is_pos: 0.01085| lr: 0.0| temp: 1.9943 | loss: 1.17252| constrast_loss: 4.60814| div_loss: 0.8193| %_mask_idx: 0.3537| ppl: 115.64545| %_neg_is_pos: 0.01414| lr: 0.0| temp: 
1.9943 | loss: 1.17281| constrast_loss: 4.6103| div_loss: 0.80927| %_mask_idx: 0.38001| ppl: 122.0676| %_neg_is_pos: 0.00687| lr: 0.0| temp: 1.99429 | loss: 1.17284| constrast_loss: 4.61121| div_loss: 0.80161| %_mask_idx: 0.37798| ppl: 126.97092| %_neg_is_pos: 0.0093| lr: 0.0| temp: 1.99429 | loss: 1.1731| constrast_loss: 4.61318| div_loss: 0.79205| %_mask_idx: 0.4093| ppl: 133.09039| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.99427 | loss: 1.1736| constrast_loss: 4.61374| div_loss: 0.80654| %_mask_idx: 0.39301| ppl: 123.8164| %_neg_is_pos: 0.00604| lr: 0.0| temp: 1.99427 | loss: 1.17271| constrast_loss: 4.60994| div_loss: 0.80896| %_mask_idx: 0.34226| ppl: 122.2641| %_neg_is_pos: 0.00928| lr: 0.0| temp: 1.99426 | loss: 1.17348| constrast_loss: 4.61396| div_loss: 0.79953| %_mask_idx: 0.36075| ppl: 128.3013| %_neg_is_pos: 0.0078| lr: 0.0| temp: 1.99426 | loss: 1.17256| constrast_loss: 4.60996| div_loss: 0.80265| %_mask_idx: 0.38941| ppl: 126.30461| %_neg_is_pos: 0.00999| lr: 0.0| temp: 1.99424 | loss: 1.17256| constrast_loss: 4.60864| div_loss: 0.81599| %_mask_idx: 0.35652| ppl: 117.76453| %_neg_is_pos: 0.01505| lr: 0.0| temp: 1.99424 | loss: 1.17303| constrast_loss: 4.6125| div_loss: 0.79602| %_mask_idx: 0.3631| ppl: 130.54857| %_neg_is_pos: 0.00733| lr: 0.0| temp: 1.99423 | loss: 1.17276| constrast_loss: 4.61045| div_loss: 0.80588| %_mask_idx: 0.40617| ppl: 124.23705| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.99423 | loss: 1.17264| constrast_loss: 4.61003| div_loss: 0.80514| %_mask_idx: 0.42466| ppl: 124.70881| %_neg_is_pos: 0.00726| lr: 0.0| temp: 1.99422 | loss: 1.17262| constrast_loss: 4.61037| div_loss: 0.80101| %_mask_idx: 0.38095| ppl: 127.35331| %_neg_is_pos: 0.00965| lr: 0.0| temp: 1.99422 | loss: 1.17341| constrast_loss: 4.61222| div_loss: 0.81431| %_mask_idx: 0.41118| ppl: 118.83949| %_neg_is_pos: 0.00477| lr: 0.0| temp: 1.99421 | loss: 1.17365| constrast_loss: 4.61436| div_loss: 0.80223| %_mask_idx: 0.38142| ppl: 126.57491| %_neg_is_pos: 0.00586| lr: 0.0| 
temp: 1.99421 | loss: 1.17284| constrast_loss: 4.61164| div_loss: 0.79729| %_mask_idx: 0.41463| ppl: 129.73753| %_neg_is_pos: 0.00664| lr: 0.0| temp: 1.99419 | loss: 1.1731| constrast_loss: 4.61261| div_loss: 0.79777| %_mask_idx: 0.37202| ppl: 129.42706| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.99419 | loss: 1.17296| constrast_loss: 4.61126| div_loss: 0.80577| %_mask_idx: 0.42246| ppl: 124.30872| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.99418 | loss: 1.17307| constrast_loss: 4.61215| div_loss: 0.80146| %_mask_idx: 0.38393| ppl: 127.06688| %_neg_is_pos: 0.00796| lr: 0.0| temp: 1.99418 | loss: 1.17335| constrast_loss: 4.61316| div_loss: 0.80221| %_mask_idx: 0.43562| ppl: 126.58642| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.99417 | loss: 1.17305| constrast_loss: 4.61274| div_loss: 0.79443| %_mask_idx: 0.41761| ppl: 131.56667| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.99417 | loss: 1.17331| constrast_loss: 4.61304| div_loss: 0.80213| %_mask_idx: 0.37563| ppl: 126.63406| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.99416 | loss: 1.17227| constrast_loss: 4.60971| div_loss: 0.79352| %_mask_idx: 0.44596| ppl: 132.14449| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.99416 | loss: 1.17093| constrast_loss: 4.60169| div_loss: 0.82021| %_mask_idx: 0.33098| ppl: 115.06606| %_neg_is_pos: 0.0215| lr: 0.0| temp: 1.99414 | loss: 1.17232| constrast_loss: 4.60898| div_loss: 0.80287| %_mask_idx: 0.41212| ppl: 126.16277| %_neg_is_pos: 0.01059| lr: 0.0| temp: 1.99414 | loss: 1.17278| constrast_loss: 4.61106| div_loss: 0.80046| %_mask_idx: 0.42575| ppl: 127.70329| %_neg_is_pos: 0.00747| lr: 0.0| temp: 1.99413 | loss: 1.17314| constrast_loss: 4.61216| div_loss: 0.80381| %_mask_idx: 0.39646| ppl: 125.56075| %_neg_is_pos: 0.00943| lr: 0.0| temp: 1.99413 | loss: 1.17286| constrast_loss: 4.61198| div_loss: 0.79452| %_mask_idx: 0.3891| ppl: 131.5058| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.99412 | loss: 1.17255| constrast_loss: 4.61084| div_loss: 0.79341| %_mask_idx: 0.32315| ppl: 132.21881| %_neg_is_pos: 
0.00509| lr: 0.0| temp: 1.99412 | loss: 1.17301| constrast_loss: 4.61082| div_loss: 0.81217| %_mask_idx: 0.36513| ppl: 120.21408| %_neg_is_pos: 0.01064| lr: 0.0| temp: 1.99411 | loss: 1.17196| constrast_loss: 4.606| div_loss: 0.81856| %_mask_idx: 0.43578| ppl: 116.12135| %_neg_is_pos: 0.01061| lr: 0.0| temp: 1.99411 | loss: 1.17279| constrast_loss: 4.61038| div_loss: 0.80762| %_mask_idx: 0.45724| ppl: 123.12623| %_neg_is_pos: 0.00677| lr: 0.0| temp: 1.99409 | loss: 1.17306| constrast_loss: 4.61227| div_loss: 0.79985| %_mask_idx: 0.37829| ppl: 128.09872| %_neg_is_pos: 0.00781| lr: 0.0| temp: 1.99409 | loss: 1.17325| constrast_loss: 4.61201| div_loss: 0.80983| %_mask_idx: 0.42262| ppl: 121.70576| %_neg_is_pos: 0.00513| lr: 0.0| temp: 1.99408 | loss: 1.17292| constrast_loss: 4.61033| div_loss: 0.81365| %_mask_idx: 0.31955| ppl: 119.26202| %_neg_is_pos: 0.00927| lr: 0.0| temp: 1.99408 | loss: 1.17337| constrast_loss: 4.61362| div_loss: 0.79866| %_mask_idx: 0.37798| ppl: 128.85757| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.99406 | loss: 1.17324| constrast_loss: 4.61347| div_loss: 0.79485| %_mask_idx: 0.3985| ppl: 131.29391| %_neg_is_pos: 0.00612| lr: 0.0| temp: 1.99406 | loss: 1.17261| constrast_loss: 4.60847| div_loss: 0.81984| %_mask_idx: 0.37625| ppl: 115.30016| %_neg_is_pos: 0.01199| lr: 0.0| temp: 1.99405 | loss: 1.17296| constrast_loss: 4.61146| div_loss: 0.8037| %_mask_idx: 0.3985| ppl: 125.63393| %_neg_is_pos: 0.00751| lr: 0.0| temp: 1.99405 | loss: 1.17315| constrast_loss: 4.61286| div_loss: 0.79741| %_mask_idx: 0.40695| ppl: 129.66037| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.99404 | loss: 1.17373| constrast_loss: 4.61395| div_loss: 0.80977| %_mask_idx: 0.40132| ppl: 121.74563| %_neg_is_pos: 0.0086| lr: 0.0| temp: 1.99404 | loss: 1.17292| constrast_loss: 4.61162| div_loss: 0.80066| %_mask_idx: 0.35166| ppl: 127.57675| %_neg_is_pos: 0.00945| lr: 0.0| temp: 1.99403 | loss: 1.17291| constrast_loss: 4.6107| div_loss: 0.80962| %_mask_idx: 0.40053| ppl: 121.84323| 
%_neg_is_pos: 0.00953| lr: 0.0| temp: 1.99403 | loss: 1.17357| constrast_loss: 4.61392| div_loss: 0.80344| %_mask_idx: 0.4104| ppl: 125.7988| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.99401 | loss: 1.17319| constrast_loss: 4.61271| div_loss: 0.80048| %_mask_idx: 0.41494| ppl: 127.69142| %_neg_is_pos: 0.00784| lr: 0.0| temp: 1.99401 | loss: 1.17289| constrast_loss: 4.61091| div_loss: 0.80666| %_mask_idx: 0.38628| ppl: 123.73824| %_neg_is_pos: 0.01104| lr: 0.0| temp: 1.994 | loss: 1.17253| constrast_loss: 4.6095| div_loss: 0.80616| %_mask_idx: 0.38643| ppl: 124.05449| %_neg_is_pos: 0.00985| lr: 0.0| temp: 1.994 | loss: 1.17282| constrast_loss: 4.61048| div_loss: 0.80799| %_mask_idx: 0.36482| ppl: 122.88475| %_neg_is_pos: 0.00868| lr: 0.0| temp: 1.99399 | loss: 1.17275| constrast_loss: 4.61059| div_loss: 0.80418| %_mask_idx: 0.37108| ppl: 125.3253| %_neg_is_pos: 0.00982| lr: 0.0| temp: 1.99399 | loss: 1.17296| constrast_loss: 4.61213| div_loss: 0.797| %_mask_idx: 0.40429| ppl: 129.91936| %_neg_is_pos: 0.00813| lr: 0.0| temp: 1.99398 | loss: 1.17381| constrast_loss: 4.6162| div_loss: 0.79052| %_mask_idx: 0.3916| ppl: 134.06587| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.99398 | loss: 1.17265| constrast_loss: 4.60998| div_loss: 0.80622| %_mask_idx: 0.41667| ppl: 124.02156| %_neg_is_pos: 0.0065| lr: 0.0| temp: 1.99396 | loss: 1.17326| constrast_loss: 4.61237| div_loss: 0.80682| %_mask_idx: 0.39004| ppl: 123.63483| %_neg_is_pos: 0.00717| lr: 0.0| temp: 1.99396 | loss: 1.17345| constrast_loss: 4.61392| div_loss: 0.79879| %_mask_idx: 0.44878| ppl: 128.77637| %_neg_is_pos: 0.00685| lr: 0.0| temp: 1.99395 | loss: 1.17315| constrast_loss: 4.61267| div_loss: 0.79912| %_mask_idx: 0.3797| ppl: 128.56566| %_neg_is_pos: 0.00779| lr: 0.0| temp: 1.99395 | loss: 1.17378| constrast_loss: 4.6149| div_loss: 0.80227| %_mask_idx: 0.38925| ppl: 126.54712| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.99394 | loss: 1.17337| constrast_loss: 4.61347| div_loss: 0.80016| %_mask_idx: 0.38863| ppl: 
127.90038| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.99394 | loss: 1.17254| constrast_loss: 4.60989| div_loss: 0.80283| %_mask_idx: 0.41181| ppl: 126.19104| %_neg_is_pos: 0.00788| lr: 0.0| temp: 1.99393 | loss: 1.17353| constrast_loss: 4.61504| div_loss: 0.79077| %_mask_idx: 0.40852| ppl: 133.90787| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.99393 | loss: 1.17286| constrast_loss: 4.6103| div_loss: 0.81161| %_mask_idx: 0.35918| ppl: 120.56707| %_neg_is_pos: 0.01196| lr: 0.0| temp: 1.99391 | loss: 1.17252| constrast_loss: 4.60856| div_loss: 0.81513| %_mask_idx: 0.34148| ppl: 118.31715| %_neg_is_pos: 0.01169| lr: 0.0| temp: 1.99391 | loss: 1.17304| constrast_loss: 4.61236| div_loss: 0.79788| %_mask_idx: 0.47603| ppl: 129.35672| %_neg_is_pos: 0.00465| lr: 0.0| temp: 1.9939 | loss: 1.17348| constrast_loss: 4.61375| div_loss: 0.8016| %_mask_idx: 0.39615| ppl: 126.97852| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.9939 | loss: 1.17341| constrast_loss: 4.61448| div_loss: 0.79181| %_mask_idx: 0.42152| ppl: 133.23952| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.99388 | loss: 1.17297| constrast_loss: 4.61186| div_loss: 0.80015| %_mask_idx: 0.37359| ppl: 127.90561| %_neg_is_pos: 0.00731| lr: 0.0| temp: 1.99388 | loss: 1.17331| constrast_loss: 4.61183| div_loss: 0.81406| %_mask_idx: 0.42356| ppl: 118.99875| %_neg_is_pos: 0.0087| lr: 0.0| temp: 1.99387 | loss: 1.17294| constrast_loss: 4.61031| div_loss: 0.81441| %_mask_idx: 0.35448| ppl: 118.77859| %_neg_is_pos: 0.01136| lr: 0.0| temp: 1.99387 | loss: 1.17283| constrast_loss: 4.6113| div_loss: 0.80007| %_mask_idx: 0.39333| ppl: 127.95673| %_neg_is_pos: 0.00642| lr: 0.0| temp: 1.99386 | loss: 1.17265| constrast_loss: 4.6096| div_loss: 0.81018| %_mask_idx: 0.3985| ppl: 121.4838| %_neg_is_pos: 0.00982| lr: 0.0| temp: 1.99386 | loss: 1.17324| constrast_loss: 4.61398| div_loss: 0.78975| %_mask_idx: 0.45833| ppl: 134.56116| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.99385 | loss: 1.1736| constrast_loss: 4.61417| div_loss: 0.80236| %_mask_idx: 0.42434| 
ppl: 126.48776| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.99385 | loss: 1.173| constrast_loss: 4.61121| div_loss: 0.80783| %_mask_idx: 0.38831| ppl: 122.98714| %_neg_is_pos: 0.00865| lr: 0.0| temp: 1.99383 | loss: 1.17267| constrast_loss: 4.61086| div_loss: 0.79838| %_mask_idx: 0.38612| ppl: 129.03848| %_neg_is_pos: 0.00994| lr: 0.0| temp: 1.99383 | loss: 1.17336| constrast_loss: 4.61359| div_loss: 0.79841| %_mask_idx: 0.40335| ppl: 129.01553| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.99382 | loss: 1.17215| constrast_loss: 4.608| div_loss: 0.80593| %_mask_idx: 0.37202| ppl: 124.2035| %_neg_is_pos: 0.01568| lr: 0.0| temp: 1.99382 | loss: 1.17274| constrast_loss: 4.61045| div_loss: 0.80499| %_mask_idx: 0.42152| ppl: 124.80611| %_neg_is_pos: 0.00629| lr: 0.0| temp: 1.99381 | loss: 1.17281| constrast_loss: 4.61161| div_loss: 0.79617| %_mask_idx: 0.39489| ppl: 130.44968| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.99381 | loss: 1.17311| constrast_loss: 4.61245| div_loss: 0.79987| %_mask_idx: 0.33929| ppl: 128.08293| %_neg_is_pos: 0.00845| lr: 0.0| temp: 1.9938 | loss: 1.17299| constrast_loss: 4.61212| div_loss: 0.79839| %_mask_idx: 0.3703| ppl: 129.03018| %_neg_is_pos: 0.00438| lr: 0.0| temp: 1.9938 | loss: 1.17347| constrast_loss: 4.6142| div_loss: 0.79676| %_mask_idx: 0.40069| ppl: 130.07245| %_neg_is_pos: 0.00485| lr: 0.0| temp: 1.99378 | loss: 1.17244| constrast_loss: 4.60987| div_loss: 0.79885| %_mask_idx: 0.39301| ppl: 128.73584| %_neg_is_pos: 0.01085| lr: 0.0| temp: 1.99378 | loss: 1.17321| constrast_loss: 4.61103| div_loss: 0.81817| %_mask_idx: 0.38565| ppl: 116.37143| %_neg_is_pos: 0.00784| lr: 0.0| temp: 1.99377 | loss: 1.17326| constrast_loss: 4.61323| div_loss: 0.79805| %_mask_idx: 0.40476| ppl: 129.25119| %_neg_is_pos: 0.00819| lr: 0.0| temp: 1.99377 | loss: 1.17334| constrast_loss: 4.61397| div_loss: 0.79407| %_mask_idx: 0.36513| ppl: 131.79581| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.99376 | loss: 1.17332| constrast_loss: 4.61365| div_loss: 0.79616| %_mask_idx: 
0.35088| ppl: 130.45567| %_neg_is_pos: 0.00851| lr: 0.0| temp: 1.99376 | loss: 1.17254| constrast_loss: 4.6103| div_loss: 0.79858| %_mask_idx: 0.38252| ppl: 128.90656| %_neg_is_pos: 0.00933| lr: 0.0| temp: 1.99375 | loss: 1.17254| constrast_loss: 4.61095| div_loss: 0.79223| %_mask_idx: 0.37986| ppl: 132.97287| %_neg_is_pos: 0.00685| lr: 0.0| temp: 1.99375 | loss: 1.17294| constrast_loss: 4.6113| div_loss: 0.80454| %_mask_idx: 0.38769| ppl: 125.09759| %_neg_is_pos: 0.01148| lr: 0.0| temp: 1.99373 | loss: 1.17296| constrast_loss: 4.61306| div_loss: 0.78784| %_mask_idx: 0.42951| ppl: 135.78452| %_neg_is_pos: 0.00512| lr: 0.0| temp: 1.99373 | loss: 1.17272| constrast_loss: 4.60949| div_loss: 0.81397| %_mask_idx: 0.4198| ppl: 119.06105| %_neg_is_pos: 0.00818| lr: 0.0| temp: 1.99372 | loss: 1.17297| constrast_loss: 4.61169| div_loss: 0.80187| %_mask_idx: 0.40821| ppl: 126.8016| %_neg_is_pos: 0.00674| lr: 0.0| temp: 1.99372 [2021-09-01 22:38:45,628] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0 [2021-09-01 22:38:45,628] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 32768.0, reducing to 16384.0 | loss: 1.17286| constrast_loss: 4.61152| div_loss: 0.79923| %_mask_idx: 0.41134| ppl: 128.49553| %_neg_is_pos: 0.00737| lr: 0.0| temp: 1.9937 | loss: 1.17263| constrast_loss: 4.61081| div_loss: 0.79716| %_mask_idx: 0.35025| ppl: 129.81995| %_neg_is_pos: 0.00546| lr: 0.0| temp: 1.9937 | loss: 1.17288| constrast_loss: 4.60995| div_loss: 0.81577| %_mask_idx: 0.35934| ppl: 117.90993| %_neg_is_pos: 0.00715| lr: 0.0| temp: 1.99369 | loss: 1.17341| constrast_loss: 4.6133| div_loss: 0.80331| %_mask_idx: 0.38377| ppl: 125.8789| %_neg_is_pos: 0.00926| lr: 0.0| temp: 1.99369 | loss: 1.17305| constrast_loss: 4.61175| div_loss: 0.80448| %_mask_idx: 0.34868| ppl: 125.13313| %_neg_is_pos: 0.00605| lr: 0.0| temp: 1.99368 | loss: 1.17363| constrast_loss: 4.61393| div_loss: 0.8058| %_mask_idx: 0.3631| ppl: 124.28857| %_neg_is_pos: 0.00741| lr: 0.0| temp: 1.99368 | loss: 1.17333| constrast_loss: 4.61249| div_loss: 0.80821| %_mask_idx: 0.37798| ppl: 122.74536| %_neg_is_pos: 0.01368| lr: 0.0| temp: 1.99367 | loss: 1.17278| constrast_loss: 4.61241| div_loss: 0.78693| %_mask_idx: 0.41244| ppl: 136.36563| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.99367 | loss: 1.17315| constrast_loss: 4.61148| div_loss: 0.8114| %_mask_idx: 0.35965| ppl: 120.7028| %_neg_is_pos: 0.014| lr: 0.0| temp: 1.99365 | loss: 1.17268| constrast_loss: 4.61072| div_loss: 0.79992| %_mask_idx: 0.40789| ppl: 128.05336| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.99365 | loss: 1.17221| constrast_loss: 4.61001| div_loss: 0.78817| %_mask_idx: 0.40461| ppl: 135.56941| %_neg_is_pos: 0.00772| lr: 0.0| temp: 1.99364 | loss: 1.17239| constrast_loss: 4.60976| div_loss: 0.79786| %_mask_idx: 0.40727| ppl: 129.3725| %_neg_is_pos: 0.00705| lr: 0.0| temp: 1.99364 | loss: 1.17252| constrast_loss: 4.61073| div_loss: 0.79357| %_mask_idx: 0.39521| ppl: 132.11707| %_neg_is_pos: 0.0106| lr: 0.0| temp: 1.99363 | loss: 1.17317| constrast_loss: 4.61369| div_loss: 0.79| %_mask_idx: 0.39395| ppl: 
134.40082| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.99363 | loss: 1.17254| constrast_loss: 4.61237| div_loss: 0.77786| %_mask_idx: 0.33866| ppl: 142.17236| %_neg_is_pos: 0.00544| lr: 0.0| temp: 1.99362 | loss: 1.17321| constrast_loss: 4.6128| div_loss: 0.80031| %_mask_idx: 0.38001| ppl: 127.79895| %_neg_is_pos: 0.00615| lr: 0.0| temp: 1.99362 | loss: 1.17251| constrast_loss: 4.61052| div_loss: 0.79528| %_mask_idx: 0.39082| ppl: 131.02063| %_neg_is_pos: 0.01363| lr: 0.0| temp: 1.9936 | loss: 1.17235| constrast_loss: 4.61036| div_loss: 0.79029| %_mask_idx: 0.40915| ppl: 134.21263| %_neg_is_pos: 0.01651| lr: 0.0| temp: 1.9936 | loss: 1.17275| constrast_loss: 4.61139| div_loss: 0.79613| %_mask_idx: 0.41526| ppl: 130.47833| %_neg_is_pos: 0.00964| lr: 0.0| temp: 1.99359 | loss: 1.17244| constrast_loss: 4.61061| div_loss: 0.79138| %_mask_idx: 0.41917| ppl: 133.51587| %_neg_is_pos: 0.0077| lr: 0.0| temp: 1.99359 | loss: 1.17265| constrast_loss: 4.61142| div_loss: 0.79157| %_mask_idx: 0.36717| ppl: 133.39543| %_neg_is_pos: 0.00902| lr: 0.0| temp: 1.99358 | loss: 1.17226| constrast_loss: 4.6102| div_loss: 0.78849| %_mask_idx: 0.36529| ppl: 135.36613| %_neg_is_pos: 0.0128| lr: 0.0| temp: 1.99358 | loss: 1.17196| constrast_loss: 4.60933| div_loss: 0.78494| %_mask_idx: 0.40163| ppl: 137.64021| %_neg_is_pos: 0.00892| lr: 0.0| temp: 1.99357 | loss: 1.17194| constrast_loss: 4.60921| div_loss: 0.78565| %_mask_idx: 0.35182| ppl: 137.18658| %_neg_is_pos: 0.0122| lr: 0.0| temp: 1.99357 | loss: 1.17201| constrast_loss: 4.60946| div_loss: 0.7857| %_mask_idx: 0.38299| ppl: 137.15413| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.99355 | loss: 1.17257| constrast_loss: 4.61018| div_loss: 0.80085| %_mask_idx: 0.34054| ppl: 127.45702| %_neg_is_pos: 0.01085| lr: 0.0| temp: 1.99355 | loss: 1.17262| constrast_loss: 4.61307| div_loss: 0.77424| %_mask_idx: 0.37625| ppl: 144.48508| %_neg_is_pos: 0.00497| lr: 0.0| temp: 1.99354 | loss: 1.17386| constrast_loss: 4.61494| div_loss: 0.80486| %_mask_idx: 
0.37923| ppl: 124.89062| %_neg_is_pos: 0.00511| lr: 0.0| temp: 1.99354 | loss: 1.17234| constrast_loss: 4.61161| div_loss: 0.77764| %_mask_idx: 0.39803| ppl: 142.30814| %_neg_is_pos: 0.00547| lr: 0.0| temp: 1.99352 | loss: 1.17218| constrast_loss: 4.61108| div_loss: 0.77635| %_mask_idx: 0.38596| ppl: 143.13326| %_neg_is_pos: 0.00813| lr: 0.0| temp: 1.99352 | loss: 1.17221| constrast_loss: 4.61179| div_loss: 0.77065| %_mask_idx: 0.40304| ppl: 146.78561| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.99351 | loss: 1.17237| constrast_loss: 4.61104| div_loss: 0.78428| %_mask_idx: 0.37813| ppl: 138.06128| %_neg_is_pos: 0.00545| lr: 0.0| temp: 1.99351 | loss: 1.17235| constrast_loss: 4.60985| div_loss: 0.79549| %_mask_idx: 0.40633| ppl: 130.88602| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.9935 | loss: 1.17225| constrast_loss: 4.61252| div_loss: 0.76465| %_mask_idx: 0.3963| ppl: 150.62572| %_neg_is_pos: 0.00477| lr: 0.0| temp: 1.9935 | loss: 1.17211| constrast_loss: 4.60969| div_loss: 0.78732| %_mask_idx: 0.4209| ppl: 136.11456| %_neg_is_pos: 0.00707| lr: 0.0| temp: 1.99349 | loss: 1.17327| constrast_loss: 4.61464| div_loss: 0.78435| %_mask_idx: 0.40241| ppl: 138.01633| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.99349 | loss: 1.17329| constrast_loss: 4.6147| div_loss: 0.78455| %_mask_idx: 0.42701| ppl: 137.88641| %_neg_is_pos: 0.00537| lr: 0.0| temp: 1.99347 | loss: 1.17233| constrast_loss: 4.61075| div_loss: 0.78584| %_mask_idx: 0.35902| ppl: 137.0648| %_neg_is_pos: 0.00607| lr: 0.0| temp: 1.99347 | loss: 1.17209| constrast_loss: 4.60875| div_loss: 0.79605| %_mask_idx: 0.39207| ppl: 130.52576| %_neg_is_pos: 0.01132| lr: 0.0| temp: 1.99346 | loss: 1.17298| constrast_loss: 4.61278| div_loss: 0.79156| %_mask_idx: 0.41526| ppl: 133.40086| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.99346 | loss: 1.17248| constrast_loss: 4.61076| div_loss: 0.79149| %_mask_idx: 0.36388| ppl: 133.4436| %_neg_is_pos: 0.00796| lr: 0.0| temp: 1.99345 | loss: 1.17299| constrast_loss: 4.61435| div_loss: 0.77606| 
%_mask_idx: 0.42654| ppl: 143.32227| %_neg_is_pos: 0.00394| lr: 0.0| temp: 1.99345 | loss: 1.1718| constrast_loss: 4.60931| div_loss: 0.77891| %_mask_idx: 0.38315| ppl: 141.49826| %_neg_is_pos: 0.00729| lr: 0.0| temp: 1.99344 | loss: 1.17234| constrast_loss: 4.6104| div_loss: 0.78954| %_mask_idx: 0.3891| ppl: 134.69362| %_neg_is_pos: 0.00472| lr: 0.0| temp: 1.99344 | loss: 1.17293| constrast_loss: 4.6142| div_loss: 0.77532| %_mask_idx: 0.40555| ppl: 143.79465| %_neg_is_pos: 0.00487| lr: 0.0| temp: 1.99342 | loss: 1.17263| constrast_loss: 4.6104| div_loss: 0.80137| %_mask_idx: 0.39646| ppl: 127.12621| %_neg_is_pos: 0.00707| lr: 0.0| temp: 1.99342 | loss: 1.17209| constrast_loss: 4.6094| div_loss: 0.78969| %_mask_idx: 0.4068| ppl: 134.59789| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.99341 | loss: 1.17249| constrast_loss: 4.6095| div_loss: 0.80471| %_mask_idx: 0.34305| ppl: 124.98305| %_neg_is_pos: 0.00669| lr: 0.0| temp: 1.99341 | loss: 1.17251| constrast_loss: 4.61015| div_loss: 0.79878| %_mask_idx: 0.34806| ppl: 128.78114| %_neg_is_pos: 0.01106| lr: 0.0| temp: 1.9934 | loss: 1.17232| constrast_loss: 4.61136| div_loss: 0.77934| %_mask_idx: 0.35025| ppl: 141.2234| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.9934 | loss: 1.17206| constrast_loss: 4.60973| div_loss: 0.78526| %_mask_idx: 0.4162| ppl: 137.43118| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.99339 | loss: 1.17278| constrast_loss: 4.61369| div_loss: 0.77435| %_mask_idx: 0.35949| ppl: 144.41638| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.99339 | loss: 1.17219| constrast_loss: 4.61009| div_loss: 0.78662| %_mask_idx: 0.40899| ppl: 136.56088| %_neg_is_pos: 0.00861| lr: 0.0| temp: 1.99337 | loss: 1.17267| constrast_loss: 4.61256| div_loss: 0.78116| %_mask_idx: 0.42513| ppl: 140.0553| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.99337 | loss: 1.17207| constrast_loss: 4.61073| div_loss: 0.77541| %_mask_idx: 0.41056| ppl: 143.73972| %_neg_is_pos: 0.00944| lr: 0.0| temp: 1.99336 | loss: 1.17291| constrast_loss: 4.61255| div_loss: 
0.79071| %_mask_idx: 0.41776| ppl: 133.94818| %_neg_is_pos: 0.00634| lr: 0.0| temp: 1.99336 | loss: 1.17245| constrast_loss: 4.61096| div_loss: 0.78845| %_mask_idx: 0.37672| ppl: 135.39227| %_neg_is_pos: 0.005| lr: 0.0| temp: 1.99334 | loss: 1.17225| constrast_loss: 4.61096| div_loss: 0.78021| %_mask_idx: 0.39568| ppl: 140.66338| %_neg_is_pos: 0.01083| lr: 0.0| temp: 1.99334 | loss: 1.17254| constrast_loss: 4.6122| div_loss: 0.77983| %_mask_idx: 0.38127| ppl: 140.90698| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.99333 | loss: 1.17219| constrast_loss: 4.61123| div_loss: 0.77543| %_mask_idx: 0.38643| ppl: 143.72704| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.99333 | loss: 1.17272| constrast_loss: 4.61323| div_loss: 0.77642| %_mask_idx: 0.38643| ppl: 143.0928| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.99332 | loss: 1.17262| constrast_loss: 4.61117| div_loss: 0.79299| %_mask_idx: 0.40398| ppl: 132.48813| %_neg_is_pos: 0.00612| lr: 0.0| temp: 1.99332 | loss: 1.17263| constrast_loss: 4.61173| div_loss: 0.78807| %_mask_idx: 0.36231| ppl: 135.63463| %_neg_is_pos: 0.00716| lr: 0.0| temp: 1.99331 | loss: 1.17248| constrast_loss: 4.61209| div_loss: 0.77808| %_mask_idx: 0.35072| ppl: 142.02808| %_neg_is_pos: 0.0071| lr: 0.0| temp: 1.99331 | loss: 1.17248| constrast_loss: 4.61198| div_loss: 0.77961| %_mask_idx: 0.40711| ppl: 141.04657| %_neg_is_pos: 0.00597| lr: 0.0| temp: 1.99329 | loss: 1.17232| constrast_loss: 4.61057| div_loss: 0.78726| %_mask_idx: 0.40163| ppl: 136.15433| %_neg_is_pos: 0.00635| lr: 0.0| temp: 1.99329 | loss: 1.17226| constrast_loss: 4.61085| div_loss: 0.78191| %_mask_idx: 0.37469| ppl: 139.58035| %_neg_is_pos: 0.00887| lr: 0.0| temp: 1.99328 | loss: 1.17229| constrast_loss: 4.61219| div_loss: 0.76964| %_mask_idx: 0.41463| ppl: 147.43193| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.99328 | loss: 1.17228| constrast_loss: 4.61095| div_loss: 0.78191| %_mask_idx: 0.39098| ppl: 139.57794| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.99327 | loss: 1.17266| constrast_loss: 
4.61246| div_loss: 0.7816| %_mask_idx: 0.36059| ppl: 139.77905| %_neg_is_pos: 0.00609| lr: 0.0| temp: 1.99327 | loss: 1.17255| constrast_loss: 4.61208| div_loss: 0.78107| %_mask_idx: 0.37766| ppl: 140.11613| %_neg_is_pos: 0.00471| lr: 0.0| temp: 1.99326 | loss: 1.17265| constrast_loss: 4.61252| div_loss: 0.78092| %_mask_idx: 0.40085| ppl: 140.21062| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.99326 | loss: 1.17181| constrast_loss: 4.60804| div_loss: 0.792| %_mask_idx: 0.35965| ppl: 133.11691| %_neg_is_pos: 0.00724| lr: 0.0| temp: 1.99324 | loss: 1.17274| constrast_loss: 4.61206| div_loss: 0.78901| %_mask_idx: 0.40727| ppl: 135.03326| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.99324 | loss: 1.17245| constrast_loss: 4.61225| div_loss: 0.77537| %_mask_idx: 0.39348| ppl: 143.76602| %_neg_is_pos: 0.00473| lr: 0.0| temp: 1.99323 | loss: 1.17238| constrast_loss: 4.61074| div_loss: 0.78781| %_mask_idx: 0.40273| ppl: 135.80167| %_neg_is_pos: 0.0082| lr: 0.0| temp: 1.99323 | loss: 1.17251| constrast_loss: 4.61157| div_loss: 0.7848| %_mask_idx: 0.36325| ppl: 137.72957| %_neg_is_pos: 0.00606| lr: 0.0| temp: 1.99322 | loss: 1.17242| constrast_loss: 4.61131| div_loss: 0.78354| %_mask_idx: 0.37704| ppl: 138.53204| %_neg_is_pos: 0.00715| lr: 0.0| temp: 1.99322 | loss: 1.17337| constrast_loss: 4.61386| div_loss: 0.7961| %_mask_idx: 0.40899| ppl: 130.49915| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.99321 | loss: 1.17292| constrast_loss: 4.61318| div_loss: 0.78514| %_mask_idx: 0.4104| ppl: 137.50926| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.99321 | loss: 1.17344| constrast_loss: 4.61566| div_loss: 0.7809| %_mask_idx: 0.41275| ppl: 140.22641| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.99319 | loss: 1.17284| constrast_loss: 4.61158| div_loss: 0.79774| %_mask_idx: 0.38894| ppl: 129.44669| %_neg_is_pos: 0.00751| lr: 0.0| temp: 1.99319 | loss: 1.17272| constrast_loss: 4.6125| div_loss: 0.78374| %_mask_idx: 0.39505| ppl: 138.4068| %_neg_is_pos: 0.00534| lr: 0.0| temp: 1.99318 | loss: 1.17234| 
constrast_loss: 4.61072| div_loss: 0.78638| %_mask_idx: 0.33772| ppl: 136.71954| %_neg_is_pos: 0.00982| lr: 0.0| temp: 1.99318 | loss: 1.17292| constrast_loss: 4.61395| div_loss: 0.77711| %_mask_idx: 0.42027| ppl: 142.64734| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.99316 | loss: 1.17253| constrast_loss: 4.61154| div_loss: 0.78601| %_mask_idx: 0.35041| ppl: 136.95432| %_neg_is_pos: 0.00927| lr: 0.0| temp: 1.99316 | loss: 1.17146| constrast_loss: 4.60677| div_loss: 0.79082| %_mask_idx: 0.35166| ppl: 133.87535| %_neg_is_pos: 0.01158| lr: 0.0| temp: 1.99315 | loss: 1.17216| constrast_loss: 4.61063| div_loss: 0.78015| %_mask_idx: 0.35182| ppl: 140.7061| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.99315 | loss: 1.1732| constrast_loss: 4.6145| div_loss: 0.78315| %_mask_idx: 0.4552| ppl: 138.78722| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.99314 | loss: 1.17263| constrast_loss: 4.61307| div_loss: 0.77456| %_mask_idx: 0.40476| ppl: 144.28413| %_neg_is_pos: 0.00672| lr: 0.0| temp: 1.99314 | loss: 1.17268| constrast_loss: 4.61389| div_loss: 0.76813| %_mask_idx: 0.39489| ppl: 148.39883| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.99313 | loss: 1.17227| constrast_loss: 4.60943| div_loss: 0.79636| %_mask_idx: 0.39756| ppl: 130.33206| %_neg_is_pos: 0.00789| lr: 0.0| temp: 1.99313 | loss: 1.17271| constrast_loss: 4.61286| div_loss: 0.77979| %_mask_idx: 0.40163| ppl: 140.93515| %_neg_is_pos: 0.00442| lr: 0.0| temp: 1.99311 | loss: 1.17201| constrast_loss: 4.61079| div_loss: 0.77256| %_mask_idx: 0.3761| ppl: 145.56471| %_neg_is_pos: 0.00527| lr: 0.0| temp: 1.99311 | loss: 1.1719| constrast_loss: 4.60935| div_loss: 0.78249| %_mask_idx: 0.44063| ppl: 139.20959| %_neg_is_pos: 0.00886| lr: 0.0| temp: 1.9931 | loss: 1.17253| constrast_loss: 4.61359| div_loss: 0.7654| %_mask_idx: 0.40194| ppl: 150.14194| %_neg_is_pos: 0.00444| lr: 0.0| temp: 1.9931 | loss: 1.17277| constrast_loss: 4.61373| div_loss: 0.77349| %_mask_idx: 0.41776| ppl: 144.96468| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.99309 | loss: 
1.17198| constrast_loss: 4.60957| div_loss: 0.78357| %_mask_idx: 0.35323| ppl: 138.51437| %_neg_is_pos: 0.00821| lr: 0.0| temp: 1.99309 | loss: 1.17203| constrast_loss: 4.60875| div_loss: 0.79377| %_mask_idx: 0.37672| ppl: 131.98965| %_neg_is_pos: 0.00873| lr: 0.0| temp: 1.99308 | loss: 1.17322| constrast_loss: 4.61403| div_loss: 0.78843| %_mask_idx: 0.38142| ppl: 135.40616| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.99308 | loss: 1.17249| constrast_loss: 4.61134| div_loss: 0.78629| %_mask_idx: 0.3927| ppl: 136.77335| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.99306 | loss: 1.1725| constrast_loss: 4.61098| div_loss: 0.7901| %_mask_idx: 0.42215| ppl: 134.33519| %_neg_is_pos: 0.00501| lr: 0.0| temp: 1.99306 | loss: 1.17212| constrast_loss: 4.61068| div_loss: 0.7781| %_mask_idx: 0.36685| ppl: 142.01353| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.99305 | loss: 1.17217| constrast_loss: 4.61022| div_loss: 0.78455| %_mask_idx: 0.38346| ppl: 137.88536| %_neg_is_pos: 0.01164| lr: 0.0| temp: 1.99305 | loss: 1.17277| constrast_loss: 4.61235| div_loss: 0.78729| %_mask_idx: 0.3584| ppl: 136.13669| %_neg_is_pos: 0.0117| lr: 0.0| temp: 1.99304 | loss: 1.17267| constrast_loss: 4.61048| div_loss: 0.80198| %_mask_idx: 0.37281| ppl: 126.73488| %_neg_is_pos: 0.00536| lr: 0.0| temp: 1.99304 | loss: 1.1728| constrast_loss: 4.61213| div_loss: 0.7909| %_mask_idx: 0.36419| ppl: 133.82381| %_neg_is_pos: 0.01056| lr: 0.0| temp: 1.99303 | loss: 1.1727| constrast_loss: 4.6118| div_loss: 0.79016| %_mask_idx: 0.41745| ppl: 134.29868| %_neg_is_pos: 0.00703| lr: 0.0| temp: 1.99303 | loss: 1.1728| constrast_loss: 4.61237| div_loss: 0.78843| %_mask_idx: 0.37892| ppl: 135.40614| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.99301 | loss: 1.17202| constrast_loss: 4.60882| div_loss: 0.7926| %_mask_idx: 0.362| ppl: 132.73744| %_neg_is_pos: 0.00816| lr: 0.0| temp: 1.99301 | loss: 1.17245| constrast_loss: 4.61203| div_loss: 0.77757| %_mask_idx: 0.3963| ppl: 142.35559| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.993 | loss: 
1.17231| constrast_loss: 4.60997| div_loss: 0.79272| %_mask_idx: 0.37516| ppl: 132.66187| %_neg_is_pos: 0.0097| lr: 0.0| temp: 1.993
[2021-09-01 22:47:59,653] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-09-01 22:47:59,653] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
| loss: 1.17181| constrast_loss: 4.60995| div_loss: 0.77284| %_mask_idx: 0.36388| ppl: 145.38153| %_neg_is_pos: 0.00527| lr: 0.0| temp: 1.99298| loss: 1.17277| constrast_loss: 4.61254| div_loss: 0.78543| %_mask_idx: 0.34226| ppl: 137.32385| %_neg_is_pos: 0.00614| lr: 0.0| temp: 1.99298 | loss: 1.17258| constrast_loss: 4.61406| div_loss: 0.76255| %_mask_idx: 0.45316| ppl: 151.97021| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.99297 | loss: 1.17259| constrast_loss: 4.61108| div_loss: 0.7929| %_mask_idx: 0.38581| ppl: 132.54204| %_neg_is_pos: 0.00473| lr: 0.0| temp: 1.99297 | loss: 1.17336| constrast_loss: 4.61583| div_loss: 0.77623| %_mask_idx: 0.3432| ppl: 143.21265| %_neg_is_pos: 0.00566| lr: 0.0| temp: 1.99296 | loss: 1.17229| constrast_loss: 4.61127| div_loss: 0.77895| %_mask_idx: 0.40758| ppl: 141.47182| %_neg_is_pos: 0.00659| lr: 0.0| temp: 1.99296 | loss: 1.17189| constrast_loss: 4.61089| div_loss: 0.76675| %_mask_idx: 0.38612| ppl: 149.27696| %_neg_is_pos: 0.00551| lr: 0.0| temp: 1.99295 | loss: 1.17318| constrast_loss: 4.61377| div_loss: 0.78947| %_mask_idx: 0.35135| ppl: 134.74005| %_neg_is_pos: 0.00556| lr: 0.0| temp: 1.99295 | loss: 1.17172| constrast_loss: 4.60855| div_loss: 0.78309| %_mask_idx: 0.375| ppl: 138.8204| %_neg_is_pos: 0.00989| lr: 0.0| temp: 1.99293 | loss: 1.17263| constrast_loss: 4.61242| div_loss: 0.78092| %_mask_idx: 0.44659| ppl: 140.21112| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.99293 | loss: 1.17269| constrast_loss: 4.61242| div_loss: 0.78333| %_mask_idx: 0.40711| ppl: 
138.66615| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.99292 | loss: 1.17214| constrast_loss: 4.61157| div_loss: 0.76977| %_mask_idx: 0.3869| ppl: 147.34737| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.99292 | loss: 1.17199| constrast_loss: 4.61131| div_loss: 0.76633| %_mask_idx: 0.36858| ppl: 149.54929| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.99291 | loss: 1.17237| constrast_loss: 4.61216| div_loss: 0.77305| %_mask_idx: 0.43515| ppl: 145.24736| %_neg_is_pos: 0.00368| lr: 0.0| temp: 1.99291 | loss: 1.17215| constrast_loss: 4.61114| div_loss: 0.77447| %_mask_idx: 0.39145| ppl: 144.33652| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.9929 | loss: 1.17241| constrast_loss: 4.61155| div_loss: 0.78089| %_mask_idx: 0.32691| ppl: 140.23206| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.9929 | loss: 1.17255| constrast_loss: 4.61336| div_loss: 0.76837| %_mask_idx: 0.39333| ppl: 148.24094| %_neg_is_pos: 0.00525| lr: 0.0| temp: 1.99288| loss: 1.17158| constrast_loss: 4.60885| div_loss: 0.77482| %_mask_idx: 0.37108| ppl: 144.11639| %_neg_is_pos: 0.00744| lr: 0.0| temp: 1.99288 | loss: 1.1717| constrast_loss: 4.6104| div_loss: 0.76395| %_mask_idx: 0.32221| ppl: 151.0703| %_neg_is_pos: 0.00692| lr: 0.0| temp: 1.99287 | loss: 1.17175| constrast_loss: 4.60984| div_loss: 0.77154| %_mask_idx: 0.39004| ppl: 146.21428| %_neg_is_pos: 0.00704| lr: 0.0| temp: 1.99287 | loss: 1.17235| constrast_loss: 4.611| div_loss: 0.78379| %_mask_idx: 0.32691| ppl: 138.37677| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.99286 | loss: 1.17174| constrast_loss: 4.61018| div_loss: 0.76765| %_mask_idx: 0.37328| ppl: 148.70419| %_neg_is_pos: 0.00719| lr: 0.0| temp: 1.99286 | loss: 1.17159| constrast_loss: 4.60951| div_loss: 0.76842| %_mask_idx: 0.38565| ppl: 148.21429| %_neg_is_pos: 0.0055| lr: 0.0| temp: 1.99285 | loss: 1.17176| constrast_loss: 4.60879| div_loss: 0.78229| %_mask_idx: 0.37986| ppl: 139.33362| %_neg_is_pos: 0.00934| lr: 0.0| temp: 1.99285 | loss: 1.17132| constrast_loss: 4.60955| div_loss: 0.7573| %_mask_idx: 0.42043| 
ppl: 155.32976| %_neg_is_pos: 0.00522| lr: 0.0| temp: 1.99283 | loss: 1.17128| constrast_loss: 4.60875| div_loss: 0.76356| %_mask_idx: 0.40508| ppl: 151.32367| %_neg_is_pos: 0.00601| lr: 0.0| temp: 1.99283 | loss: 1.17175| constrast_loss: 4.60965| div_loss: 0.77333| %_mask_idx: 0.39599| ppl: 145.07184| %_neg_is_pos: 0.00802| lr: 0.0| temp: 1.99282 | loss: 1.17192| constrast_loss: 4.61082| div_loss: 0.76846| %_mask_idx: 0.3703| ppl: 148.18555| %_neg_is_pos: 0.00983| lr: 0.0| temp: 1.99282 | loss: 1.17104| constrast_loss: 4.60848| div_loss: 0.75688| %_mask_idx: 0.34618| ppl: 155.59476| %_neg_is_pos: 0.00687| lr: 0.0| temp: 1.9928 | loss: 1.17182| constrast_loss: 4.60956| div_loss: 0.77738| %_mask_idx: 0.38847| ppl: 142.47783| %_neg_is_pos: 0.00587| lr: 0.0| temp: 1.9928 | loss: 1.17228| constrast_loss: 4.61062| div_loss: 0.78484| %_mask_idx: 0.34038| ppl: 137.70303| %_neg_is_pos: 0.00968| lr: 0.0| temp: 1.99279 | loss: 1.17179| constrast_loss: 4.61044| div_loss: 0.76718| %_mask_idx: 0.39113| ppl: 149.00613| %_neg_is_pos: 0.01029| lr: 0.0| temp: 1.99279 | loss: 1.17144| constrast_loss: 4.6086| div_loss: 0.77147| %_mask_idx: 0.39082| ppl: 146.26114| %_neg_is_pos: 0.00812| lr: 0.0| temp: 1.99278 | loss: 1.17187| constrast_loss: 4.61177| div_loss: 0.75704| %_mask_idx: 0.39897| ppl: 155.495| %_neg_is_pos: 0.00758| lr: 0.0| temp: 1.99278 | loss: 1.17148| constrast_loss: 4.60905| div_loss: 0.76879| %_mask_idx: 0.38017| ppl: 147.97572| %_neg_is_pos: 0.01177| lr: 0.0| temp: 1.99277 | loss: 1.17188| constrast_loss: 4.61156| div_loss: 0.75974| %_mask_idx: 0.40633| ppl: 153.76947| %_neg_is_pos: 0.007| lr: 0.0| temp: 1.99277 | loss: 1.17098| constrast_loss: 4.60683| div_loss: 0.77081| %_mask_idx: 0.38549| ppl: 146.68271| %_neg_is_pos: 0.01186| lr: 0.0| temp: 1.99275 | loss: 1.17161| constrast_loss: 4.60768| div_loss: 0.78762| %_mask_idx: 0.37594| ppl: 135.922| %_neg_is_pos: 0.0099| lr: 0.0| temp: 1.99275 | loss: 1.17164| constrast_loss: 4.60943| div_loss: 0.77125| %_mask_idx: 
0.40946| ppl: 146.3996| %_neg_is_pos: 0.00759| lr: 0.0| temp: 1.99274 | loss: 1.17213| constrast_loss: 4.61206| div_loss: 0.7645| %_mask_idx: 0.40946| ppl: 150.71841| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.99274 | loss: 1.17166| constrast_loss: 4.60944| div_loss: 0.77202| %_mask_idx: 0.36153| ppl: 145.90796| %_neg_is_pos: 0.00605| lr: 0.0| temp: 1.99273 | loss: 1.17215| constrast_loss: 4.61356| div_loss: 0.75039| %_mask_idx: 0.46632| ppl: 159.74805| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.99273 | loss: 1.17123| constrast_loss: 4.6085| div_loss: 0.7641| %_mask_idx: 0.37735| ppl: 150.97415| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.99272 | loss: 1.17228| constrast_loss: 4.61202| div_loss: 0.77095| %_mask_idx: 0.39317| ppl: 146.59473| %_neg_is_pos: 0.00468| lr: 0.0| temp: 1.99272 | loss: 1.17144| constrast_loss: 4.61139| div_loss: 0.74366| %_mask_idx: 0.39615| ppl: 164.0564| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.9927 | loss: 1.17172| constrast_loss: 4.61183| div_loss: 0.75036| %_mask_idx: 0.45473| ppl: 159.76869| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.9927 | loss: 1.17258| constrast_loss: 4.61253| div_loss: 0.77805| %_mask_idx: 0.42011| ppl: 142.05099| %_neg_is_pos: 0.00608| lr: 0.0| temp: 1.99269 | loss: 1.1714| constrast_loss: 4.60792| div_loss: 0.77663| %_mask_idx: 0.37469| ppl: 142.95366| %_neg_is_pos: 0.00669| lr: 0.0| temp: 1.99269 | loss: 1.17215| constrast_loss: 4.6126| div_loss: 0.76006| %_mask_idx: 0.37531| ppl: 153.56438| %_neg_is_pos: 0.00501| lr: 0.0| temp: 1.99268 | loss: 1.17254| constrast_loss: 4.61353| div_loss: 0.76642| %_mask_idx: 0.39113| ppl: 149.48819| %_neg_is_pos: 0.00887| lr: 0.0| temp: 1.99268 | loss: 1.17154| constrast_loss: 4.60862| div_loss: 0.7756| %_mask_idx: 0.38659| ppl: 143.6152| %_neg_is_pos: 0.00787| lr: 0.0| temp: 1.99267 | loss: 1.17093| constrast_loss: 4.60752| div_loss: 0.76189| %_mask_idx: 0.39364| ppl: 152.38757| %_neg_is_pos: 0.01075| lr: 0.0| temp: 1.99267 | loss: 1.17221| constrast_loss: 4.61287| div_loss: 0.75988| 
%_mask_idx: 0.33098| ppl: 153.67755| %_neg_is_pos: 0.00662| lr: 0.0| temp: 1.99265 | loss: 1.17138| constrast_loss: 4.60969| div_loss: 0.75813| %_mask_idx: 0.36419| ppl: 154.79901| %_neg_is_pos: 0.00803| lr: 0.0| temp: 1.99265 | loss: 1.17182| constrast_loss: 4.61233| div_loss: 0.74961| %_mask_idx: 0.38299| ppl: 160.25275| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.99264 | loss: 1.17134| constrast_loss: 4.60732| div_loss: 0.78046| %_mask_idx: 0.37657| ppl: 140.50769| %_neg_is_pos: 0.01138| lr: 0.0| temp: 1.99264 | loss: 1.17169| constrast_loss: 4.61062| div_loss: 0.76132| %_mask_idx: 0.3739| ppl: 152.75516| %_neg_is_pos: 0.00512| lr: 0.0| temp: 1.99262 | loss: 1.1728| constrast_loss: 4.61414| div_loss: 0.77077| %_mask_idx: 0.33506| ppl: 146.70662| %_neg_is_pos: 0.00599| lr: 0.0| temp: 1.99262 | loss: 1.17201| constrast_loss: 4.6111| div_loss: 0.76959| %_mask_idx: 0.41902| ppl: 147.46426| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.99261 | loss: 1.17133| constrast_loss: 4.60865| div_loss: 0.76688| %_mask_idx: 0.35934| ppl: 149.19791| %_neg_is_pos: 0.00906| lr: 0.0| temp: 1.99261 | loss: 1.17153| constrast_loss: 4.60956| div_loss: 0.76565| %_mask_idx: 0.39536| ppl: 149.98465| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.9926 | loss: 1.1722| constrast_loss: 4.61125| div_loss: 0.77548| %_mask_idx: 0.3515| ppl: 143.69473| %_neg_is_pos: 0.00974| lr: 0.0| temp: 1.9926 | loss: 1.172| constrast_loss: 4.61208| div_loss: 0.75934| %_mask_idx: 0.36451| ppl: 154.02057| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.99259 | loss: 1.17177| constrast_loss: 4.60986| div_loss: 0.77204| %_mask_idx: 0.41588| ppl: 145.89682| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.99259 | loss: 1.1721| constrast_loss: 4.6126| div_loss: 0.75789| %_mask_idx: 0.37641| ppl: 154.95018| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.99257 | loss: 1.17181| constrast_loss: 4.61041| div_loss: 0.76827| %_mask_idx: 0.3985| ppl: 148.30959| %_neg_is_pos: 0.00846| lr: 0.0| temp: 1.99257 | loss: 1.17012| constrast_loss: 4.6052| div_loss: 
0.75293| %_mask_idx: 0.4151| ppl: 158.12688| %_neg_is_pos: 0.00808| lr: 0.0| temp: 1.99256 | loss: 1.17141| constrast_loss: 4.60959| div_loss: 0.76036| %_mask_idx: 0.38299| ppl: 153.36798| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.99256 | loss: 1.17119| constrast_loss: 4.60887| div_loss: 0.75888| %_mask_idx: 0.36451| ppl: 154.31526| %_neg_is_pos: 0.00609| lr: 0.0| temp: 1.99255 | loss: 1.17196| constrast_loss: 4.60926| div_loss: 0.78583| %_mask_idx: 0.40429| ppl: 137.06841| %_neg_is_pos: 0.01137| lr: 0.0| temp: 1.99255 | loss: 1.17164| constrast_loss: 4.60934| div_loss: 0.77224| %_mask_idx: 0.38001| ppl: 145.76443| %_neg_is_pos: 0.00912| lr: 0.0| temp: 1.99254 | loss: 1.17158| constrast_loss: 4.61026| div_loss: 0.76073| %_mask_idx: 0.39348| ppl: 153.13275| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.99254 | loss: 1.17215| constrast_loss: 4.61073| div_loss: 0.7785| %_mask_idx: 0.37657| ppl: 141.75981| %_neg_is_pos: 0.00684| lr: 0.0| temp: 1.99252 | loss: 1.1714| constrast_loss: 4.60989| div_loss: 0.75687| %_mask_idx: 0.38612| ppl: 155.60399| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.99252 | loss: 1.17182| constrast_loss: 4.61059| div_loss: 0.76675| %_mask_idx: 0.3797| ppl: 149.27846| %_neg_is_pos: 0.00849| lr: 0.0| temp: 1.99251 | loss: 1.17133| constrast_loss: 4.60986| div_loss: 0.75478| %_mask_idx: 0.38737| ppl: 156.94281| %_neg_is_pos: 0.00558| lr: 0.0| temp: 1.99251 | loss: 1.17138| constrast_loss: 4.60992| div_loss: 0.75579| %_mask_idx: 0.37986| ppl: 156.29471| %_neg_is_pos: 0.00764| lr: 0.0| temp: 1.9925 | loss: 1.17178| constrast_loss: 4.61004| div_loss: 0.77091| %_mask_idx: 0.40382| ppl: 146.6196| %_neg_is_pos: 0.00602| lr: 0.0| temp: 1.9925 | loss: 1.17192| constrast_loss: 4.61094| div_loss: 0.76761| %_mask_idx: 0.31971| ppl: 148.72679| %_neg_is_pos: 0.00775| lr: 0.0| temp: 1.99249 | loss: 1.17234| constrast_loss: 4.61293| div_loss: 0.76443| %_mask_idx: 0.37939| ppl: 150.76483| %_neg_is_pos: 0.0059| lr: 0.0| temp: 1.99249 | loss: 1.17161| constrast_loss: 4.61034| 
div_loss: 0.76122| %_mask_idx: 0.37876| ppl: 152.81607| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.99247 | loss: 1.17196| constrast_loss: 4.61193| div_loss: 0.75891| %_mask_idx: 0.4433| ppl: 154.29599| %_neg_is_pos: 0.00613| lr: 0.0| temp: 1.99247 | loss: 1.17162| constrast_loss: 4.61099| div_loss: 0.75505| %_mask_idx: 0.40602| ppl: 156.77112| %_neg_is_pos: 0.0047| lr: 0.0| temp: 1.99246 | loss: 1.17158| constrast_loss: 4.60858| div_loss: 0.77727| %_mask_idx: 0.37954| ppl: 142.54626| %_neg_is_pos: 0.01233| lr: 0.0| temp: 1.99246 | loss: 1.17148| constrast_loss: 4.60987| div_loss: 0.76035| %_mask_idx: 0.38847| ppl: 153.37454| %_neg_is_pos: 0.00737| lr: 0.0| temp: 1.99244 | loss: 1.17193| constrast_loss: 4.61089| div_loss: 0.76833| %_mask_idx: 0.35182| ppl: 148.26822| %_neg_is_pos: 0.00657| lr: 0.0| temp: 1.99244 | loss: 1.17238| constrast_loss: 4.61278| div_loss: 0.76755| %_mask_idx: 0.40085| ppl: 148.76962| %_neg_is_pos: 0.00508| lr: 0.0| temp: 1.99243 | loss: 1.17162| constrast_loss: 4.61158| div_loss: 0.7488| %_mask_idx: 0.45426| ppl: 160.76968| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.99243 | loss: 1.1728| constrast_loss: 4.61482| div_loss: 0.76394| %_mask_idx: 0.38377| ppl: 151.07581| %_neg_is_pos: 0.00538| lr: 0.0| temp: 1.99242 | loss: 1.17169| constrast_loss: 4.61041| div_loss: 0.76361| %_mask_idx: 0.39286| ppl: 151.28876| %_neg_is_pos: 0.00665| lr: 0.0| temp: 1.99242 | loss: 1.17203| constrast_loss: 4.61127| div_loss: 0.76849| %_mask_idx: 0.35401| ppl: 148.16446| %_neg_is_pos: 0.00543| lr: 0.0| temp: 1.99241 | loss: 1.17147| constrast_loss: 4.60819| div_loss: 0.77712| %_mask_idx: 0.39474| ppl: 142.64589| %_neg_is_pos: 0.00712| lr: 0.0| temp: 1.99241 | loss: 1.17157| constrast_loss: 4.60943| div_loss: 0.76836| %_mask_idx: 0.37484| ppl: 148.24933| %_neg_is_pos: 0.00991| lr: 0.0| temp: 1.99239 | loss: 1.1723| constrast_loss: 4.6142| div_loss: 0.75016| %_mask_idx: 0.4458| ppl: 159.90015| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.99239 | loss: 1.17235| constrast_loss: 
4.61421| div_loss: 0.75175| %_mask_idx: 0.39928| ppl: 158.87991| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.99238 | loss: 1.1712| constrast_loss: 4.60866| div_loss: 0.76158| %_mask_idx: 0.38549| ppl: 152.58806| %_neg_is_pos: 0.00913| lr: 0.0| temp: 1.99238 | loss: 1.17185| constrast_loss: 4.61117| div_loss: 0.76245| %_mask_idx: 0.38236| ppl: 152.02927| %_neg_is_pos: 0.00611| lr: 0.0| temp: 1.99237 | loss: 1.17188| constrast_loss: 4.60993| div_loss: 0.77605| %_mask_idx: 0.38268| ppl: 143.32983| %_neg_is_pos: 0.00661| lr: 0.0| temp: 1.99237 | loss: 1.17248| constrast_loss: 4.61054| div_loss: 0.79376| %_mask_idx: 0.37563| ppl: 131.99356| %_neg_is_pos: 0.008| lr: 0.0| temp: 1.99236 | loss: 1.17172| constrast_loss: 4.61194| div_loss: 0.74944| %_mask_idx: 0.34539| ppl: 160.35834| %_neg_is_pos: 0.00684| lr: 0.0| temp: 1.99236 | loss: 1.17169| constrast_loss: 4.6124| div_loss: 0.74371| %_mask_idx: 0.39082| ppl: 164.02612| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.99234 | loss: 1.17135| constrast_loss: 4.60713| div_loss: 0.78276| %_mask_idx: 0.32315| ppl: 139.03241| %_neg_is_pos: 0.0127| lr: 0.0| temp: 1.99234 | loss: 1.17155| constrast_loss: 4.60914| div_loss: 0.77044| %_mask_idx: 0.34806| ppl: 146.91965| %_neg_is_pos: 0.00619| lr: 0.0| temp: 1.99233 | loss: 1.17135| constrast_loss: 4.61052| div_loss: 0.74862| %_mask_idx: 0.41338| ppl: 160.88025| %_neg_is_pos: 0.00811| lr: 0.0| temp: 1.99233 | loss: 1.17209| constrast_loss: 4.61251| div_loss: 0.75857| %_mask_idx: 0.37813| ppl: 154.51796| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.99232 | loss: 1.17193| constrast_loss: 4.61356| div_loss: 0.74154| %_mask_idx: 0.39615| ppl: 165.41547| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.99232 | loss: 1.17192| constrast_loss: 4.61041| div_loss: 0.77282| %_mask_idx: 0.38596| ppl: 145.39458| %_neg_is_pos: 0.00581| lr: 0.0| temp: 1.99231 | loss: 1.17196| constrast_loss: 4.61172| div_loss: 0.76104| %_mask_idx: 0.40147| ppl: 152.93567| %_neg_is_pos: 0.00615| lr: 0.0| temp: 1.99231 | loss: 1.17168| 
constrast_loss: 4.60735| div_loss: 0.79349| %_mask_idx: 0.38581| ppl: 132.16466| %_neg_is_pos: 0.00895| lr: 0.0| temp: 1.99229 | loss: 1.17226| constrast_loss: 4.61219| div_loss: 0.76852| %_mask_idx: 0.36936| ppl: 148.14481| %_neg_is_pos: 0.00672| lr: 0.0| temp: 1.99229 | loss: 1.1721| constrast_loss: 4.61247| div_loss: 0.75915| %_mask_idx: 0.36654| ppl: 154.14445| %_neg_is_pos: 0.01167| lr: 0.0| temp: 1.99228 | loss: 1.1722| constrast_loss: 4.6119| div_loss: 0.76897| %_mask_idx: 0.41024| ppl: 147.86026| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.99228
[2021-09-01 22:57:16,123] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-09-01 22:57:16,123] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
| loss: 1.17228| constrast_loss: 4.61218| div_loss: 0.76926| %_mask_idx: 0.4458| ppl: 147.67673| %_neg_is_pos: 0.00664| lr: 0.0| temp: 1.99226 | loss: 1.17251| constrast_loss: 4.6132| div_loss: 0.76852| %_mask_idx: 0.41087| ppl: 148.14914| %_neg_is_pos: 0.00593| lr: 0.0| temp: 1.99226 | loss: 1.17139| constrast_loss: 4.6089| div_loss: 0.76665| %_mask_idx: 0.39066| ppl: 149.3436| %_neg_is_pos: 0.01084| lr: 0.0| temp: 1.99225 | loss: 1.17164| constrast_loss: 4.61077| div_loss: 0.7577| %_mask_idx: 0.41447| ppl: 155.07239| %_neg_is_pos: 0.00634| lr: 0.0| temp: 1.99225 | loss: 1.17136| constrast_loss: 4.60801| div_loss: 0.77429| %_mask_idx: 0.40053| ppl: 144.45496| %_neg_is_pos: 0.01251| lr: 0.0| temp: 1.99225 | loss: 1.17272| constrast_loss: 4.61482| div_loss: 0.76077| %_mask_idx: 0.46507| ppl: 153.10596| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.99225 | loss: 1.17156| constrast_loss: 4.61146| div_loss: 0.74758| %_mask_idx: 0.37907| ppl: 161.55014| %_neg_is_pos: 0.00393| lr: 0.0| temp: 1.99224 | loss: 1.1719| constrast_loss: 4.61067| div_loss: 0.76917| %_mask_idx: 0.40523| ppl: 147.73135| 
%_neg_is_pos: 0.00675| lr: 0.0| temp: 1.99224 | loss: 1.17163| constrast_loss: 4.61151| div_loss: 0.75018| %_mask_idx: 0.37265| ppl: 159.88718| %_neg_is_pos: 0.00483| lr: 0.0| temp: 1.99222 | loss: 1.17083| constrast_loss: 4.60501| div_loss: 0.78305| %_mask_idx: 0.40555| ppl: 138.85097| %_neg_is_pos: 0.01337| lr: 0.0| temp: 1.99222 | loss: 1.17151| constrast_loss: 4.61031| div_loss: 0.75737| %_mask_idx: 0.35119| ppl: 155.28453| %_neg_is_pos: 0.01104| lr: 0.0| temp: 1.99221 | loss: 1.17118| constrast_loss: 4.60882| div_loss: 0.75898| %_mask_idx: 0.38362| ppl: 154.25017| %_neg_is_pos: 0.00786| lr: 0.0| temp: 1.99221 | loss: 1.17135| constrast_loss: 4.60933| div_loss: 0.7605| %_mask_idx: 0.37484| ppl: 153.27914| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.9922 | loss: 1.17022| constrast_loss: 4.60622| div_loss: 0.74649| %_mask_idx: 0.41604| ppl: 162.24913| %_neg_is_pos: 0.00832| lr: 0.0| temp: 1.9922 | loss: 1.17059| constrast_loss: 4.60709| div_loss: 0.75257| %_mask_idx: 0.39975| ppl: 158.3533| %_neg_is_pos: 0.00919| lr: 0.0| temp: 1.99219 | loss: 1.17188| constrast_loss: 4.61171| div_loss: 0.75822| %_mask_idx: 0.36482| ppl: 154.7366| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.99219 | loss: 1.17136| constrast_loss: 4.61091| div_loss: 0.74521| %_mask_idx: 0.37798| ppl: 163.06744| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.99217 | loss: 1.17139| constrast_loss: 4.60862| div_loss: 0.76955| %_mask_idx: 0.38878| ppl: 147.48615| %_neg_is_pos: 0.009| lr: 0.0| temp: 1.99217 | loss: 1.17113| constrast_loss: 4.61014| div_loss: 0.74392| %_mask_idx: 0.38158| ppl: 163.8916| %_neg_is_pos: 0.00549| lr: 0.0| temp: 1.99216 | loss: 1.17148| constrast_loss: 4.61266| div_loss: 0.73262| %_mask_idx: 0.41855| ppl: 171.121| %_neg_is_pos: 0.00457| lr: 0.0| temp: 1.99216 | loss: 1.17126| constrast_loss: 4.60913| div_loss: 0.75916| %_mask_idx: 0.3761| ppl: 154.1377| %_neg_is_pos: 0.01141| lr: 0.0| temp: 1.99215 | loss: 1.17106| constrast_loss: 4.60839| div_loss: 0.75855| %_mask_idx: 0.37281| ppl: 
154.52997| %_neg_is_pos: 0.00733| lr: 0.0| temp: 1.99215 | loss: 1.1713| constrast_loss: 4.61005| div_loss: 0.75146| %_mask_idx: 0.41244| ppl: 159.06592| %_neg_is_pos: 0.00581| lr: 0.0| temp: 1.99214 | loss: 1.17154| constrast_loss: 4.60997| div_loss: 0.7621| %_mask_idx: 0.39536| ppl: 152.25616| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.99214 | loss: 1.17037| constrast_loss: 4.60615| div_loss: 0.75336| %_mask_idx: 0.40789| ppl: 157.84784| %_neg_is_pos: 0.00595| lr: 0.0| temp: 1.99212 | loss: 1.17068| constrast_loss: 4.608| div_loss: 0.74705| %_mask_idx: 0.34289| ppl: 161.88591| %_neg_is_pos: 0.00574| lr: 0.0| temp: 1.99212 | loss: 1.17048| constrast_loss: 4.60864| div_loss: 0.73285| %_mask_idx: 0.37907| ppl: 170.97549| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.99211 | loss: 1.17075| constrast_loss: 4.608| div_loss: 0.75006| %_mask_idx: 0.40993| ppl: 159.95912| %_neg_is_pos: 0.00497| lr: 0.0| temp: 1.99211 | loss: 1.17123| constrast_loss: 4.61042| div_loss: 0.74508| %_mask_idx: 0.39035| ppl: 163.15097| %_neg_is_pos: 0.0067| lr: 0.0| temp: 1.99209 | loss: 1.17077| constrast_loss: 4.60774| div_loss: 0.75342| %_mask_idx: 0.39991| ppl: 157.80832| %_neg_is_pos: 0.00605| lr: 0.0| temp: 1.99209 | loss: 1.17009| constrast_loss: 4.60513| div_loss: 0.7523| %_mask_idx: 0.3407| ppl: 158.52888| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.99208 | loss: 1.171| constrast_loss: 4.61086| div_loss: 0.73135| %_mask_idx: 0.4187| ppl: 171.93378| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.99208 | loss: 1.17114| constrast_loss: 4.60982| div_loss: 0.74742| %_mask_idx: 0.39286| ppl: 161.6519| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.99207 | loss: 1.17086| constrast_loss: 4.60952| div_loss: 0.73917| %_mask_idx: 0.39912| ppl: 166.93388| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.99207 | loss: 1.17058| constrast_loss: 4.60739| div_loss: 0.74928| %_mask_idx: 0.37218| ppl: 160.46104| %_neg_is_pos: 0.00908| lr: 0.0| temp: 1.99206 | loss: 1.17142| constrast_loss: 4.60932| div_loss: 0.76344| %_mask_idx: 0.38972| 
ppl: 151.39561| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.99206 | loss: 1.17096| constrast_loss: 4.61038| div_loss: 0.73452| %_mask_idx: 0.38706| ppl: 169.90933| %_neg_is_pos: 0.00588| lr: 0.0| temp: 1.99204 | loss: 1.16981| constrast_loss: 4.60557| div_loss: 0.73664| %_mask_idx: 0.43296| ppl: 168.54932| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.99204 | loss: 1.17113| constrast_loss: 4.60978| div_loss: 0.74738| %_mask_idx: 0.37688| ppl: 161.67685| %_neg_is_pos: 0.00544| lr: 0.0| temp: 1.99203 | loss: 1.1711| constrast_loss: 4.60794| div_loss: 0.76477| %_mask_idx: 0.38033| ppl: 150.54663| %_neg_is_pos: 0.00668| lr: 0.0| temp: 1.99203 | loss: 1.17059| constrast_loss: 4.60739| div_loss: 0.74968| %_mask_idx: 0.39709| ppl: 160.20764| %_neg_is_pos: 0.00622| lr: 0.0| temp: 1.99202 | loss: 1.17048| constrast_loss: 4.60599| div_loss: 0.75935| %_mask_idx: 0.37469| ppl: 154.01529| %_neg_is_pos: 0.00957| lr: 0.0| temp: 1.99202 | loss: 1.16984| constrast_loss: 4.60465| div_loss: 0.74697| %_mask_idx: 0.39959| ppl: 161.94019| %_neg_is_pos: 0.00831| lr: 0.0| temp: 1.99201 | loss: 1.17028| constrast_loss: 4.60767| div_loss: 0.73439| %_mask_idx: 0.35965| ppl: 169.98987| %_neg_is_pos: 0.01136| lr: 0.0| temp: 1.99201 | loss: 1.17102| constrast_loss: 4.60858| div_loss: 0.75493| %_mask_idx: 0.3844| ppl: 156.84387| %_neg_is_pos: 0.0087| lr: 0.0| temp: 1.99199 | loss: 1.17074| constrast_loss: 4.60719| div_loss: 0.75753| %_mask_idx: 0.36576| ppl: 155.18124| %_neg_is_pos: 0.00649| lr: 0.0| temp: 1.99199 | loss: 1.17003| constrast_loss: 4.60545| div_loss: 0.74661| %_mask_idx: 0.35777| ppl: 162.17258| %_neg_is_pos: 0.00745| lr: 0.0| temp: 1.99198 | loss: 1.17096| constrast_loss: 4.60889| div_loss: 0.74949| %_mask_idx: 0.40836| ppl: 160.32388| %_neg_is_pos: 0.00493| lr: 0.0| temp: 1.99198 | loss: 1.17039| constrast_loss: 4.605| div_loss: 0.76547| %_mask_idx: 0.39239| ppl: 150.1006| %_neg_is_pos: 0.01017| lr: 0.0| temp: 1.99197 | loss: 1.17079| constrast_loss: 4.607| div_loss: 0.76162| %_mask_idx: 
0.41447| ppl: 152.56352| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.99197 | loss: 1.17123| constrast_loss: 4.60864| div_loss: 0.76279| %_mask_idx: 0.38503| ppl: 151.8114| %_neg_is_pos: 0.00793| lr: 0.0| temp: 1.99196 | loss: 1.17098| constrast_loss: 4.60961| div_loss: 0.74303| %_mask_idx: 0.4234| ppl: 164.46255| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.99196 | loss: 1.17127| constrast_loss: 4.61071| div_loss: 0.74375| %_mask_idx: 0.40226| ppl: 164.00093| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.99194 | loss: 1.17067| constrast_loss: 4.60708| div_loss: 0.7561| %_mask_idx: 0.4281| ppl: 156.09764| %_neg_is_pos: 0.00951| lr: 0.0| temp: 1.99194 | loss: 1.17047| constrast_loss: 4.60826| div_loss: 0.73601| %_mask_idx: 0.37657| ppl: 168.95175| %_neg_is_pos: 0.00922| lr: 0.0| temp: 1.99193 | loss: 1.17059| constrast_loss: 4.60615| div_loss: 0.76225| %_mask_idx: 0.35855| ppl: 152.16257| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.99193 | loss: 1.17094| constrast_loss: 4.60852| div_loss: 0.75222| %_mask_idx: 0.36153| ppl: 158.58092| %_neg_is_pos: 0.00543| lr: 0.0| temp: 1.99191 | loss: 1.17061| constrast_loss: 4.60779| div_loss: 0.74668| %_mask_idx: 0.39254| ppl: 162.12616| %_neg_is_pos: 0.00631| lr: 0.0| temp: 1.99191 | loss: 1.17029| constrast_loss: 4.60599| div_loss: 0.75164| %_mask_idx: 0.32691| ppl: 158.95242| %_neg_is_pos: 0.00812| lr: 0.0| temp: 1.9919 | loss: 1.17065| constrast_loss: 4.60504| div_loss: 0.77556| %_mask_idx: 0.38111| ppl: 143.64389| %_neg_is_pos: 0.01413| lr: 0.0| temp: 1.9919 | loss: 1.17124| constrast_loss: 4.60879| div_loss: 0.76172| %_mask_idx: 0.40758| ppl: 152.50099| %_neg_is_pos: 0.00481| lr: 0.0| temp: 1.99189 | loss: 1.17086| constrast_loss: 4.60809| div_loss: 0.75345| %_mask_idx: 0.33521| ppl: 157.79474| %_neg_is_pos: 0.00662| lr: 0.0| temp: 1.99189 | loss: 1.17091| constrast_loss: 4.60645| div_loss: 0.77192| %_mask_idx: 0.43687| ppl: 145.96945| %_neg_is_pos: 0.01032| lr: 0.0| temp: 1.99188 | loss: 1.17078| constrast_loss: 4.60651| div_loss: 0.7659| 
%_mask_idx: 0.34477| ppl: 149.82347| %_neg_is_pos: 0.01136| lr: 0.0| temp: 1.99188 | loss: 1.17029| constrast_loss: 4.60724| div_loss: 0.73925| %_mask_idx: 0.36983| ppl: 166.88077| %_neg_is_pos: 0.00866| lr: 0.0| temp: 1.99186 | loss: 1.17162| constrast_loss: 4.61152| div_loss: 0.74964| %_mask_idx: 0.45238| ppl: 160.22894| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.99186 | loss: 1.17125| constrast_loss: 4.60956| div_loss: 0.75424| %_mask_idx: 0.37672| ppl: 157.28702| %_neg_is_pos: 0.00713| lr: 0.0| temp: 1.99185 | loss: 1.17084| constrast_loss: 4.60881| div_loss: 0.7456| %_mask_idx: 0.41808| ppl: 162.81592| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.99185 | loss: 1.17068| constrast_loss: 4.6076| div_loss: 0.75098| %_mask_idx: 0.39975| ppl: 159.37579| %_neg_is_pos: 0.00596| lr: 0.0| temp: 1.99184 | loss: 1.1706| constrast_loss: 4.60784| div_loss: 0.74548| %_mask_idx: 0.47744| ppl: 162.89148| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.99184 | loss: 1.17011| constrast_loss: 4.60401| div_loss: 0.7644| %_mask_idx: 0.39489| ppl: 150.78558| %_neg_is_pos: 0.00771| lr: 0.0| temp: 1.99183 | loss: 1.17123| constrast_loss: 4.61139| div_loss: 0.73521| %_mask_idx: 0.44455| ppl: 169.4649| %_neg_is_pos: 0.00537| lr: 0.0| temp: 1.99183 | loss: 1.17045| constrast_loss: 4.60734| div_loss: 0.7447| %_mask_idx: 0.36544| ppl: 163.3911| %_neg_is_pos: 0.00833| lr: 0.0| temp: 1.99181 | loss: 1.17048| constrast_loss: 4.60653| div_loss: 0.75395| %_mask_idx: 0.34915| ppl: 157.47171| %_neg_is_pos: 0.00689| lr: 0.0| temp: 1.99181 | loss: 1.1708| constrast_loss: 4.60826| div_loss: 0.74958| %_mask_idx: 0.34806| ppl: 160.26877| %_neg_is_pos: 0.00779| lr: 0.0| temp: 1.9918 | loss: 1.17024| constrast_loss: 4.60574| div_loss: 0.75204| %_mask_idx: 0.37014| ppl: 158.69385| %_neg_is_pos: 0.01302| lr: 0.0| temp: 1.9918 | loss: 1.17118| constrast_loss: 4.60927| div_loss: 0.75459| %_mask_idx: 0.40695| ppl: 157.06366| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.99179 | loss: 1.17153| constrast_loss: 4.61192| div_loss: 
0.74205| %_mask_idx: 0.40053| ppl: 165.09003| %_neg_is_pos: 0.00518| lr: 0.0| temp: 1.99179 | loss: 1.17073| constrast_loss: 4.60787| div_loss: 0.75036| %_mask_idx: 0.40977| ppl: 159.76984| %_neg_is_pos: 0.0079| lr: 0.0| temp: 1.99178 | loss: 1.17144| constrast_loss: 4.61194| div_loss: 0.73821| %_mask_idx: 0.41494| ppl: 167.54309| %_neg_is_pos: 0.00588| lr: 0.0| temp: 1.99178 | loss: 1.17106| constrast_loss: 4.60773| div_loss: 0.76505| %_mask_idx: 0.39771| ppl: 150.36868| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.99176 | loss: 1.17147| constrast_loss: 4.60861| div_loss: 0.77286| %_mask_idx: 0.36952| ppl: 145.36984| %_neg_is_pos: 0.01077| lr: 0.0| temp: 1.99176 | loss: 1.17121| constrast_loss: 4.6092| div_loss: 0.75643| %_mask_idx: 0.40946| ppl: 155.88663| %_neg_is_pos: 0.00708| lr: 0.0| temp: 1.99175 | loss: 1.17104| constrast_loss: 4.60733| div_loss: 0.76831| %_mask_idx: 0.34445| ppl: 148.28011| %_neg_is_pos: 0.01351| lr: 0.0| temp: 1.99175 | loss: 1.17072| constrast_loss: 4.60726| div_loss: 0.75625| %_mask_idx: 0.37751| ppl: 156.00235| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.99173 | loss: 1.17098| constrast_loss: 4.60806| div_loss: 0.75884| %_mask_idx: 0.37939| ppl: 154.34464| %_neg_is_pos: 0.00712| lr: 0.0| temp: 1.99173 | loss: 1.17052| constrast_loss: 4.60615| div_loss: 0.75936| %_mask_idx: 0.41056| ppl: 154.01221| %_neg_is_pos: 0.0095| lr: 0.0| temp: 1.99172 | loss: 1.17022| constrast_loss: 4.60556| div_loss: 0.75304| %_mask_idx: 0.4151| ppl: 158.05334| %_neg_is_pos: 0.00617| lr: 0.0| temp: 1.99172 | loss: 1.17013| constrast_loss: 4.60689| div_loss: 0.73609| %_mask_idx: 0.42528| ppl: 168.90381| %_neg_is_pos: 0.00611| lr: 0.0| temp: 1.99171 | loss: 1.17089| constrast_loss: 4.60611| div_loss: 0.77439| %_mask_idx: 0.36764| ppl: 144.39189| %_neg_is_pos: 0.00849| lr: 0.0| temp: 1.99171 | loss: 1.17011| constrast_loss: 4.60569| div_loss: 0.74747| %_mask_idx: 0.4032| ppl: 161.6217| %_neg_is_pos: 0.00804| lr: 0.0| temp: 1.9917 | loss: 1.17105| constrast_loss: 4.60822| 
div_loss: 0.75979| %_mask_idx: 0.3985| ppl: 153.73367| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.9917 | loss: 1.17115| constrast_loss: 4.61064| div_loss: 0.7396| %_mask_idx: 0.36967| ppl: 166.65878| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.99168 | loss: 1.17123| constrast_loss: 4.60661| div_loss: 0.78319| %_mask_idx: 0.38628| ppl: 138.75842| %_neg_is_pos: 0.00957| lr: 0.0| temp: 1.99168 | loss: 1.17103| constrast_loss: 4.60767| div_loss: 0.76448| %_mask_idx: 0.39991| ppl: 150.73151| %_neg_is_pos: 0.00684| lr: 0.0| temp: 1.99167 | loss: 1.17093| constrast_loss: 4.60867| div_loss: 0.75036| %_mask_idx: 0.35902| ppl: 159.7706| %_neg_is_pos: 0.00864| lr: 0.0| temp: 1.99167 | loss: 1.17155| constrast_loss: 4.61287| div_loss: 0.73327| %_mask_idx: 0.37939| ppl: 170.70477| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.99166 | loss: 1.17047| constrast_loss: 4.60468| div_loss: 0.77206| %_mask_idx: 0.37343| ppl: 145.87936| %_neg_is_pos: 0.01048| lr: 0.0| temp: 1.99166 | loss: 1.17112| constrast_loss: 4.60956| div_loss: 0.74915| %_mask_idx: 0.43061| ppl: 160.54205| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.99165 | loss: 1.17145| constrast_loss: 4.61277| div_loss: 0.73028| %_mask_idx: 0.37594| ppl: 172.61792| %_neg_is_pos: 0.00509| lr: 0.0| temp: 1.99165 | loss: 1.17115| constrast_loss: 4.60813| div_loss: 0.76486| %_mask_idx: 0.40523| ppl: 150.48856| %_neg_is_pos: 0.00732| lr: 0.0| temp: 1.99163 | loss: 1.1706| constrast_loss: 4.60797| div_loss: 0.74425| %_mask_idx: 0.39881| ppl: 163.67992| %_neg_is_pos: 0.00779| lr: 0.0| temp: 1.99163 | loss: 1.17122| constrast_loss: 4.60817| div_loss: 0.76712| %_mask_idx: 0.31814| ppl: 149.04111| %_neg_is_pos: 0.01253| lr: 0.0| temp: 1.99162 | loss: 1.17075| constrast_loss: 4.60637| div_loss: 0.76619| %_mask_idx: 0.34085| ppl: 149.63571| %_neg_is_pos: 0.01306| lr: 0.0| temp: 1.99162 | loss: 1.17047| constrast_loss: 4.6059| div_loss: 0.75965| %_mask_idx: 0.38659| ppl: 153.8259| %_neg_is_pos: 0.0092| lr: 0.0| temp: 1.99161 | loss: 1.17057| constrast_loss: 
4.6077| div_loss: 0.74575| %_mask_idx: 0.43719| ppl: 162.7204| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.99161 | loss: 1.17102| constrast_loss: 4.60909| div_loss: 0.74979| %_mask_idx: 0.41009| ppl: 160.13278| %_neg_is_pos: 0.00628| lr: 0.0| temp: 1.9916 | loss: 1.17127| constrast_loss: 4.61023| div_loss: 0.74851| %_mask_idx: 0.38753| ppl: 160.95596| %_neg_is_pos: 0.00626| lr: 0.0| temp: 1.9916 | loss: 1.17118| constrast_loss: 4.6092| div_loss: 0.75536| %_mask_idx: 0.38362| ppl: 156.5665| %_neg_is_pos: 0.00725| lr: 0.0| temp: 1.99158 | loss: 1.16994| constrast_loss: 4.60589| div_loss: 0.73885| %_mask_idx: 0.39583| ppl: 167.13437| %_neg_is_pos: 0.00521| lr: 0.0| temp: 1.99158 | loss: 1.17037| constrast_loss: 4.60681| div_loss: 0.74669| %_mask_idx: 0.38456| ppl: 162.12137| %_neg_is_pos: 0.00702| lr: 0.0| temp: 1.99157 | loss: 1.17112| constrast_loss: 4.60912| div_loss: 0.75363| %_mask_idx: 0.38487| ppl: 157.67583| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.99157 [2021-09-01 23:06:30,654] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0 [2021-09-01 23:06:30,654] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 4096.0, reducing to 2048.0 | loss: 1.17015| constrast_loss: 4.60412| div_loss: 0.76471| %_mask_idx: 0.3844| ppl: 150.58267| %_neg_is_pos: 0.01127| lr: 0.0| temp: 1.99155 | loss: 1.17048| constrast_loss: 4.60569| div_loss: 0.76238| %_mask_idx: 0.37704| ppl: 152.07889| %_neg_is_pos: 0.0069| lr: 0.0| temp: 1.99155 | loss: 1.17113| constrast_loss: 4.61007| div_loss: 0.74446| %_mask_idx: 0.36952| ppl: 163.54269| %_neg_is_pos: 0.00603| lr: 0.0| temp: 1.99154 | loss: 1.17108| constrast_loss: 4.60818| div_loss: 0.7614| %_mask_idx: 0.39756| ppl: 152.70212| %_neg_is_pos: 0.00939| lr: 0.0| temp: 1.99154 | loss: 1.17107| constrast_loss: 4.60731| div_loss: 0.76984| %_mask_idx: 0.44126| ppl: 147.3028| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.99153 | loss: 1.16985| constrast_loss: 4.60434| div_loss: 0.75061| %_mask_idx: 0.34821| ppl: 159.60901| %_neg_is_pos: 0.00836| lr: 0.0| temp: 1.99153 | loss: 1.17023| constrast_loss: 4.60558| div_loss: 0.7536| %_mask_idx: 0.39724| ppl: 157.69876| %_neg_is_pos: 0.00962| lr: 0.0| temp: 1.99152 | loss: 1.17078| constrast_loss: 4.6072| div_loss: 0.75927| %_mask_idx: 0.41447| ppl: 154.06465| %_neg_is_pos: 0.00829| lr: 0.0| temp: 1.99152 | loss: 1.1701| constrast_loss: 4.60501| div_loss: 0.75374| %_mask_idx: 0.39975| ppl: 157.6046| %_neg_is_pos: 0.00768| lr: 0.0| temp: 1.9915 | loss: 1.1712| constrast_loss: 4.60922| div_loss: 0.75591| %_mask_idx: 0.36607| ppl: 156.22008| %_neg_is_pos: 0.01072| lr: 0.0| temp: 1.9915 | loss: 1.17132| constrast_loss: 4.61045| div_loss: 0.74832| %_mask_idx: 0.33474| ppl: 161.07501| %_neg_is_pos: 0.00547| lr: 0.0| temp: 1.99149 | loss: 1.17048| constrast_loss: 4.60709| div_loss: 0.74825| %_mask_idx: 0.37046| ppl: 161.11908| %_neg_is_pos: 0.01| lr: 0.0| temp: 1.99149 | loss: 1.17027| constrast_loss: 4.60538| div_loss: 0.75691| %_mask_idx: 0.40461| ppl: 155.57742| %_neg_is_pos: 0.01176| lr: 0.0| temp: 1.99148 | loss: 1.17092| constrast_loss: 4.60797| div_loss: 0.75723| %_mask_idx: 0.35276| ppl: 
155.37131| %_neg_is_pos: 0.00837| lr: 0.0| temp: 1.99148 | loss: 1.16998| constrast_loss: 4.60642| div_loss: 0.73514| %_mask_idx: 0.41745| ppl: 169.50775| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.99147 | loss: 1.17031| constrast_loss: 4.60617| div_loss: 0.75065| %_mask_idx: 0.40179| ppl: 159.58139| %_neg_is_pos: 0.01105| lr: 0.0| temp: 1.99147 | loss: 1.16992| constrast_loss: 4.60712| div_loss: 0.7257| %_mask_idx: 0.41024| ppl: 175.54913| %_neg_is_pos: 0.00647| lr: 0.0| temp: 1.99145 | loss: 1.16963| constrast_loss: 4.60402| div_loss: 0.74494| %_mask_idx: 0.38518| ppl: 163.23929| %_neg_is_pos: 0.01254| lr: 0.0| temp: 1.99145 | loss: 1.16991| constrast_loss: 4.60401| div_loss: 0.75646| %_mask_idx: 0.3761| ppl: 155.86819| %_neg_is_pos: 0.01504| lr: 0.0| temp: 1.99144 | loss: 1.1694| constrast_loss: 4.60229| div_loss: 0.75306| %_mask_idx: 0.38174| ppl: 158.04102| %_neg_is_pos: 0.01068| lr: 0.0| temp: 1.99144 | loss: 1.17016| constrast_loss: 4.60495| div_loss: 0.75681| %_mask_idx: 0.44392| ppl: 155.64441| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.99143 | loss: 1.16972| constrast_loss: 4.60494| div_loss: 0.73941| %_mask_idx: 0.31579| ppl: 166.77747| %_neg_is_pos: 0.01181| lr: 0.0| temp: 1.99143 | loss: 1.16968| constrast_loss: 4.60605| div_loss: 0.72677| %_mask_idx: 0.41776| ppl: 174.86757| %_neg_is_pos: 0.00586| lr: 0.0| temp: 1.99142 | loss: 1.17017| constrast_loss: 4.6076| div_loss: 0.73095| %_mask_idx: 0.38111| ppl: 172.19147| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.99142 | loss: 1.16746| constrast_loss: 4.59588| div_loss: 0.7395| %_mask_idx: 0.37751| ppl: 166.7193| %_neg_is_pos: 0.01242| lr: 0.0| temp: 1.9914 | loss: 1.16787| constrast_loss: 4.59863| div_loss: 0.72839| %_mask_idx: 0.36544| ppl: 173.82755| %_neg_is_pos: 0.00828| lr: 0.0| temp: 1.9914 | loss: 1.16878| constrast_loss: 4.60092| div_loss: 0.74188| %_mask_idx: 0.35166| ppl: 165.19885| %_neg_is_pos: 0.01029| lr: 0.0| temp: 1.99139 | loss: 1.16955| constrast_loss: 4.60469| div_loss: 0.73532| %_mask_idx: 
0.39442| ppl: 169.39453| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.99139 | loss: 1.16892| constrast_loss: 4.60311| div_loss: 0.72557| %_mask_idx: 0.35307| ppl: 175.63666| %_neg_is_pos: 0.0106| lr: 0.0| temp: 1.99137 | loss: 1.16911| constrast_loss: 4.60198| div_loss: 0.7446| %_mask_idx: 0.38831| ppl: 163.45609| %_neg_is_pos: 0.00901| lr: 0.0| temp: 1.99137 | loss: 1.16848| constrast_loss: 4.59914| div_loss: 0.74791| %_mask_idx: 0.32644| ppl: 161.33894| %_neg_is_pos: 0.01086| lr: 0.0| temp: 1.99136 | loss: 1.17019| constrast_loss: 4.60693| div_loss: 0.73837| %_mask_idx: 0.40946| ppl: 167.44125| %_neg_is_pos: 0.00753| lr: 0.0| temp: 1.99136 | loss: 1.1699| constrast_loss: 4.60346| div_loss: 0.7613| %_mask_idx: 0.43123| ppl: 152.76617| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.99135 | loss: 1.16842| constrast_loss: 4.5985| div_loss: 0.75175| %_mask_idx: 0.40946| ppl: 158.88239| %_neg_is_pos: 0.01099| lr: 0.0| temp: 1.99135 | loss: 1.16909| constrast_loss: 4.60177| div_loss: 0.74601| %_mask_idx: 0.41244| ppl: 162.55627| %_neg_is_pos: 0.00697| lr: 0.0| temp: 1.99134 | loss: 1.16759| constrast_loss: 4.59399| div_loss: 0.76392| %_mask_idx: 0.33991| ppl: 151.09175| %_neg_is_pos: 0.01494| lr: 0.0| temp: 1.99134 | loss: 1.1693| constrast_loss: 4.60126| div_loss: 0.7592| %_mask_idx: 0.3739| ppl: 154.11075| %_neg_is_pos: 0.01403| lr: 0.0| temp: 1.99132 | loss: 1.1694| constrast_loss: 4.6035| div_loss: 0.74087| %_mask_idx: 0.34414| ppl: 165.84076| %_neg_is_pos: 0.0088| lr: 0.0| temp: 1.99132 | loss: 1.16881| constrast_loss: 4.59953| div_loss: 0.7572| %_mask_idx: 0.37296| ppl: 155.39023| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.99131 | loss: 1.1689| constrast_loss: 4.6031| div_loss: 0.72487| %_mask_idx: 0.401| ppl: 176.08337| %_neg_is_pos: 0.00726| lr: 0.0| temp: 1.99131 | loss: 1.16932| constrast_loss: 4.60294| div_loss: 0.74352| %_mask_idx: 0.36435| ppl: 164.14789| %_neg_is_pos: 0.00784| lr: 0.0| temp: 1.9913 | loss: 1.16882| constrast_loss: 4.60192| div_loss: 0.73377| %_mask_idx: 
0.36231| ppl: 170.38762| %_neg_is_pos: 0.01013| lr: 0.0| temp: 1.9913 | loss: 1.16927| constrast_loss: 4.60074| div_loss: 0.76317| %_mask_idx: 0.30279| ppl: 151.57268| %_neg_is_pos: 0.01328| lr: 0.0| temp: 1.99129 | loss: 1.1685| constrast_loss: 4.59759| div_loss: 0.76388| %_mask_idx: 0.40805| ppl: 151.11459| %_neg_is_pos: 0.00686| lr: 0.0| temp: 1.99129 | loss: 1.16948| constrast_loss: 4.60344| div_loss: 0.74484| %_mask_idx: 0.35652| ppl: 163.30374| %_neg_is_pos: 0.01151| lr: 0.0| temp: 1.99127 | loss: 1.16941| constrast_loss: 4.6039| div_loss: 0.73737| %_mask_idx: 0.40038| ppl: 168.08501| %_neg_is_pos: 0.01193| lr: 0.0| temp: 1.99127 | loss: 1.16785| constrast_loss: 4.59856| div_loss: 0.72848| %_mask_idx: 0.35777| ppl: 173.77383| %_neg_is_pos: 0.01036| lr: 0.0| temp: 1.99126 | loss: 1.16971| constrast_loss: 4.6054| div_loss: 0.7344| %_mask_idx: 0.40445| ppl: 169.98611| %_neg_is_pos: 0.00871| lr: 0.0| temp: 1.99126 | loss: 1.16903| constrast_loss: 4.60262| div_loss: 0.73492| %_mask_idx: 0.42998| ppl: 169.64816| %_neg_is_pos: 0.00786| lr: 0.0| temp: 1.99125 | loss: 1.16832| constrast_loss: 4.59922| div_loss: 0.74047| %_mask_idx: 0.38972| ppl: 166.09955| %_neg_is_pos: 0.01205| lr: 0.0| temp: 1.99125 | loss: 1.16948| constrast_loss: 4.60355| div_loss: 0.74379| %_mask_idx: 0.3562| ppl: 163.97571| %_neg_is_pos: 0.01638| lr: 0.0| temp: 1.99124 | loss: 1.1705| constrast_loss: 4.60958| div_loss: 0.72435| %_mask_idx: 0.4104| ppl: 176.41414| %_neg_is_pos: 0.00863| lr: 0.0| temp: 1.99124 | loss: 1.16943| constrast_loss: 4.60381| div_loss: 0.73895| %_mask_idx: 0.39019| ppl: 167.07306| %_neg_is_pos: 0.00762| lr: 0.0| temp: 1.99122 | loss: 1.16808| constrast_loss: 4.59797| div_loss: 0.74369| %_mask_idx: 0.3938| ppl: 164.03909| %_neg_is_pos: 0.00875| lr: 0.0| temp: 1.99122 | loss: 1.16895| constrast_loss: 4.60125| div_loss: 0.74553| %_mask_idx: 0.3573| ppl: 162.8588| %_neg_is_pos: 0.01089| lr: 0.0| temp: 1.99121 | loss: 1.16865| constrast_loss: 4.59937| div_loss: 0.75226| 
%_mask_idx: 0.42293| ppl: 158.55414| %_neg_is_pos: 0.01291| lr: 0.0| temp: 1.99121 | loss: 1.16865| constrast_loss: 4.59962| div_loss: 0.74972| %_mask_idx: 0.34962| ppl: 160.17767| %_neg_is_pos: 0.01147| lr: 0.0| temp: 1.99119 | loss: 1.16916| constrast_loss: 4.59955| div_loss: 0.77092| %_mask_idx: 0.36607| ppl: 146.6107| %_neg_is_pos: 0.01485| lr: 0.0| temp: 1.99119 | loss: 1.16967| constrast_loss: 4.60547| div_loss: 0.73228| %_mask_idx: 0.34837| ppl: 171.33932| %_neg_is_pos: 0.00794| lr: 0.0| temp: 1.99118 | loss: 1.16951| constrast_loss: 4.60307| div_loss: 0.74986| %_mask_idx: 0.41103| ppl: 160.09151| %_neg_is_pos: 0.0122| lr: 0.0| temp: 1.99118 | loss: 1.17021| constrast_loss: 4.60667| div_loss: 0.74167| %_mask_idx: 0.39756| ppl: 165.32996| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.99117 | loss: 1.169| constrast_loss: 4.601| div_loss: 0.75018| %_mask_idx: 0.38283| ppl: 159.88248| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.99117 | loss: 1.16955| constrast_loss: 4.60582| div_loss: 0.724| %_mask_idx: 0.37578| ppl: 176.64249| %_neg_is_pos: 0.00534| lr: 0.0| temp: 1.99116 | loss: 1.16833| constrast_loss: 4.5977| div_loss: 0.75624| %_mask_idx: 0.38017| ppl: 156.00656| %_neg_is_pos: 0.00816| lr: 0.0| temp: 1.99116 | loss: 1.16814| constrast_loss: 4.5977| div_loss: 0.74875| %_mask_idx: 0.40304| ppl: 160.80101| %_neg_is_pos: 0.00654| lr: 0.0| temp: 1.99114 | loss: 1.16952| constrast_loss: 4.60346| div_loss: 0.74632| %_mask_idx: 0.37343| ppl: 162.35341| %_neg_is_pos: 0.0088| lr: 0.0| temp: 1.99114 | loss: 1.1697| constrast_loss: 4.60394| div_loss: 0.74839| %_mask_idx: 0.37954| ppl: 161.02768| %_neg_is_pos: 0.0098| lr: 0.0| temp: 1.99113 | loss: 1.16891| constrast_loss: 4.60179| div_loss: 0.73836| %_mask_idx: 0.40821| ppl: 167.44858| %_neg_is_pos: 0.01067| lr: 0.0| temp: 1.99113 | loss: 1.16763| constrast_loss: 4.59546| div_loss: 0.75047| %_mask_idx: 0.37907| ppl: 159.70013| %_neg_is_pos: 0.01441| lr: 0.0| temp: 1.99112 | loss: 1.16971| constrast_loss: 4.60701| div_loss: 
0.71848| %_mask_idx: 0.3609| ppl: 180.17096| %_neg_is_pos: 0.00591| lr: 0.0| temp: 1.99112 | loss: 1.16984| constrast_loss: 4.60621| div_loss: 0.73156| %_mask_idx: 0.39489| ppl: 171.80054| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.99111 | loss: 1.16901| constrast_loss: 4.60247| div_loss: 0.7355| %_mask_idx: 0.33224| ppl: 169.27786| %_neg_is_pos: 0.00931| lr: 0.0| temp: 1.99111 | loss: 1.16896| constrast_loss: 4.60146| div_loss: 0.74375| %_mask_idx: 0.39239| ppl: 164.00175| %_neg_is_pos: 0.00652| lr: 0.0| temp: 1.99109 | loss: 1.17038| constrast_loss: 4.6087| div_loss: 0.72834| %_mask_idx: 0.36451| ppl: 173.86011| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.99109 | loss: 1.16893| constrast_loss: 4.60187| div_loss: 0.73835| %_mask_idx: 0.37108| ppl: 167.45581| %_neg_is_pos: 0.01151| lr: 0.0| temp: 1.99108 | loss: 1.16868| constrast_loss: 4.60118| div_loss: 0.73533| %_mask_idx: 0.37061| ppl: 169.388| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.99108 | loss: 1.16789| constrast_loss: 4.59581| div_loss: 0.75757| %_mask_idx: 0.38127| ppl: 155.15341| %_neg_is_pos: 0.01239| lr: 0.0| temp: 1.99107 | loss: 1.16952| constrast_loss: 4.6026| div_loss: 0.75471| %_mask_idx: 0.37594| ppl: 156.98473| %_neg_is_pos: 0.01379| lr: 0.0| temp: 1.99107 | loss: 1.16952| constrast_loss: 4.60307| div_loss: 0.74999| %_mask_idx: 0.40398| ppl: 160.0038| %_neg_is_pos: 0.01017| lr: 0.0| temp: 1.99106 | loss: 1.16925| constrast_loss: 4.60313| div_loss: 0.73889| %_mask_idx: 0.37986| ppl: 167.11243| %_neg_is_pos: 0.00938| lr: 0.0| temp: 1.99106 | loss: 1.16986| constrast_loss: 4.60691| div_loss: 0.72524| %_mask_idx: 0.39568| ppl: 175.84555| %_neg_is_pos: 0.00797| lr: 0.0| temp: 1.99104 | loss: 1.17028| constrast_loss: 4.60922| div_loss: 0.71913| %_mask_idx: 0.39254| ppl: 179.75931| %_neg_is_pos: 0.00717| lr: 0.0| temp: 1.99104 | loss: 1.16957| constrast_loss: 4.60475| div_loss: 0.73538| %_mask_idx: 0.40461| ppl: 169.35992| %_neg_is_pos: 0.00654| lr: 0.0| temp: 1.99103 | loss: 1.16832| constrast_loss: 4.59672| 
div_loss: 0.76564| %_mask_idx: 0.40257| ppl: 149.99118| %_neg_is_pos: 0.01123| lr: 0.0| temp: 1.99103 | loss: 1.16924| constrast_loss: 4.60221| div_loss: 0.74762| %_mask_idx: 0.40241| ppl: 161.52411| %_neg_is_pos: 0.00764| lr: 0.0| temp: 1.99101 | loss: 1.1693| constrast_loss: 4.6043| div_loss: 0.72876| %_mask_idx: 0.40476| ppl: 173.59085| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.99101 | loss: 1.16923| constrast_loss: 4.60231| div_loss: 0.74621| %_mask_idx: 0.3656| ppl: 162.42311| %_neg_is_pos: 0.00716| lr: 0.0| temp: 1.991 | loss: 1.1677| constrast_loss: 4.59534| div_loss: 0.75462| %_mask_idx: 0.33239| ppl: 157.04337| %_neg_is_pos: 0.01724| lr: 0.0| temp: 1.991 | loss: 1.168| constrast_loss: 4.59616| div_loss: 0.75826| %_mask_idx: 0.43296| ppl: 154.71323| %_neg_is_pos: 0.0081| lr: 0.0| temp: 1.99099 | loss: 1.16937| constrast_loss: 4.60446| div_loss: 0.73022| %_mask_idx: 0.38503| ppl: 172.66132| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.99099 | loss: 1.17005| constrast_loss: 4.60652| div_loss: 0.73688| %_mask_idx: 0.36795| ppl: 168.39421| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.99098 | loss: 1.16932| constrast_loss: 4.60353| div_loss: 0.73747| %_mask_idx: 0.39192| ppl: 168.0224| %_neg_is_pos: 0.00782| lr: 0.0| temp: 1.99098 | loss: 1.16932| constrast_loss: 4.60314| div_loss: 0.74122| %_mask_idx: 0.43061| ppl: 165.62219| %_neg_is_pos: 0.00883| lr: 0.0| temp: 1.99096 | loss: 1.16911| constrast_loss: 4.60215| div_loss: 0.74271| %_mask_idx: 0.37657| ppl: 164.66312| %_neg_is_pos: 0.0078| lr: 0.0| temp: 1.99096 | loss: 1.16894| constrast_loss: 4.60333| div_loss: 0.72443| %_mask_idx: 0.42669| ppl: 176.36356| %_neg_is_pos: 0.00778| lr: 0.0| temp: 1.99095 | loss: 1.16997| constrast_loss: 4.60669| div_loss: 0.73176| %_mask_idx: 0.41729| ppl: 171.67438| %_neg_is_pos: 0.00602| lr: 0.0| temp: 1.99095 | loss: 1.16971| constrast_loss: 4.60693| div_loss: 0.71916| %_mask_idx: 0.3963| ppl: 179.73761| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.99094 | loss: 1.16853| constrast_loss: 
4.60053| div_loss: 0.73582| %_mask_idx: 0.37234| ppl: 169.07301| %_neg_is_pos: 0.01367| lr: 0.0| temp: 1.99094 | loss: 1.16836| constrast_loss: 4.59835| div_loss: 0.75072| %_mask_idx: 0.39474| ppl: 159.54123| %_neg_is_pos: 0.01134| lr: 0.0| temp: 1.99093 | loss: 1.16903| constrast_loss: 4.60162| div_loss: 0.74487| %_mask_idx: 0.40695| ppl: 163.28183| %_neg_is_pos: 0.00827| lr: 0.0| temp: 1.99093 | loss: 1.16854| constrast_loss: 4.59873| div_loss: 0.75446| %_mask_idx: 0.38957| ppl: 157.14429| %_neg_is_pos: 0.0081| lr: 0.0| temp: 1.99091 | loss: 1.1694| constrast_loss: 4.60444| div_loss: 0.73147| %_mask_idx: 0.39865| ppl: 171.85892| %_neg_is_pos: 0.00799| lr: 0.0| temp: 1.99091 | loss: 1.16997| constrast_loss: 4.605| div_loss: 0.74863| %_mask_idx: 0.38236| ppl: 160.87366| %_neg_is_pos: 0.00906| lr: 0.0| temp: 1.9909 | loss: 1.16855| constrast_loss: 4.60019| div_loss: 0.73998| %_mask_idx: 0.37406| ppl: 166.4108| %_neg_is_pos: 0.01444| lr: 0.0| temp: 1.9909 | loss: 1.169| constrast_loss: 4.59979| div_loss: 0.7619| %_mask_idx: 0.33866| ppl: 152.38155| %_neg_is_pos: 0.01668| lr: 0.0| temp: 1.99089 | loss: 1.16995| constrast_loss: 4.6066| div_loss: 0.73186| %_mask_idx: 0.401| ppl: 171.60864| %_neg_is_pos: 0.00673| lr: 0.0| temp: 1.99089 | loss: 1.16896| constrast_loss: 4.60326| div_loss: 0.72565| %_mask_idx: 0.38205| ppl: 175.58411| %_neg_is_pos: 0.00767| lr: 0.0| temp: 1.99088 | loss: 1.16751| constrast_loss: 4.59494| div_loss: 0.75114| %_mask_idx: 0.36999| ppl: 159.27345| %_neg_is_pos: 0.01663| lr: 0.0| temp: 1.99088 | loss: 1.16886| constrast_loss: 4.59961| div_loss: 0.75841| %_mask_idx: 0.37923| ppl: 154.61612| %_neg_is_pos: 0.0103| lr: 0.0| temp: 1.99086 | loss: 1.17002| constrast_loss: 4.60684| div_loss: 0.73252| %_mask_idx: 0.41259| ppl: 171.18616| %_neg_is_pos: 0.00515| lr: 0.0| temp: 1.99086 | loss: 1.17063| constrast_loss: 4.60871| div_loss: 0.73831| %_mask_idx: 0.41964| ppl: 167.48193| %_neg_is_pos: 0.0062| lr: 0.0| temp: 1.99085 | loss: 1.16865| 
constrast_loss: 4.60176| div_loss: 0.72841| %_mask_idx: 0.35949| ppl: 173.81802| %_neg_is_pos: 0.00866| lr: 0.0| temp: 1.99085 [2021-09-01 23:15:43,682] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0 [2021-09-01 23:15:43,682] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0 | loss: 1.16922| constrast_loss: 4.60382| div_loss: 0.73055| %_mask_idx: 0.37986| ppl: 172.45081| %_neg_is_pos: 0.01019| lr: 0.0| temp: 1.99083 | loss: 1.16965| constrast_loss: 4.60646| div_loss: 0.72146| %_mask_idx: 0.41635| ppl: 178.26736| %_neg_is_pos: 0.00847| lr: 0.0| temp: 1.99083 | loss: 1.16985| constrast_loss: 4.60558| div_loss: 0.73827| %_mask_idx: 0.41823| ppl: 167.50926| %_neg_is_pos: 0.00708| lr: 0.0| temp: 1.99082 | loss: 1.16926| constrast_loss: 4.60215| div_loss: 0.74903| %_mask_idx: 0.43499| ppl: 160.62335| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.99082 | loss: 1.16991| constrast_loss: 4.60465| div_loss: 0.75002| %_mask_idx: 0.35887| ppl: 159.9863| %_neg_is_pos: 0.00892| lr: 0.0| temp: 1.99081 | loss: 1.16721| constrast_loss: 4.59134| div_loss: 0.77485| %_mask_idx: 0.36451| ppl: 144.09909| %_neg_is_pos: 0.0165| lr: 0.0| temp: 1.99081 | loss: 1.16939| constrast_loss: 4.603| div_loss: 0.74566| %_mask_idx: 0.43045| ppl: 162.77847| %_neg_is_pos: 0.009| lr: 0.0| temp: 1.9908 | loss: 1.17052| constrast_loss: 4.60968| div_loss: 0.72388| %_mask_idx: 0.42685| ppl: 176.71762| %_neg_is_pos: 0.00455| lr: 0.0| temp: 1.9908 | loss: 1.16826| constrast_loss: 4.5995| div_loss: 0.73526| %_mask_idx: 0.37672| ppl: 169.43396| %_neg_is_pos: 0.00939| lr: 0.0| temp: 1.99078 | loss: 1.16823| constrast_loss: 4.59699| div_loss: 0.75922| %_mask_idx: 0.42732| ppl: 154.09668| %_neg_is_pos: 0.00944| lr: 0.0| temp: 1.99078 | loss: 1.16924| constrast_loss: 4.603| div_loss: 0.73963| %_mask_idx: 0.38878| ppl: 166.63858| 
%_neg_is_pos: 0.00886| lr: 0.0| temp: 1.99077 | loss: 1.16832| constrast_loss: 4.59985| div_loss: 0.73438| %_mask_idx: 0.38503| ppl: 169.99789| %_neg_is_pos: 0.01372| lr: 0.0| temp: 1.99077 | loss: 1.16873| constrast_loss: 4.60297| div_loss: 0.71937| %_mask_idx: 0.39301| ppl: 179.60194| %_neg_is_pos: 0.00613| lr: 0.0| temp: 1.99076 | loss: 1.1698| constrast_loss: 4.60486| div_loss: 0.7435| %_mask_idx: 0.34774| ppl: 164.1608| %_neg_is_pos: 0.01055| lr: 0.0| temp: 1.99076 | loss: 1.16973| constrast_loss: 4.60571| div_loss: 0.73202| %_mask_idx: 0.41651| ppl: 171.51009| %_neg_is_pos: 0.00573| lr: 0.0| temp: 1.99075 | loss: 1.16717| constrast_loss: 4.59543| div_loss: 0.73232| %_mask_idx: 0.40351| ppl: 171.31558| %_neg_is_pos: 0.01263| lr: 0.0| temp: 1.99075 | loss: 1.16852| constrast_loss: 4.59669| div_loss: 0.77404| %_mask_idx: 0.39536| ppl: 144.61615| %_neg_is_pos: 0.01613| lr: 0.0| temp: 1.99073 | loss: 1.16871| constrast_loss: 4.59988| div_loss: 0.74942| %_mask_idx: 0.42293| ppl: 160.37274| %_neg_is_pos: 0.00732| lr: 0.0| temp: 1.99073 | loss: 1.16691| constrast_loss: 4.59222| div_loss: 0.75441| %_mask_idx: 0.35652| ppl: 157.17615| %_neg_is_pos: 0.01539| lr: 0.0| temp: 1.99072 | loss: 1.16719| constrast_loss: 4.59247| div_loss: 0.76287| %_mask_idx: 0.36122| ppl: 151.76628| %_neg_is_pos: 0.01543| lr: 0.0| temp: 1.99072 | loss: 1.16652| constrast_loss: 4.59402| div_loss: 0.72052| %_mask_idx: 0.40789| ppl: 178.86914| %_neg_is_pos: 0.00794| lr: 0.0| temp: 1.99071 | loss: 1.16653| constrast_loss: 4.59179| div_loss: 0.74321| %_mask_idx: 0.38471| ppl: 164.34604| %_neg_is_pos: 0.00956| lr: 0.0| temp: 1.99071 | loss: 1.16714| constrast_loss: 4.59744| div_loss: 0.71132| %_mask_idx: 0.3974| ppl: 184.75323| %_neg_is_pos: 0.01106| lr: 0.0| temp: 1.9907 | loss: 1.16643| constrast_loss: 4.59377| div_loss: 0.71952| %_mask_idx: 0.3869| ppl: 179.50714| %_neg_is_pos: 0.01072| lr: 0.0| temp: 1.9907 | loss: 1.1673| constrast_loss: 4.59539| div_loss: 0.73802| %_mask_idx: 0.3515| ppl: 
167.66943| %_neg_is_pos: 0.00945| lr: 0.0| temp: 1.99068 | loss: 1.16753| constrast_loss: 4.59549| div_loss: 0.74628| %_mask_idx: 0.42152| ppl: 162.37897| %_neg_is_pos: 0.00906| lr: 0.0| temp: 1.99068 | loss: 1.16523| constrast_loss: 4.58552| div_loss: 0.75418| %_mask_idx: 0.38362| ppl: 157.32535| %_neg_is_pos: 0.01833| lr: 0.0| temp: 1.99067 | loss: 1.16761| constrast_loss: 4.59642| div_loss: 0.73999| %_mask_idx: 0.41776| ppl: 166.40488| %_neg_is_pos: 0.01168| lr: 0.0| temp: 1.99067 | loss: 1.1657| constrast_loss: 4.59055| div_loss: 0.72262| %_mask_idx: 0.44095| ppl: 177.521| %_neg_is_pos: 0.01097| lr: 0.0| temp: 1.99065 | loss: 1.16575| constrast_loss: 4.58931| div_loss: 0.73678| %_mask_idx: 0.35714| ppl: 168.46086| %_neg_is_pos: 0.01404| lr: 0.0| temp: 1.99065 | loss: 1.1644| constrast_loss: 4.58439| div_loss: 0.73198| %_mask_idx: 0.3313| ppl: 171.53584| %_neg_is_pos: 0.01936| lr: 0.0| temp: 1.99064 | loss: 1.16777| constrast_loss: 4.5979| div_loss: 0.7319| %_mask_idx: 0.38769| ppl: 171.58304| %_neg_is_pos: 0.01509| lr: 0.0| temp: 1.99064 | loss: 1.16633| constrast_loss: 4.59183| div_loss: 0.73475| %_mask_idx: 0.40492| ppl: 169.75772| %_neg_is_pos: 0.00919| lr: 0.0| temp: 1.99063 | loss: 1.16477| constrast_loss: 4.5856| div_loss: 0.73467| %_mask_idx: 0.31751| ppl: 169.80914| %_neg_is_pos: 0.02449| lr: 0.0| temp: 1.99063 | loss: 1.16705| constrast_loss: 4.59689| div_loss: 0.71325| %_mask_idx: 0.40194| ppl: 183.52313| %_neg_is_pos: 0.01103| lr: 0.0| temp: 1.99062 | loss: 1.16587| constrast_loss: 4.59093| div_loss: 0.72551| %_mask_idx: 0.35526| ppl: 175.67395| %_neg_is_pos: 0.01212| lr: 0.0| temp: 1.99062 | loss: 1.16747| constrast_loss: 4.59857| div_loss: 0.71313| %_mask_idx: 0.43437| ppl: 183.59399| %_neg_is_pos: 0.01105| lr: 0.0| temp: 1.9906 | loss: 1.16686| constrast_loss: 4.59512| div_loss: 0.72308| %_mask_idx: 0.37876| ppl: 177.22852| %_neg_is_pos: 0.00655| lr: 0.0| temp: 1.9906 | loss: 1.16606| constrast_loss: 4.59137| div_loss: 0.72854| %_mask_idx: 
0.35432| ppl: 173.73416| %_neg_is_pos: 0.01847| lr: 0.0| temp: 1.99059 | loss: 1.16599| constrast_loss: 4.59142| div_loss: 0.72553| %_mask_idx: 0.41071| ppl: 175.65918| %_neg_is_pos: 0.01088| lr: 0.0| temp: 1.99059 | loss: 1.16603| constrast_loss: 4.59245| div_loss: 0.7167| %_mask_idx: 0.37077| ppl: 181.31311| %_neg_is_pos: 0.0113| lr: 0.0| temp: 1.99058 | loss: 1.16642| constrast_loss: 4.5921| div_loss: 0.73565| %_mask_idx: 0.33177| ppl: 169.18213| %_neg_is_pos: 0.0235| lr: 0.0| temp: 1.99058 | loss: 1.16277| constrast_loss: 4.57641| div_loss: 0.74667| %_mask_idx: 0.37422| ppl: 162.13397| %_neg_is_pos: 0.00859| lr: 0.0| temp: 1.99057 | loss: 1.16818| constrast_loss: 4.59952| div_loss: 0.73203| %_mask_idx: 0.35824| ppl: 171.49828| %_neg_is_pos: 0.01041| lr: 0.0| temp: 1.99057 | loss: 1.16789| constrast_loss: 4.5996| div_loss: 0.71966| %_mask_idx: 0.39693| ppl: 179.41542| %_neg_is_pos: 0.00824| lr: 0.0| temp: 1.99055 | loss: 1.16663| constrast_loss: 4.59436| div_loss: 0.7215| %_mask_idx: 0.37766| ppl: 178.24292| %_neg_is_pos: 0.0083| lr: 0.0| temp: 1.99055 | loss: 1.16574| constrast_loss: 4.58829| div_loss: 0.74682| %_mask_idx: 0.40398| ppl: 162.03482| %_neg_is_pos: 0.01254| lr: 0.0| temp: 1.99054 | loss: 1.16708| constrast_loss: 4.59412| div_loss: 0.74217| %_mask_idx: 0.4469| ppl: 165.00873| %_neg_is_pos: 0.01139| lr: 0.0| temp: 1.99054 | loss: 1.16476| constrast_loss: 4.5864| div_loss: 0.72655| %_mask_idx: 0.36404| ppl: 175.00671| %_neg_is_pos: 0.01736| lr: 0.0| temp: 1.99053 | loss: 1.16696| constrast_loss: 4.59559| div_loss: 0.72257| %_mask_idx: 0.45739| ppl: 177.5553| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.99053 | loss: 1.16559| constrast_loss: 4.58689| div_loss: 0.75456| %_mask_idx: 0.37077| ppl: 157.0834| %_neg_is_pos: 0.01883| lr: 0.0| temp: 1.99052 | loss: 1.16629| constrast_loss: 4.59| div_loss: 0.75148| %_mask_idx: 0.37234| ppl: 159.05246| %_neg_is_pos: 0.01249| lr: 0.0| temp: 1.99052 | loss: 1.16722| constrast_loss: 4.596| div_loss: 0.72863| %_mask_idx: 
0.39944| ppl: 173.67773| %_neg_is_pos: 0.00789| lr: 0.0| temp: 1.9905 | loss: 1.16719| constrast_loss: 4.59441| div_loss: 0.74343| %_mask_idx: 0.43374| ppl: 164.20766| %_neg_is_pos: 0.0075| lr: 0.0| temp: 1.9905 | loss: 1.16705| constrast_loss: 4.59383| div_loss: 0.74387| %_mask_idx: 0.37735| ppl: 163.92432| %_neg_is_pos: 0.02053| lr: 0.0| temp: 1.99049 | loss: 1.16702| constrast_loss: 4.59441| div_loss: 0.73693| %_mask_idx: 0.4068| ppl: 168.3627| %_neg_is_pos: 0.01589| lr: 0.0| temp: 1.99049 | loss: 1.16392| constrast_loss: 4.58153| div_loss: 0.74137| %_mask_idx: 0.39677| ppl: 165.52017| %_neg_is_pos: 0.01176| lr: 0.0| temp: 1.99047 | loss: 1.16822| constrast_loss: 4.59722| div_loss: 0.75656| %_mask_idx: 0.44565| ppl: 155.80469| %_neg_is_pos: 0.01112| lr: 0.0| temp: 1.99047 | loss: 1.16629| constrast_loss: 4.59077| div_loss: 0.74374| %_mask_idx: 0.36263| ppl: 164.00815| %_neg_is_pos: 0.01633| lr: 0.0| temp: 1.99046 | loss: 1.16519| constrast_loss: 4.58849| div_loss: 0.72255| %_mask_idx: 0.39834| ppl: 177.56653| %_neg_is_pos: 0.01521| lr: 0.0| temp: 1.99046 | loss: 1.16527| constrast_loss: 4.58534| div_loss: 0.75727| %_mask_idx: 0.3739| ppl: 155.35004| %_neg_is_pos: 0.02173| lr: 0.0| temp: 1.99045 | loss: 1.16726| constrast_loss: 4.59631| div_loss: 0.7272| %_mask_idx: 0.4317| ppl: 174.58978| %_neg_is_pos: 0.00594| lr: 0.0| temp: 1.99045 | loss: 1.1668| constrast_loss: 4.59406| div_loss: 0.73128| %_mask_idx: 0.40351| ppl: 171.98331| %_neg_is_pos: 0.00592| lr: 0.0| temp: 1.99044 | loss: 1.16604| constrast_loss: 4.59153| div_loss: 0.7262| %_mask_idx: 0.37892| ppl: 175.23499| %_neg_is_pos: 0.01507| lr: 0.0| temp: 1.99044 | loss: 1.16452| constrast_loss: 4.5854| div_loss: 0.72691| %_mask_idx: 0.40946| ppl: 174.77838| %_neg_is_pos: 0.01235| lr: 0.0| temp: 1.99042 | loss: 1.16782| constrast_loss: 4.59914| div_loss: 0.72143| %_mask_idx: 0.37265| ppl: 178.28513| %_neg_is_pos: 0.0079| lr: 0.0| temp: 1.99042 | loss: 1.16706| constrast_loss: 4.59456| div_loss: 0.73693| 
%_mask_idx: 0.36043| ppl: 168.36588| %_neg_is_pos: 0.01567| lr: 0.0| temp: 1.99041 | loss: 1.16691| constrast_loss: 4.59343| div_loss: 0.74212| %_mask_idx: 0.39035| ppl: 165.04028| %_neg_is_pos: 0.01848| lr: 0.0| temp: 1.99041 | loss: 1.16707| constrast_loss: 4.59665| div_loss: 0.71631| %_mask_idx: 0.41698| ppl: 181.56113| %_neg_is_pos: 0.01095| lr: 0.0| temp: 1.9904 | loss: 1.16878| constrast_loss: 4.59957| div_loss: 0.7555| %_mask_idx: 0.37954| ppl: 156.48106| %_neg_is_pos: 0.01078| lr: 0.0| temp: 1.9904 | loss: 1.1664| constrast_loss: 4.5932| div_loss: 0.72386| %_mask_idx: 0.41902| ppl: 176.7326| %_neg_is_pos: 0.01024| lr: 0.0| temp: 1.99039 | loss: 1.16645| constrast_loss: 4.59164| div_loss: 0.7415| %_mask_idx: 0.3703| ppl: 165.44016| %_neg_is_pos: 0.0174| lr: 0.0| temp: 1.99039 | loss: 1.16885| constrast_loss: 4.60277| div_loss: 0.7262| %_mask_idx: 0.40273| ppl: 175.22983| %_neg_is_pos: 0.01512| lr: 0.0| temp: 1.99037 | loss: 1.16583| constrast_loss: 4.58794| div_loss: 0.75362| %_mask_idx: 0.41212| ppl: 157.68298| %_neg_is_pos: 0.01154| lr: 0.0| temp: 1.99037 | loss: 1.16557| constrast_loss: 4.58727| div_loss: 0.7502| %_mask_idx: 0.40414| ppl: 159.86922| %_neg_is_pos: 0.01564| lr: 0.0| temp: 1.99036 | loss: 1.16574| constrast_loss: 4.59066| div_loss: 0.72284| %_mask_idx: 0.38628| ppl: 177.38188| %_neg_is_pos: 0.00721| lr: 0.0| temp: 1.99036 | loss: 1.16745| constrast_loss: 4.59511| div_loss: 0.74706| %_mask_idx: 0.42716| ppl: 161.88335| %_neg_is_pos: 0.00892| lr: 0.0| temp: 1.99035 | loss: 1.16771| constrast_loss: 4.59727| div_loss: 0.73567| %_mask_idx: 0.3833| ppl: 169.17059| %_neg_is_pos: 0.00815| lr: 0.0| temp: 1.99035 | loss: 1.1658| constrast_loss: 4.58919| div_loss: 0.73996| %_mask_idx: 0.36889| ppl: 166.42795| %_neg_is_pos: 0.01398| lr: 0.0| temp: 1.99034 | loss: 1.16608| constrast_loss: 4.5907| div_loss: 0.73618| %_mask_idx: 0.40664| ppl: 168.8456| %_neg_is_pos: 0.02012| lr: 0.0| temp: 1.99034 | loss: 1.16433| constrast_loss: 4.58461| div_loss: 
0.72695| %_mask_idx: 0.33803| ppl: 174.74988| %_neg_is_pos: 0.01103| lr: 0.0| temp: 1.99032 | loss: 1.16689| constrast_loss: 4.59466| div_loss: 0.72898| %_mask_idx: 0.34164| ppl: 173.45256| %_neg_is_pos: 0.01653| lr: 0.0| temp: 1.99032 | loss: 1.16539| constrast_loss: 4.58809| div_loss: 0.73457| %_mask_idx: 0.40226| ppl: 169.87704| %_neg_is_pos: 0.01976| lr: 0.0| temp: 1.99031 | loss: 1.16581| constrast_loss: 4.58994| div_loss: 0.73304| %_mask_idx: 0.35103| ppl: 170.85748| %_neg_is_pos: 0.01312| lr: 0.0| temp: 1.99031 | loss: 1.16759| constrast_loss: 4.59822| div_loss: 0.7213| %_mask_idx: 0.38174| ppl: 178.37051| %_neg_is_pos: 0.00813| lr: 0.0| temp: 1.99029 | loss: 1.16612| constrast_loss: 4.59083| div_loss: 0.73649| %_mask_idx: 0.41855| ppl: 168.64923| %_neg_is_pos: 0.0092| lr: 0.0| temp: 1.99029 | loss: 1.16742| constrast_loss: 4.59668| div_loss: 0.73003| %_mask_idx: 0.38847| ppl: 172.78253| %_neg_is_pos: 0.01363| lr: 0.0| temp: 1.99028 | loss: 1.16319| constrast_loss: 4.57564| div_loss: 0.77103| %_mask_idx: 0.36623| ppl: 146.53888| %_neg_is_pos: 0.01774| lr: 0.0| temp: 1.99028 | loss: 1.16529| constrast_loss: 4.58735| div_loss: 0.73793| %_mask_idx: 0.37202| ppl: 167.72275| %_neg_is_pos: 0.01357| lr: 0.0| temp: 1.99027 | loss: 1.16593| constrast_loss: 4.59061| div_loss: 0.73127| %_mask_idx: 0.46648| ppl: 171.98874| %_neg_is_pos: 0.00839| lr: 0.0| temp: 1.99027 | loss: 1.16591| constrast_loss: 4.59104| div_loss: 0.72584| %_mask_idx: 0.40977| ppl: 175.46457| %_neg_is_pos: 0.01006| lr: 0.0| temp: 1.99026 | loss: 1.16851| constrast_loss: 4.60149| div_loss: 0.72548| %_mask_idx: 0.38581| ppl: 175.6917| %_neg_is_pos: 0.01037| lr: 0.0| temp: 1.99026 | loss: 1.16637| constrast_loss: 4.59209| div_loss: 0.73391| %_mask_idx: 0.41259| ppl: 170.29556| %_neg_is_pos: 0.01293| lr: 0.0| temp: 1.99024 | loss: 1.16699| constrast_loss: 4.59681| div_loss: 0.7115| %_mask_idx: 0.37312| ppl: 184.64221| %_neg_is_pos: 0.0085| lr: 0.0| temp: 1.99024 | loss: 1.16509| constrast_loss: 
4.58601| div_loss: 0.7436| %_mask_idx: 0.38487| ppl: 164.09871| %_neg_is_pos: 0.01909| lr: 0.0| temp: 1.99023 | loss: 1.16649| constrast_loss: 4.59156| div_loss: 0.74411| %_mask_idx: 0.35855| ppl: 163.76846| %_neg_is_pos: 0.01146| lr: 0.0| temp: 1.99023 | loss: 1.16765| constrast_loss: 4.59666| div_loss: 0.73949| %_mask_idx: 0.40241| ppl: 166.7287| %_neg_is_pos: 0.01134| lr: 0.0| temp: 1.99022 | loss: 1.16496| constrast_loss: 4.58642| div_loss: 0.73418| %_mask_idx: 0.36451| ppl: 170.12213| %_neg_is_pos: 0.02013| lr: 0.0| temp: 1.99022 | loss: 1.16539| constrast_loss: 4.58701| div_loss: 0.74544| %_mask_idx: 0.39066| ppl: 162.92128| %_neg_is_pos: 0.0124| lr: 0.0| temp: 1.99021 | loss: 1.16618| constrast_loss: 4.59065| div_loss: 0.74084| %_mask_idx: 0.37657| ppl: 165.86398| %_neg_is_pos: 0.01464| lr: 0.0| temp: 1.99021 | loss: 1.16687| constrast_loss: 4.59246| div_loss: 0.75015| %_mask_idx: 0.40179| ppl: 159.90141| %_neg_is_pos: 0.01334| lr: 0.0| temp: 1.99019 | loss: 1.16527| constrast_loss: 4.58745| div_loss: 0.73632| %_mask_idx: 0.39254| ppl: 168.75372| %_neg_is_pos: 0.01039| lr: 0.0| temp: 1.99019 | loss: 1.16584| constrast_loss: 4.58947| div_loss: 0.73901| %_mask_idx: 0.37672| ppl: 167.03679| %_neg_is_pos: 0.00987| lr: 0.0| temp: 1.99018 | loss: 1.16771| constrast_loss: 4.5989| div_loss: 0.7192| %_mask_idx: 0.37234| ppl: 179.70926| %_neg_is_pos: 0.01167| lr: 0.0| temp: 1.99018 | loss: 1.1679| constrast_loss: 4.60044| div_loss: 0.71149| %_mask_idx: 0.40069| ppl: 184.64697| %_neg_is_pos: 0.00771| lr: 0.0| temp: 1.99017 | loss: 1.16715| constrast_loss: 4.59757| div_loss: 0.71023| %_mask_idx: 0.44063| ppl: 185.45308| %_neg_is_pos: 0.00931| lr: 0.0| temp: 1.99017 | loss: 1.1674| constrast_loss: 4.59845| div_loss: 0.71135| %_mask_idx: 0.39834| ppl: 184.73874| %_neg_is_pos: 0.00772| lr: 0.0| temp: 1.99016 | loss: 1.16681| constrast_loss: 4.59377| div_loss: 0.73488| %_mask_idx: 0.4151| ppl: 169.67902| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.99016 | loss: 1.1666| 
constrast_loss: 4.59252| div_loss: 0.73872| %_mask_idx: 0.41682| ppl: 167.2182| %_neg_is_pos: 0.00982| lr: 0.0| temp: 1.99014 | loss: 1.16385| constrast_loss: 4.58119| div_loss: 0.74209| %_mask_idx: 0.37954| ppl: 165.06497| %_neg_is_pos: 0.01654| lr: 0.0| temp: 1.99014 | loss: 1.16447| constrast_loss: 4.58389| div_loss: 0.73981| %_mask_idx: 0.37892| ppl: 166.52121| %_neg_is_pos: 0.01197| lr: 0.0| temp: 1.99013 | loss: 1.1675| constrast_loss: 4.59713| div_loss: 0.72856| %_mask_idx: 0.38503| ppl: 173.72443| %_neg_is_pos: 0.01021| lr: 0.0| temp: 1.99013 [2021-09-01 23:24:58,825] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0 [2021-09-01 23:24:58,825] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0 | loss: 1.16506| constrast_loss: 4.58499| div_loss: 0.75252| %_mask_idx: 0.42528| ppl: 158.38492| %_neg_is_pos: 0.01154| lr: 0.0| temp: 1.99011 | loss: 1.1637| constrast_loss: 4.57976| div_loss: 0.75044| %_mask_idx: 0.37813| ppl: 159.71832| %_neg_is_pos: 0.02038| lr: 0.0| temp: 1.99011 | loss: 1.16672| constrast_loss: 4.594| div_loss: 0.72877| %_mask_idx: 0.36544| ppl: 173.5887| %_neg_is_pos: 0.01453| lr: 0.0| temp: 1.9901 | loss: 1.16612| constrast_loss: 4.58935| div_loss: 0.7515| %_mask_idx: 0.38142| ppl: 159.03854| %_neg_is_pos: 0.01905| lr: 0.0| temp: 1.9901 | loss: 1.16701| constrast_loss: 4.59673| div_loss: 0.71326| %_mask_idx: 0.37155| ppl: 183.51395| %_neg_is_pos: 0.01159| lr: 0.0| temp: 1.99009 | loss: 1.1647| constrast_loss: 4.58619| div_loss: 0.7261| %_mask_idx: 0.37516| ppl: 175.29774| %_neg_is_pos: 0.01199| lr: 0.0| temp: 1.99009 | loss: 1.16676| constrast_loss: 4.59502| div_loss: 0.72027| %_mask_idx: 0.43421| ppl: 179.02785| %_neg_is_pos: 0.00825| lr: 0.0| temp: 1.99008 | loss: 1.16932| constrast_loss: 4.60662| div_loss: 0.70644| %_mask_idx: 0.40006| ppl: 187.87547| 
%_neg_is_pos: 0.00797| lr: 0.0| temp: 1.99008 | loss: 1.16488| constrast_loss: 4.58561| div_loss: 0.73922| %_mask_idx: 0.37813| ppl: 166.89822| %_neg_is_pos: 0.01856| lr: 0.0| temp: 1.99006 | loss: 1.16372| constrast_loss: 4.58308| div_loss: 0.71801| %_mask_idx: 0.36075| ppl: 180.47662| %_neg_is_pos: 0.01469| lr: 0.0| temp: 1.99006 | loss: 1.16823| constrast_loss: 4.59957| div_loss: 0.73358| %_mask_idx: 0.40555| ppl: 170.50668| %_neg_is_pos: 0.01394| lr: 0.0| temp: 1.99005 | loss: 1.16663| constrast_loss: 4.59504| div_loss: 0.71472| %_mask_idx: 0.36075| ppl: 182.57858| %_neg_is_pos: 0.01056| lr: 0.0| temp: 1.99005 | loss: 1.16414| constrast_loss: 4.58404| div_loss: 0.72527| %_mask_idx: 0.42168| ppl: 175.82416| %_neg_is_pos: 0.00652| lr: 0.0| temp: 1.99004 | loss: 1.16482| constrast_loss: 4.58626| div_loss: 0.73002| %_mask_idx: 0.37563| ppl: 172.78606| %_neg_is_pos: 0.01369| lr: 0.0| temp: 1.99004 | loss: 1.16201| constrast_loss: 4.57245| div_loss: 0.75612| %_mask_idx: 0.35793| ppl: 156.08447| %_neg_is_pos: 0.02002| lr: 0.0| temp: 1.99003 | loss: 1.16566| constrast_loss: 4.58886| div_loss: 0.73782| %_mask_idx: 0.38706| ppl: 167.79379| %_neg_is_pos: 0.0121| lr: 0.0| temp: 1.99003 | loss: 1.16082| constrast_loss: 4.56832| div_loss: 0.74977| %_mask_idx: 0.36325| ppl: 160.14819| %_neg_is_pos: 0.01963| lr: 0.0| temp: 1.99002 | loss: 1.16027| constrast_loss: 4.56839| div_loss: 0.72678| %_mask_idx: 0.43045| ppl: 174.86081| %_neg_is_pos: 0.01362| lr: 0.0| temp: 1.99002 | loss: 1.16325| constrast_loss: 4.58026| div_loss: 0.72724| %_mask_idx: 0.37735| ppl: 174.56422| %_neg_is_pos: 0.02057| lr: 0.0| temp: 1.99001 | loss: 1.16309| constrast_loss: 4.57777| div_loss: 0.74595| %_mask_idx: 0.35448| ppl: 162.59152| %_neg_is_pos: 0.00805| lr: 0.0| temp: 1.99001 | loss: 1.16341| constrast_loss: 4.58015| div_loss: 0.73507| %_mask_idx: 0.39223| ppl: 169.55649| %_neg_is_pos: 0.01141| lr: 0.0| temp: 1.99 | loss: 1.161| constrast_loss: 4.57145| div_loss: 0.72552| %_mask_idx: 0.38189| ppl: 
175.66464| %_neg_is_pos: 0.01539| lr: 0.0| temp: 1.99 | loss: 1.15947| constrast_loss: 4.56144| div_loss: 0.76454| %_mask_idx: 0.38346| ppl: 150.69632| %_neg_is_pos: 0.024| lr: 0.0| temp: 1.98999 | loss: 1.16091| constrast_loss: 4.56836| div_loss: 0.75297| %_mask_idx: 0.41369| ppl: 158.10123| %_neg_is_pos: 0.01628| lr: 0.0| temp: 1.98999 | loss: 1.16333| constrast_loss: 4.58209| div_loss: 0.71246| %_mask_idx: 0.38064| ppl: 184.02319| %_neg_is_pos: 0.01123| lr: 0.0| temp: 1.98997 | loss: 1.1638| constrast_loss: 4.58421| div_loss: 0.70987| %_mask_idx: 0.42325| ppl: 185.6857| %_neg_is_pos: 0.01378| lr: 0.0| temp: 1.98997 | loss: 1.16146| constrast_loss: 4.57344| div_loss: 0.72397| %_mask_idx: 0.42184| ppl: 176.66129| %_neg_is_pos: 0.01969| lr: 0.0| temp: 1.98996 | loss: 1.15706| constrast_loss: 4.55253| div_loss: 0.75701| %_mask_idx: 0.38393| ppl: 155.51071| %_neg_is_pos: 0.01999| lr: 0.0| temp: 1.98996 | loss: 1.15893| constrast_loss: 4.56501| div_loss: 0.70727| %_mask_idx: 0.37876| ppl: 187.34586| %_neg_is_pos: 0.01298| lr: 0.0| temp: 1.98994 | loss: 1.16313| constrast_loss: 4.58067| div_loss: 0.71859| %_mask_idx: 0.34586| ppl: 180.10538| %_neg_is_pos: 0.02| lr: 0.0| temp: 1.98994 | loss: 1.16466| constrast_loss: 4.58545| div_loss: 0.73201| %_mask_idx: 0.40993| ppl: 171.5105| %_neg_is_pos: 0.01528| lr: 0.0| temp: 1.98993 | loss: 1.16104| constrast_loss: 4.57124| div_loss: 0.72904| %_mask_idx: 0.38409| ppl: 173.41154| %_neg_is_pos: 0.01959| lr: 0.0| temp: 1.98993 | loss: 1.16136| constrast_loss: 4.57341| div_loss: 0.72014| %_mask_idx: 0.45598| ppl: 179.10959| %_neg_is_pos: 0.00973| lr: 0.0| temp: 1.98992 | loss: 1.15714| constrast_loss: 4.55426| div_loss: 0.74294| %_mask_idx: 0.39489| ppl: 164.51797| %_neg_is_pos: 0.02388| lr: 0.0| temp: 1.98992 | loss: 1.16058| constrast_loss: 4.56919| div_loss: 0.73138| %_mask_idx: 0.39991| ppl: 171.91405| %_neg_is_pos: 0.01627| lr: 0.0| temp: 1.98991 | loss: 1.16503| constrast_loss: 4.58887| div_loss: 0.71231| %_mask_idx: 0.40539| 
ppl: 184.12251| %_neg_is_pos: 0.01394| lr: 0.0| temp: 1.98991 | loss: 1.15744| constrast_loss: 4.55404| div_loss: 0.75713| %_mask_idx: 0.36404| ppl: 155.43388| %_neg_is_pos: 0.03275| lr: 0.0| temp: 1.98989 | loss: 1.16379| constrast_loss: 4.58284| div_loss: 0.72327| %_mask_idx: 0.36419| ppl: 177.10538| %_neg_is_pos: 0.02506| lr: 0.0| temp: 1.98989 | loss: 1.1606| constrast_loss: 4.5694| div_loss: 0.72999| %_mask_idx: 0.35558| ppl: 172.80458| %_neg_is_pos: 0.02445| lr: 0.0| temp: 1.98988 | loss: 1.16082| constrast_loss: 4.57102| div_loss: 0.72254| %_mask_idx: 0.39489| ppl: 177.57713| %_neg_is_pos: 0.01722| lr: 0.0| temp: 1.98988 | loss: 1.16295| constrast_loss: 4.57843| div_loss: 0.73374| %_mask_idx: 0.40805| ppl: 170.40331| %_neg_is_pos: 0.0156| lr: 0.0| temp: 1.98987 | loss: 1.15595| constrast_loss: 4.55098| div_loss: 0.728| %_mask_idx: 0.41776| ppl: 174.07782| %_neg_is_pos: 0.0168| lr: 0.0| temp: 1.98987 | loss: 1.16426| constrast_loss: 4.5832| div_loss: 0.73838| %_mask_idx: 0.4281| ppl: 167.43683| %_neg_is_pos: 0.01658| lr: 0.0| temp: 1.98986 | loss: 1.16155| constrast_loss: 4.57313| div_loss: 0.73062| %_mask_idx: 0.40476| ppl: 172.40637| %_neg_is_pos: 0.01937| lr: 0.0| temp: 1.98986 | loss: 1.15979| constrast_loss: 4.56637| div_loss: 0.72771| %_mask_idx: 0.3349| ppl: 174.26512| %_neg_is_pos: 0.02528| lr: 0.0| temp: 1.98984 | loss: 1.1596| constrast_loss: 4.56715| div_loss: 0.7126| %_mask_idx: 0.38753| ppl: 183.93562| %_neg_is_pos: 0.01755| lr: 0.0| temp: 1.98984 | loss: 1.16187| constrast_loss: 4.57453| div_loss: 0.72936| %_mask_idx: 0.41181| ppl: 173.20648| %_neg_is_pos: 0.01735| lr: 0.0| temp: 1.98983 | loss: 1.15527| constrast_loss: 4.54583| div_loss: 0.75265| %_mask_idx: 0.36576| ppl: 158.30356| %_neg_is_pos: 0.02431| lr: 0.0| temp: 1.98983 | loss: 1.15389| constrast_loss: 4.5401| div_loss: 0.75478| %_mask_idx: 0.38205| ppl: 156.94269| %_neg_is_pos: 0.02311| lr: 0.0| temp: 1.98982 | loss: 1.16402| constrast_loss: 4.58578| div_loss: 0.70305| %_mask_idx: 
0.4032| ppl: 190.04665| %_neg_is_pos: 0.01335| lr: 0.0| temp: 1.98982 | loss: 1.16017| constrast_loss: 4.56977| div_loss: 0.70896| %_mask_idx: 0.38377| ppl: 186.26765| %_neg_is_pos: 0.01301| lr: 0.0| temp: 1.98981 | loss: 1.16129| constrast_loss: 4.5726| div_loss: 0.72545| %_mask_idx: 0.39082| ppl: 175.71469| %_neg_is_pos: 0.02587| lr: 0.0| temp: 1.98981 | loss: 1.1563| constrast_loss: 4.54976| div_loss: 0.75447| %_mask_idx: 0.36685| ppl: 157.13913| %_neg_is_pos: 0.0335| lr: 0.0| temp: 1.98979 | loss: 1.1588| constrast_loss: 4.55899| div_loss: 0.76201| %_mask_idx: 0.36482| ppl: 152.31665| %_neg_is_pos: 0.0315| lr: 0.0| temp: 1.98979 | loss: 1.15827| constrast_loss: 4.55742| div_loss: 0.75648| %_mask_idx: 0.32331| ppl: 155.85529| %_neg_is_pos: 0.02267| lr: 0.0| temp: 1.98978 | loss: 1.16035| constrast_loss: 4.56831| div_loss: 0.73068| %_mask_idx: 0.40789| ppl: 172.36243| %_neg_is_pos: 0.01768| lr: 0.0| temp: 1.98978 | loss: 1.15328| constrast_loss: 4.53896| div_loss: 0.74136| %_mask_idx: 0.34195| ppl: 165.53232| %_neg_is_pos: 0.0352| lr: 0.0| temp: 1.98976 | loss: 1.15834| constrast_loss: 4.55819| div_loss: 0.75172| %_mask_idx: 0.32785| ppl: 158.89645| %_neg_is_pos: 0.03539| lr: 0.0| temp: 1.98976 | loss: 1.16008| constrast_loss: 4.56764| div_loss: 0.72673| %_mask_idx: 0.45269| ppl: 174.89523| %_neg_is_pos: 0.01503| lr: 0.0| temp: 1.98975 | loss: 1.16207| constrast_loss: 4.57792| div_loss: 0.7035| %_mask_idx: 0.37939| ppl: 189.75824| %_neg_is_pos: 0.01834| lr: 0.0| temp: 1.98975 | loss: 1.1589| constrast_loss: 4.56232| div_loss: 0.73294| %_mask_idx: 0.3338| ppl: 170.91821| %_neg_is_pos: 0.02433| lr: 0.0| temp: 1.98974 | loss: 1.15789| constrast_loss: 4.5591| div_loss: 0.72457| %_mask_idx: 0.35056| ppl: 176.27692| %_neg_is_pos: 0.02018| lr: 0.0| temp: 1.98974 | loss: 1.15951| constrast_loss: 4.56587| div_loss: 0.72161| %_mask_idx: 0.29746| ppl: 178.16907| %_neg_is_pos: 0.02457| lr: 0.0| temp: 1.98973 | loss: 1.15989| constrast_loss: 4.56696| div_loss: 0.72606| 
%_mask_idx: 0.39646| ppl: 175.32176| %_neg_is_pos: 0.01877| lr: 0.0| temp: 1.98973 | loss: 1.15557| constrast_loss: 4.5477| div_loss: 0.74576| %_mask_idx: 0.34492| ppl: 162.71153| %_neg_is_pos: 0.03519| lr: 0.0| temp: 1.98971 | loss: 1.15894| constrast_loss: 4.56258| div_loss: 0.73162| %_mask_idx: 0.41165| ppl: 171.76471| %_neg_is_pos: 0.01949| lr: 0.0| temp: 1.98971 | loss: 1.15573| constrast_loss: 4.5482| div_loss: 0.74724| %_mask_idx: 0.35244| ppl: 161.76671| %_neg_is_pos: 0.02033| lr: 0.0| temp: 1.9897 | loss: 1.15922| constrast_loss: 4.56376| div_loss: 0.73129| %_mask_idx: 0.37672| ppl: 171.97623| %_neg_is_pos: 0.01935| lr: 0.0| temp: 1.9897 | loss: 1.16031| constrast_loss: 4.56818| div_loss: 0.73062| %_mask_idx: 0.40852| ppl: 172.40155| %_neg_is_pos: 0.01383| lr: 0.0| temp: 1.98969 | loss: 1.16699| constrast_loss: 4.59834| div_loss: 0.69633| %_mask_idx: 0.39881| ppl: 194.34906| %_neg_is_pos: 0.01623| lr: 0.0| temp: 1.98969 | loss: 1.16163| constrast_loss: 4.57433| div_loss: 0.72172| %_mask_idx: 0.39019| ppl: 178.10208| %_neg_is_pos: 0.01892| lr: 0.0| temp: 1.98968 | loss: 1.1586| constrast_loss: 4.56014| div_loss: 0.74276| %_mask_idx: 0.34414| ppl: 164.63184| %_neg_is_pos: 0.02636| lr: 0.0| temp: 1.98968 | loss: 1.15928| constrast_loss: 4.56413| div_loss: 0.72984| %_mask_idx: 0.37516| ppl: 172.9035| %_neg_is_pos: 0.02243| lr: 0.0| temp: 1.98966 | loss: 1.1546| constrast_loss: 4.54408| div_loss: 0.74328| %_mask_idx: 0.36466| ppl: 164.29877| %_neg_is_pos: 0.02851| lr: 0.0| temp: 1.98966 | loss: 1.16387| constrast_loss: 4.58334| div_loss: 0.72135| %_mask_idx: 0.39881| ppl: 178.33688| %_neg_is_pos: 0.01315| lr: 0.0| temp: 1.98965 | loss: 1.1589| constrast_loss: 4.56224| div_loss: 0.73359| %_mask_idx: 0.39004| ppl: 170.50313| %_neg_is_pos: 0.01434| lr: 0.0| temp: 1.98965 | loss: 1.16473| constrast_loss: 4.58823| div_loss: 0.70689| %_mask_idx: 0.42403| ppl: 187.59299| %_neg_is_pos: 0.01315| lr: 0.0| temp: 1.98964 | loss: 1.15901| constrast_loss: 4.56379| div_loss: 
0.72241| %_mask_idx: 0.38753| ppl: 177.65607| %_neg_is_pos: 0.01478| lr: 0.0| temp: 1.98964 | loss: 1.15927| constrast_loss: 4.56402| div_loss: 0.73075| %_mask_idx: 0.38988| ppl: 172.32196| %_neg_is_pos: 0.01925| lr: 0.0| temp: 1.98963 | loss: 1.16035| constrast_loss: 4.56498| div_loss: 0.76411| %_mask_idx: 0.43922| ppl: 150.97279| %_neg_is_pos: 0.02341| lr: 0.0| temp: 1.98963 | loss: 1.16243| constrast_loss: 4.57478| div_loss: 0.74957| %_mask_idx: 0.36999| ppl: 160.27753| %_neg_is_pos: 0.02349| lr: 0.0| temp: 1.98961 | loss: 1.16582| constrast_loss: 4.58966| div_loss: 0.73618| %_mask_idx: 0.38816| ppl: 168.84207| %_neg_is_pos: 0.02621| lr: 0.0| temp: 1.98961 | loss: 1.1624| constrast_loss: 4.57809| div_loss: 0.71508| %_mask_idx: 0.41181| ppl: 182.3513| %_neg_is_pos: 0.01367| lr: 0.0| temp: 1.9896 | loss: 1.16131| constrast_loss: 4.57318| div_loss: 0.72067| %_mask_idx: 0.39818| ppl: 178.77094| %_neg_is_pos: 0.01848| lr: 0.0| temp: 1.9896 | loss: 1.15895| constrast_loss: 4.56478| div_loss: 0.71017| %_mask_idx: 0.32832| ppl: 185.48846| %_neg_is_pos: 0.02075| lr: 0.0| temp: 1.98958 | loss: 1.15791| constrast_loss: 4.55864| div_loss: 0.73007| %_mask_idx: 0.40633| ppl: 172.75627| %_neg_is_pos: 0.01411| lr: 0.0| temp: 1.98958 | loss: 1.16289| constrast_loss: 4.58075| div_loss: 0.70807| %_mask_idx: 0.37578| ppl: 186.83411| %_neg_is_pos: 0.02313| lr: 0.0| temp: 1.98957 | loss: 1.15226| constrast_loss: 4.53385| div_loss: 0.75205| %_mask_idx: 0.37547| ppl: 158.68921| %_neg_is_pos: 0.03195| lr: 0.0| temp: 1.98957 | loss: 1.15543| constrast_loss: 4.54806| div_loss: 0.73652| %_mask_idx: 0.35432| ppl: 168.62961| %_neg_is_pos: 0.02535| lr: 0.0| temp: 1.98956 | loss: 1.16591| constrast_loss: 4.59049| div_loss: 0.73159| %_mask_idx: 0.39066| ppl: 171.77942| %_neg_is_pos: 0.01156| lr: 0.0| temp: 1.98956 | loss: 1.16302| constrast_loss: 4.58028| div_loss: 0.718| %_mask_idx: 0.39489| ppl: 180.4819| %_neg_is_pos: 0.01302| lr: 0.0| temp: 1.98955 | loss: 1.15515| constrast_loss: 4.54632| 
div_loss: 0.74277| %_mask_idx: 0.33819| ppl: 164.62881| %_neg_is_pos: 0.02618| lr: 0.0| temp: 1.98955 | loss: 1.16234| constrast_loss: 4.57743| div_loss: 0.71922| %_mask_idx: 0.36529| ppl: 179.69748| %_neg_is_pos: 0.0232| lr: 0.0| temp: 1.98953 | loss: 1.1603| constrast_loss: 4.56732| div_loss: 0.73883| %_mask_idx: 0.38831| ppl: 167.15047| %_neg_is_pos: 0.02437| lr: 0.0| temp: 1.98953 | loss: 1.15553| constrast_loss: 4.55004| div_loss: 0.72088| %_mask_idx: 0.35338| ppl: 178.63416| %_neg_is_pos: 0.02596| lr: 0.0| temp: 1.98952 | loss: 1.15989| constrast_loss: 4.5672| div_loss: 0.72353| %_mask_idx: 0.41667| ppl: 176.94162| %_neg_is_pos: 0.01625| lr: 0.0| temp: 1.98952 | loss: 1.15799| constrast_loss: 4.55904| div_loss: 0.72939| %_mask_idx: 0.3974| ppl: 173.189| %_neg_is_pos: 0.01408| lr: 0.0| temp: 1.98951 | loss: 1.15719| constrast_loss: 4.55572| div_loss: 0.73044| %_mask_idx: 0.38252| ppl: 172.51852| %_neg_is_pos: 0.02084| lr: 0.0| temp: 1.98951 | loss: 1.16048| constrast_loss: 4.56999| div_loss: 0.7193| %_mask_idx: 0.35918| ppl: 179.64908| %_neg_is_pos: 0.03605| lr: 0.0| temp: 1.9895 | loss: 1.16292| constrast_loss: 4.58074| div_loss: 0.70949| %_mask_idx: 0.38236| ppl: 185.92694| %_neg_is_pos: 0.01803| lr: 0.0| temp: 1.9895 | loss: 1.15833| constrast_loss: 4.56054| div_loss: 0.728| %_mask_idx: 0.39348| ppl: 174.08211| %_neg_is_pos: 0.02656| lr: 0.0| temp: 1.98948 | loss: 1.15981| constrast_loss: 4.56644| div_loss: 0.72808| %_mask_idx: 0.40555| ppl: 174.02673| %_neg_is_pos: 0.01352| lr: 0.0| temp: 1.98948 | loss: 1.15766| constrast_loss: 4.55467| div_loss: 0.75966| %_mask_idx: 0.37516| ppl: 153.81932| %_neg_is_pos: 0.02482| lr: 0.0| temp: 1.98947 | loss: 1.15437| constrast_loss: 4.54435| div_loss: 0.73116| %_mask_idx: 0.35761| ppl: 172.05667| %_neg_is_pos: 0.03101| lr: 0.0| temp: 1.98947 | loss: 1.15369| constrast_loss: 4.54019| div_loss: 0.74551| %_mask_idx: 0.34868| ppl: 162.87492| %_neg_is_pos: 0.02283| lr: 0.0| temp: 1.98946 | loss: 1.16481| constrast_loss: 
4.58717| div_loss: 0.72045| %_mask_idx: 0.44439| ppl: 178.91003| %_neg_is_pos: 0.01206| lr: 0.0| temp: 1.98946 | loss: 1.15787| constrast_loss: 4.5601| div_loss: 0.7139| %_mask_idx: 0.38581| ppl: 183.10107| %_neg_is_pos: 0.02046| lr: 0.0| temp: 1.98945 | loss: 1.1511| constrast_loss: 4.52842| div_loss: 0.75963| %_mask_idx: 0.36106| ppl: 153.83472| %_neg_is_pos: 0.03086| lr: 0.0| temp: 1.98945 | loss: 1.15803| constrast_loss: 4.55986| div_loss: 0.72244| %_mask_idx: 0.42575| ppl: 177.63905| %_neg_is_pos: 0.02042| lr: 0.0| temp: 1.98943 | loss: 1.16515| constrast_loss: 4.58845| div_loss: 0.72128| %_mask_idx: 0.37625| ppl: 178.38062| %_neg_is_pos: 0.01529| lr: 0.0| temp: 1.98943 | loss: 1.1604| constrast_loss: 4.56883| div_loss: 0.72752| %_mask_idx: 0.39897| ppl: 174.38593| %_neg_is_pos: 0.01873| lr: 0.0| temp: 1.98942 | loss: 1.16248| constrast_loss: 4.57729| div_loss: 0.72608| %_mask_idx: 0.43499| ppl: 175.30806| %_neg_is_pos: 0.00583| lr: 0.0| temp: 1.98942 [2021-09-01 23:34:13,837] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 512.0, reducing to 256.0 [2021-09-01 23:34:13,837] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 512.0, reducing to 256.0 | loss: 1.15723| constrast_loss: 4.55337| div_loss: 0.75563| %_mask_idx: 0.33913| ppl: 156.39398| %_neg_is_pos: 0.0323| lr: 0.0| temp: 1.9894| loss: 1.15996| constrast_loss: 4.56657| div_loss: 0.73271| %_mask_idx: 0.38972| ppl: 171.06284| %_neg_is_pos: 0.02324| lr: 0.0| temp: 1.9894 | loss: 1.15906| constrast_loss: 4.56301| div_loss: 0.73225| %_mask_idx: 0.39176| ppl: 171.36063| %_neg_is_pos: 0.01555| lr: 0.0| temp: 1.98939 | loss: 1.16042| constrast_loss: 4.57115| div_loss: 0.70528| %_mask_idx: 0.33224| ppl: 188.62357| %_neg_is_pos: 0.02267| lr: 0.0| temp: 1.98939 | loss: 1.16157| constrast_loss: 4.57511| div_loss: 0.71174| %_mask_idx: 0.42231| ppl: 184.48921| %_neg_is_pos: 0.0117| lr: 0.0| temp: 1.98938 | loss: 1.15841| constrast_loss: 4.56128| div_loss: 0.72344| %_mask_idx: 0.38722| ppl: 176.99582| %_neg_is_pos: 0.01646| lr: 0.0| temp: 1.98938 | loss: 1.1594| constrast_loss: 4.56512| div_loss: 0.72493| %_mask_idx: 0.37077| ppl: 176.04492| %_neg_is_pos: 0.02776| lr: 0.0| temp: 1.98937 | loss: 1.16437| constrast_loss: 4.58324| div_loss: 0.74241| %_mask_idx: 0.35291| ppl: 164.85611| %_neg_is_pos: 0.0178| lr: 0.0| temp: 1.98937 | loss: 1.1584| constrast_loss: 4.56065| div_loss: 0.72947| %_mask_idx: 0.34477| ppl: 173.14156| %_neg_is_pos: 0.01471| lr: 0.0| temp: 1.98935| loss: 1.15792| constrast_loss: 4.55745| div_loss: 0.74243| %_mask_idx: 0.41949| ppl: 164.84171| %_neg_is_pos: 0.01767| lr: 0.0| temp: 1.98935 | loss: 1.15232| constrast_loss: 4.53574| div_loss: 0.73529| %_mask_idx: 0.37061| ppl: 169.4122| %_neg_is_pos: 0.01667| lr: 0.0| temp: 1.98934 | loss: 1.1564| constrast_loss: 4.55272| div_loss: 0.72885| %_mask_idx: 0.42325| ppl: 173.53902| %_neg_is_pos: 0.01593| lr: 0.0| temp: 1.98934 | loss: 1.15609| constrast_loss: 4.55296| div_loss: 0.714| %_mask_idx: 0.35965| ppl: 183.04086| %_neg_is_pos: 0.02492| lr: 0.0| temp: 1.98933 | loss: 1.15778| constrast_loss: 4.56023| div_loss: 0.70892| %_mask_idx: 0.39411| ppl: 
186.28998| %_neg_is_pos: 0.01095| lr: 0.0| temp: 1.98933 | loss: 1.16184| constrast_loss: 4.57511| div_loss: 0.72242| %_mask_idx: 0.39521| ppl: 177.65213| %_neg_is_pos: 0.02021| lr: 0.0| temp: 1.98932 | loss: 1.1572| constrast_loss: 4.55568| div_loss: 0.73106| %_mask_idx: 0.41228| ppl: 172.11893| %_neg_is_pos: 0.02121| lr: 0.0| temp: 1.98932 | loss: 1.16109| constrast_loss: 4.57186| div_loss: 0.72515| %_mask_idx: 0.42262| ppl: 175.90636| %_neg_is_pos: 0.00933| lr: 0.0| temp: 1.9893 | loss: 1.1599| constrast_loss: 4.5669| div_loss: 0.72699| %_mask_idx: 0.41322| ppl: 174.72411| %_neg_is_pos: 0.01651| lr: 0.0| temp: 1.9893 | loss: 1.15294| constrast_loss: 4.53944| div_loss: 0.72335| %_mask_idx: 0.36153| ppl: 177.05527| %_neg_is_pos: 0.01861| lr: 0.0| temp: 1.98929 | loss: 1.15738| constrast_loss: 4.55936| div_loss: 0.70143| %_mask_idx: 0.39599| ppl: 191.08313| %_neg_is_pos: 0.02425| lr: 0.0| temp: 1.98929 | loss: 1.16076| constrast_loss: 4.57403| div_loss: 0.68995| %_mask_idx: 0.43123| ppl: 198.43387| %_neg_is_pos: 0.01553| lr: 0.0| temp: 1.98928 | loss: 1.1524| constrast_loss: 4.53716| div_loss: 0.72424| %_mask_idx: 0.39677| ppl: 176.48779| %_neg_is_pos: 0.02793| lr: 0.0| temp: 1.98928 | loss: 1.1521| constrast_loss: 4.53672| div_loss: 0.71674| %_mask_idx: 0.36169| ppl: 181.28421| %_neg_is_pos: 0.01656| lr: 0.0| temp: 1.98927 | loss: 1.16056| constrast_loss: 4.57136| div_loss: 0.70866| %_mask_idx: 0.41118| ppl: 186.45459| %_neg_is_pos: 0.0115| lr: 0.0| temp: 1.98927 | loss: 1.1515| constrast_loss: 4.53793| div_loss: 0.68083| %_mask_idx: 0.37563| ppl: 204.26825| %_neg_is_pos: 0.01977| lr: 0.0| temp: 1.98925| loss: 1.15954| constrast_loss: 4.56315| div_loss: 0.75013| %_mask_idx: 0.44909| ppl: 159.91946| %_neg_is_pos: 0.02087| lr: 0.0| temp: 1.98925 | loss: 1.14862| constrast_loss: 4.52083| div_loss: 0.73661| %_mask_idx: 0.3468| ppl: 168.57013| %_neg_is_pos: 0.02615| lr: 0.0| temp: 1.98924 | loss: 1.14894| constrast_loss: 4.52362| div_loss: 0.72135| %_mask_idx: 0.37218| 
ppl: 178.33591| %_neg_is_pos: 0.02721| lr: 0.0| temp: 1.98924 | loss: 1.15092| constrast_loss: 4.53225| div_loss: 0.71441| %_mask_idx: 0.4209| ppl: 182.77751| %_neg_is_pos: 0.01552| lr: 0.0| temp: 1.98922 | loss: 1.14873| constrast_loss: 4.5238| div_loss: 0.71113| %_mask_idx: 0.36873| ppl: 184.87888| %_neg_is_pos: 0.0275| lr: 0.0| temp: 1.98922 | loss: 1.15804| constrast_loss: 4.5565| div_loss: 0.7565| %_mask_idx: 0.40836| ppl: 155.83844| %_neg_is_pos: 0.01882| lr: 0.0| temp: 1.98921 | loss: 1.15786| constrast_loss: 4.56094| div_loss: 0.70512| %_mask_idx: 0.41964| ppl: 188.7244| %_neg_is_pos: 0.01268| lr: 0.0| temp: 1.98921 | loss: 1.15176| constrast_loss: 4.53588| div_loss: 0.71177| %_mask_idx: 0.34539| ppl: 184.46751| %_neg_is_pos: 0.03066| lr: 0.0| temp: 1.9892 | loss: 1.15519| constrast_loss: 4.55019| div_loss: 0.70562| %_mask_idx: 0.38549| ppl: 188.4021| %_neg_is_pos: 0.01502| lr: 0.0| temp: 1.9892 | loss: 1.15124| constrast_loss: 4.53281| div_loss: 0.72146| %_mask_idx: 0.38847| ppl: 178.26678| %_neg_is_pos: 0.01807| lr: 0.0| temp: 1.98919 | loss: 1.15486| constrast_loss: 4.54663| div_loss: 0.72805| %_mask_idx: 0.39834| ppl: 174.05045| %_neg_is_pos: 0.02182| lr: 0.0| temp: 1.98919 | loss: 1.14946| constrast_loss: 4.52913| div_loss: 0.68701| %_mask_idx: 0.36513| ppl: 200.31445| %_neg_is_pos: 0.01114| lr: 0.0| temp: 1.98917 | loss: 1.15552| constrast_loss: 4.55214| div_loss: 0.6996| %_mask_idx: 0.35996| ppl: 192.25607| %_neg_is_pos: 0.01425| lr: 0.0| temp: 1.98917 | loss: 1.15549| constrast_loss: 4.55356| div_loss: 0.68392| %_mask_idx: 0.39568| ppl: 202.28848| %_neg_is_pos: 0.02016| lr: 0.0| temp: 1.98916 | loss: 1.162| constrast_loss: 4.57663| div_loss: 0.71392| %_mask_idx: 0.41291| ppl: 183.089| %_neg_is_pos: 0.0098| lr: 0.0| temp: 1.98916 | loss: 1.15093| constrast_loss: 4.53011| div_loss: 0.73609| %_mask_idx: 0.36732| ppl: 168.90161| %_neg_is_pos: 0.02931| lr: 0.0| temp: 1.98915 | loss: 1.14971| constrast_loss: 4.52635| div_loss: 0.72483| %_mask_idx: 
0.39176| ppl: 176.11104| %_neg_is_pos: 0.03033| lr: 0.0| temp: 1.98915 | loss: 1.1611| constrast_loss: 4.57487| div_loss: 0.69521| %_mask_idx: 0.48073| ppl: 195.06259| %_neg_is_pos: 0.00815| lr: 0.0| temp: 1.98914 | loss: 1.15742| constrast_loss: 4.56002| div_loss: 0.69657| %_mask_idx: 0.36184| ppl: 194.19475| %_neg_is_pos: 0.01474| lr: 0.0| temp: 1.98914 | loss: 1.14702| constrast_loss: 4.51496| div_loss: 0.73114| %_mask_idx: 0.35636| ppl: 172.0723| %_neg_is_pos: 0.02763| lr: 0.0| temp: 1.98912 | loss: 1.15097| constrast_loss: 4.53167| div_loss: 0.72222| %_mask_idx: 0.37782| ppl: 177.78168| %_neg_is_pos: 0.02214| lr: 0.0| temp: 1.98912 | loss: 1.15135| constrast_loss: 4.53376| div_loss: 0.71654| %_mask_idx: 0.34539| ppl: 181.41154| %_neg_is_pos: 0.02145| lr: 0.0| temp: 1.98911 | loss: 1.15327| constrast_loss: 4.54188| div_loss: 0.71211| %_mask_idx: 0.43593| ppl: 184.25008| %_neg_is_pos: 0.00939| lr: 0.0| temp: 1.98911 | loss: 1.16166| constrast_loss: 4.57714| div_loss: 0.69504| %_mask_idx: 0.36075| ppl: 195.17545| %_neg_is_pos: 0.01867| lr: 0.0| temp: 1.9891 | loss: 1.14687| constrast_loss: 4.51513| div_loss: 0.72339| %_mask_idx: 0.37093| ppl: 177.03305| %_neg_is_pos: 0.02124| lr: 0.0| temp: 1.9891 | loss: 1.15101| constrast_loss: 4.52804| div_loss: 0.75997| %_mask_idx: 0.39771| ppl: 153.61928| %_neg_is_pos: 0.01974| lr: 0.0| temp: 1.98909 | loss: 1.15835| constrast_loss: 4.56252| div_loss: 0.70863| %_mask_idx: 0.42888| ppl: 186.47543| %_neg_is_pos: 0.01115| lr: 0.0| temp: 1.98909 | loss: 1.14951| constrast_loss: 4.52675| div_loss: 0.71304| %_mask_idx: 0.31438| ppl: 183.65179| %_neg_is_pos: 0.0272| lr: 0.0| temp: 1.98907 | loss: 1.16217| constrast_loss: 4.57389| div_loss: 0.74781| %_mask_idx: 0.40163| ppl: 161.39874| %_neg_is_pos: 0.0253| lr: 0.0| temp: 1.98907 | loss: 1.14881| constrast_loss: 4.52352| div_loss: 0.71702| %_mask_idx: 0.3761| ppl: 181.10452| %_neg_is_pos: 0.0244| lr: 0.0| temp: 1.98906 | loss: 1.15834| constrast_loss: 4.5623| div_loss: 0.71057| 
%_mask_idx: 0.35808| ppl: 185.23518| %_neg_is_pos: 0.02028| lr: 0.0| temp: 1.98906 | loss: 1.15382| constrast_loss: 4.54406| div_loss: 0.71223| %_mask_idx: 0.38315| ppl: 184.17303| %_neg_is_pos: 0.02291| lr: 0.0| temp: 1.98904 | loss: 1.15443| constrast_loss: 4.54571| div_loss: 0.71999| %_mask_idx: 0.43045| ppl: 179.20703| %_neg_is_pos: 0.01325| lr: 0.0| temp: 1.98904 | loss: 1.14813| constrast_loss: 4.51969| div_loss: 0.72818| %_mask_idx: 0.35479| ppl: 173.9617| %_neg_is_pos: 0.02471| lr: 0.0| temp: 1.98903 | loss: 1.15078| constrast_loss: 4.52866| div_loss: 0.74453| %_mask_idx: 0.36936| ppl: 163.50397| %_neg_is_pos: 0.01834| lr: 0.0| temp: 1.98903 | loss: 1.15366| constrast_loss: 4.54455| div_loss: 0.70081| %_mask_idx: 0.40664| ppl: 191.48477| %_neg_is_pos: 0.01077| lr: 0.0| temp: 1.98902 | loss: 1.15231| constrast_loss: 4.53773| div_loss: 0.71518| %_mask_idx: 0.39333| ppl: 182.28564| %_neg_is_pos: 0.01938| lr: 0.0| temp: 1.98902 | loss: 1.15205| constrast_loss: 4.535| div_loss: 0.73219| %_mask_idx: 0.40617| ppl: 171.39754| %_neg_is_pos: 0.0204| lr: 0.0| temp: 1.98901 | loss: 1.15493| constrast_loss: 4.55161| div_loss: 0.6812| %_mask_idx: 0.36247| ppl: 204.03058| %_neg_is_pos: 0.01395| lr: 0.0| temp: 1.98901 | loss: 1.15212| constrast_loss: 4.53593| div_loss: 0.72527| %_mask_idx: 0.40147| ppl: 175.82645| %_neg_is_pos: 0.01833| lr: 0.0| temp: 1.98899 | loss: 1.15491| constrast_loss: 4.54871| div_loss: 0.70937| %_mask_idx: 0.38941| ppl: 186.00526| %_neg_is_pos: 0.01296| lr: 0.0| temp: 1.98899 | loss: 1.14295| constrast_loss: 4.49895| div_loss: 0.72858| %_mask_idx: 0.37845| ppl: 173.70947| %_neg_is_pos: 0.01849| lr: 0.0| temp: 1.98898 | loss: 1.15309| constrast_loss: 4.54112| div_loss: 0.71253| %_mask_idx: 0.40398| ppl: 183.98267| %_neg_is_pos: 0.01863| lr: 0.0| temp: 1.98898 | loss: 1.15564| constrast_loss: 4.553| div_loss: 0.69578| %_mask_idx: 0.41228| ppl: 194.69843| %_neg_is_pos: 0.02537| lr: 0.0| temp: 1.98897 | loss: 1.14896| constrast_loss: 4.52418| div_loss: 
0.71672| %_mask_idx: 0.37484| ppl: 181.29633| %_neg_is_pos: 0.02232| lr: 0.0| temp: 1.98897 | loss: 1.16202| constrast_loss: 4.57585| div_loss: 0.72217| %_mask_idx: 0.41103| ppl: 177.81059| %_neg_is_pos: 0.01693| lr: 0.0| temp: 1.98896 | loss: 1.15338| constrast_loss: 4.54444| div_loss: 0.69084| %_mask_idx: 0.37093| ppl: 197.86467| %_neg_is_pos: 0.01748| lr: 0.0| temp: 1.98896 | loss: 1.14621| constrast_loss: 4.51091| div_loss: 0.73909| %_mask_idx: 0.38988| ppl: 166.98477| %_neg_is_pos: 0.02353| lr: 0.0| temp: 1.98894 | loss: 1.15519| constrast_loss: 4.55183| div_loss: 0.68929| %_mask_idx: 0.38706| ppl: 198.85385| %_neg_is_pos: 0.01877| lr: 0.0| temp: 1.98894 | loss: 1.15123| constrast_loss: 4.53159| div_loss: 0.73318| %_mask_idx: 0.37343| ppl: 170.76729| %_neg_is_pos: 0.018| lr: 0.0| temp: 1.98893 | loss: 1.14781| constrast_loss: 4.51821| div_loss: 0.73012| %_mask_idx: 0.36466| ppl: 172.72232| %_neg_is_pos: 0.02836| lr: 0.0| temp: 1.98893 | loss: 1.15117| constrast_loss: 4.53306| div_loss: 0.71638| %_mask_idx: 0.37672| ppl: 181.51505| %_neg_is_pos: 0.02328| lr: 0.0| temp: 1.98892 | loss: 1.1504| constrast_loss: 4.52956| div_loss: 0.72025| %_mask_idx: 0.35636| ppl: 179.03725| %_neg_is_pos: 0.02525| lr: 0.0| temp: 1.98892 | loss: 1.14672| constrast_loss: 4.51536| div_loss: 0.71531| %_mask_idx: 0.36544| ppl: 182.20398| %_neg_is_pos: 0.01958| lr: 0.0| temp: 1.98891 | loss: 1.15857| constrast_loss: 4.56364| div_loss: 0.70651| %_mask_idx: 0.4317| ppl: 187.83156| %_neg_is_pos: 0.00974| lr: 0.0| temp: 1.98891 | loss: 1.14434| constrast_loss: 4.50237| div_loss: 0.75008| %_mask_idx: 0.34289| ppl: 159.94598| %_neg_is_pos: 0.02248| lr: 0.0| temp: 1.98889 | loss: 1.14848| constrast_loss: 4.51986| div_loss: 0.74046| %_mask_idx: 0.41588| ppl: 166.10593| %_neg_is_pos: 0.02034| lr: 0.0| temp: 1.98889 | loss: 1.14732| constrast_loss: 4.51645| div_loss: 0.72828| %_mask_idx: 0.42544| ppl: 173.90285| %_neg_is_pos: 0.01381| lr: 0.0| temp: 1.98888 | loss: 1.15728| constrast_loss: 
4.55829| div_loss: 0.70846| %_mask_idx: 0.38972| ppl: 186.58556| %_neg_is_pos: 0.02165| lr: 0.0| temp: 1.98888 | loss: 1.14849| constrast_loss: 4.52204| div_loss: 0.71917| %_mask_idx: 0.38769| ppl: 179.7332| %_neg_is_pos: 0.0211| lr: 0.0| temp: 1.98886 | loss: 1.15306| constrast_loss: 4.53987| div_loss: 0.72386| %_mask_idx: 0.38863| ppl: 176.72852| %_neg_is_pos: 0.0169| lr: 0.0| temp: 1.98886 | loss: 1.1427| constrast_loss: 4.49527| div_loss: 0.75524| %_mask_idx: 0.3302| ppl: 156.64491| %_neg_is_pos: 0.03967| lr: 0.0| temp: 1.98885 | loss: 1.1571| constrast_loss: 4.55428| div_loss: 0.7413| %_mask_idx: 0.37625| ppl: 165.56943| %_neg_is_pos: 0.02712| lr: 0.0| temp: 1.98885 | loss: 1.15137| constrast_loss: 4.53345| div_loss: 0.72041| %_mask_idx: 0.36607| ppl: 178.93683| %_neg_is_pos: 0.01699| lr: 0.0| temp: 1.98884 | loss: 1.15139| constrast_loss: 4.53284| div_loss: 0.72737| %_mask_idx: 0.44251| ppl: 174.48589| %_neg_is_pos: 0.01657| lr: 0.0| temp: 1.98884 | loss: 1.16246| constrast_loss: 4.57943| div_loss: 0.70402| %_mask_idx: 0.40241| ppl: 189.42491| %_neg_is_pos: 0.01247| lr: 0.0| temp: 1.98883 | loss: 1.15232| constrast_loss: 4.53842| div_loss: 0.70848| %_mask_idx: 0.41056| ppl: 186.57323| %_neg_is_pos: 0.01463| lr: 0.0| temp: 1.98883 | loss: 1.15061| constrast_loss: 4.5301| div_loss: 0.72338| %_mask_idx: 0.37516| ppl: 177.03561| %_neg_is_pos: 0.02038| lr: 0.0| temp: 1.98881 | loss: 1.15691| constrast_loss: 4.55627| div_loss: 0.71358| %_mask_idx: 0.37265| ppl: 183.31178| %_neg_is_pos: 0.02319| lr: 0.0| temp: 1.98881 | loss: 1.15239| constrast_loss: 4.53704| div_loss: 0.72534| %_mask_idx: 0.42043| ppl: 175.78136| %_neg_is_pos: 0.01632| lr: 0.0| temp: 1.9888 | loss: 1.15599| constrast_loss: 4.55203| div_loss: 0.71937| %_mask_idx: 0.35025| ppl: 179.60184| %_neg_is_pos: 0.02366| lr: 0.0| temp: 1.9888 | loss: 1.16144| constrast_loss: 4.57456| div_loss: 0.71177| %_mask_idx: 0.40789| ppl: 184.46948| %_neg_is_pos: 0.0106| lr: 0.0| temp: 1.98879 | loss: 1.15901| 
constrast_loss: 4.562| div_loss: 0.74033| %_mask_idx: 0.40648| ppl: 166.18622| %_neg_is_pos: 0.01774| lr: 0.0| temp: 1.98879 | loss: 1.15277| constrast_loss: 4.53988| div_loss: 0.71209| %_mask_idx: 0.3797| ppl: 184.26019| %_neg_is_pos: 0.01512| lr: 0.0| temp: 1.98878 | loss: 1.15215| constrast_loss: 4.53391| div_loss: 0.74674| %_mask_idx: 0.36842| ppl: 162.08379| %_neg_is_pos: 0.03085| lr: 0.0| temp: 1.98878 | loss: 1.15662| constrast_loss: 4.55433| div_loss: 0.72157| %_mask_idx: 0.36889| ppl: 178.19675| %_neg_is_pos: 0.01705| lr: 0.0| temp: 1.98876 | loss: 1.15382| constrast_loss: 4.54141| div_loss: 0.7385| %_mask_idx: 0.40915| ppl: 167.36148| %_neg_is_pos: 0.01422| lr: 0.0| temp: 1.98876 | loss: 1.14504| constrast_loss: 4.50621| div_loss: 0.73948| %_mask_idx: 0.36435| ppl: 166.73141| %_neg_is_pos: 0.02598| lr: 0.0| temp: 1.98875 | loss: 1.15467| constrast_loss: 4.54757| div_loss: 0.71116| %_mask_idx: 0.3833| ppl: 184.85753| %_neg_is_pos: 0.01838| lr: 0.0| temp: 1.98875 | loss: 1.14791| constrast_loss: 4.51648| div_loss: 0.75174| %_mask_idx: 0.41338| ppl: 158.88612| %_neg_is_pos: 0.0223| lr: 0.0| temp: 1.98874 | loss: 1.15291| constrast_loss: 4.54074| div_loss: 0.70899| %_mask_idx: 0.4292| ppl: 186.24855| %_neg_is_pos: 0.01038| lr: 0.0| temp: 1.98874 | loss: 1.14582| constrast_loss: 4.51358| div_loss: 0.69701| %_mask_idx: 0.32895| ppl: 193.91058| %_neg_is_pos: 0.02563| lr: 0.0| temp: 1.98873 | loss: 1.15396| constrast_loss: 4.54631| div_loss: 0.69533| %_mask_idx: 0.38518| ppl: 194.98776| %_neg_is_pos: 0.01225| lr: 0.0| temp: 1.98873 | loss: 1.15058| constrast_loss: 4.53229| div_loss: 0.70044| %_mask_idx: 0.36999| ppl: 191.71823| %_neg_is_pos: 0.01639| lr: 0.0| temp: 1.98871 | loss: 1.16136| constrast_loss: 4.57414| div_loss: 0.71289| %_mask_idx: 0.40445| ppl: 183.75256| %_neg_is_pos: 0.01957| lr: 0.0| temp: 1.98871 | loss: 1.15093| constrast_loss: 4.52923| div_loss: 0.74498| %_mask_idx: 0.36858| ppl: 163.2113| %_neg_is_pos: 0.02204| lr: 0.0| temp: 1.9887 | loss: 
1.15154| constrast_loss: 4.53747| div_loss: 0.68676| %_mask_idx: 0.36654| ppl: 200.47256| %_neg_is_pos: 0.01437| lr: 0.0| temp: 1.9887
[2021-09-01 23:43:29,500] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 256.0, reducing to 128.0
[2021-09-01 23:43:29,500] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 256.0, reducing to 128.0
| loss: 1.15527| constrast_loss: 4.54903| div_loss: 0.72058| %_mask_idx: 0.38925| ppl: 178.83078| %_neg_is_pos: 0.01785| lr: 0.0| temp: 1.98868 | loss: 1.1547| constrast_loss: 4.54602| div_loss: 0.72789| %_mask_idx: 0.401| ppl: 174.15012| %_neg_is_pos: 0.02132| lr: 0.0| temp: 1.98868 | loss: 1.14821| constrast_loss: 4.52025| div_loss: 0.72583| %_mask_idx: 0.4057| ppl: 175.46747| %_neg_is_pos: 0.01912| lr: 0.0| temp: 1.98867 | loss: 1.15548| constrast_loss: 4.55094| div_loss: 0.70989| %_mask_idx: 0.37359| ppl: 185.66782| %_neg_is_pos: 0.01558| lr: 0.0| temp: 1.98867 | loss: 1.15792| constrast_loss: 4.55922| div_loss: 0.72457| %_mask_idx: 0.4057| ppl: 176.27301| %_neg_is_pos: 0.01236| lr: 0.0| temp: 1.98866 | loss: 1.15198| constrast_loss: 4.53601| div_loss: 0.71919| %_mask_idx: 0.39787| ppl: 179.71774| %_neg_is_pos: 0.01927| lr: 0.0| temp: 1.98866 | loss: 1.15833| constrast_loss: 4.56349| div_loss: 0.69848| %_mask_idx: 0.42246| ppl: 192.97351| %_neg_is_pos: 0.01538| lr: 0.0| temp: 1.98865 | loss: 1.14636| constrast_loss: 4.51344| div_loss: 0.72004| %_mask_idx: 0.35385| ppl: 179.177| %_neg_is_pos: 0.03391| lr: 0.0| temp: 1.98865 | loss: 1.14485| constrast_loss: 4.51041| div_loss: 0.69005| %_mask_idx: 0.38221| ppl: 198.37079| %_neg_is_pos: 0.01882| lr: 0.0| temp: 1.98863 | loss: 1.15304| constrast_loss: 4.54147| div_loss: 0.70702| %_mask_idx: 0.39442| ppl: 187.50717| %_neg_is_pos: 0.01588| lr: 0.0| temp: 1.98863 | loss: 1.15527| constrast_loss: 4.54868| div_loss: 0.72394| %_mask_idx: 0.31626| ppl: 
176.67969| %_neg_is_pos: 0.03267| lr: 0.0| temp: 1.98862 | loss: 1.15882| constrast_loss: 4.56665| div_loss: 0.68618| %_mask_idx: 0.44236| ppl: 200.84787| %_neg_is_pos: 0.01051| lr: 0.0| temp: 1.98862 | loss: 1.1412| constrast_loss: 4.49354| div_loss: 0.71253| %_mask_idx: 0.37155| ppl: 183.97855| %_neg_is_pos: 0.01796| lr: 0.0| temp: 1.98861 | loss: 1.14799| constrast_loss: 4.51673| div_loss: 0.75231| %_mask_idx: 0.38706| ppl: 158.51877| %_neg_is_pos: 0.02917| lr: 0.0| temp: 1.98861 | loss: 1.15396| constrast_loss: 4.54693| div_loss: 0.68914| %_mask_idx: 0.37531| ppl: 198.95174| %_neg_is_pos: 0.01706| lr: 0.0| temp: 1.9886 | loss: 1.15401| constrast_loss: 4.5469| div_loss: 0.6915| %_mask_idx: 0.40445| ppl: 197.44063| %_neg_is_pos: 0.01348| lr: 0.0| temp: 1.9886 | loss: 1.15226| constrast_loss: 4.53689| div_loss: 0.72127| %_mask_idx: 0.41823| ppl: 178.38425| %_neg_is_pos: 0.01778| lr: 0.0| temp: 1.98858 | loss: 1.15709| constrast_loss: 4.5591| div_loss: 0.69266| %_mask_idx: 0.4104| ppl: 196.69666| %_neg_is_pos: 0.01307| lr: 0.0| temp: 1.98858 | loss: 1.15526| constrast_loss: 4.55129| div_loss: 0.69765| %_mask_idx: 0.40382| ppl: 193.50452| %_neg_is_pos: 0.02898| lr: 0.0| temp: 1.98857 | loss: 1.14683| constrast_loss: 4.51649| div_loss: 0.70832| %_mask_idx: 0.44878| ppl: 186.67461| %_neg_is_pos: 0.01755| lr: 0.0| temp: 1.98857 | loss: 1.1398| constrast_loss: 4.48553| div_loss: 0.73689| %_mask_idx: 0.43844| ppl: 168.3902| %_neg_is_pos: 0.02073| lr: 0.0| temp: 1.98856 | loss: 1.14957| constrast_loss: 4.52638| div_loss: 0.71914| %_mask_idx: 0.36544| ppl: 179.75348| %_neg_is_pos: 0.02781| lr: 0.0| temp: 1.98856 | loss: 1.1564| constrast_loss: 4.55699| div_loss: 0.68597| %_mask_idx: 0.40304| ppl: 200.97742| %_neg_is_pos: 0.02369| lr: 0.0| temp: 1.98855 | loss: 1.15676| constrast_loss: 4.56004| div_loss: 0.67012| %_mask_idx: 0.44283| ppl: 211.12309| %_neg_is_pos: 0.00792| lr: 0.0| temp: 1.98855 | loss: 1.15183| constrast_loss: 4.53533| div_loss: 0.71993| %_mask_idx: 
0.43421| ppl: 179.24448| %_neg_is_pos: 0.03969| lr: 0.0| temp: 1.98853 | loss: 1.14813| constrast_loss: 4.52356| div_loss: 0.68971| %_mask_idx: 0.39677| ppl: 198.58412| %_neg_is_pos: 0.02027| lr: 0.0| temp: 1.98853 | loss: 1.15735| constrast_loss: 4.55999| div_loss: 0.69403| %_mask_idx: 0.35902| ppl: 195.81898| %_neg_is_pos: 0.02123| lr: 0.0| temp: 1.98852 | loss: 1.13396| constrast_loss: 4.46292| div_loss: 0.72937| %_mask_idx: 0.37829| ppl: 173.20355| %_neg_is_pos: 0.03565| lr: 0.0| temp: 1.98852 | loss: 1.15184| constrast_loss: 4.5361| div_loss: 0.7125| %_mask_idx: 0.37312| ppl: 183.99826| %_neg_is_pos: 0.03473| lr: 0.0| temp: 1.9885 | loss: 1.14713| constrast_loss: 4.51978| div_loss: 0.68724| %_mask_idx: 0.34242| ppl: 200.16676| %_neg_is_pos: 0.03035| lr: 0.0| temp: 1.9885 | loss: 1.14474| constrast_loss: 4.50778| div_loss: 0.71169| %_mask_idx: 0.3938| ppl: 184.52019| %_neg_is_pos: 0.02736| lr: 0.0| temp: 1.98849 | loss: 1.14628| constrast_loss: 4.51502| div_loss: 0.70097| %_mask_idx: 0.3443| ppl: 191.37967| %_neg_is_pos: 0.01723| lr: 0.0| temp: 1.98849 | loss: 1.14989| constrast_loss: 4.52896| div_loss: 0.70583| %_mask_idx: 0.40727| ppl: 188.27101| %_neg_is_pos: 0.02066| lr: 0.0| temp: 1.98848 | loss: 1.14967| constrast_loss: 4.53041| div_loss: 0.6826| %_mask_idx: 0.35056| ppl: 203.13478| %_neg_is_pos: 0.02473| lr: 0.0| temp: 1.98848 | loss: 1.15282| constrast_loss: 4.54309| div_loss: 0.68177| %_mask_idx: 0.414| ppl: 203.66849| %_neg_is_pos: 0.01744| lr: 0.0| temp: 1.98847 | loss: 1.1561| constrast_loss: 4.55581| div_loss: 0.6859| %_mask_idx: 0.34947| ppl: 201.02293| %_neg_is_pos: 0.02453| lr: 0.0| temp: 1.98847 | loss: 1.14643| constrast_loss: 4.51579| div_loss: 0.69915| %_mask_idx: 0.4198| ppl: 192.54419| %_neg_is_pos: 0.0141| lr: 0.0| temp: 1.98845 | loss: 1.1517| constrast_loss: 4.53759| div_loss: 0.69215| %_mask_idx: 0.39912| ppl: 197.02496| %_neg_is_pos: 0.02458| lr: 0.0| temp: 1.98845 | loss: 1.14625| constrast_loss: 4.51338| div_loss: 0.71612| 
%_mask_idx: 0.35213| ppl: 181.6821| %_neg_is_pos: 0.03753| lr: 0.0| temp: 1.98844 | loss: 1.15095| constrast_loss: 4.53647| div_loss: 0.67337| %_mask_idx: 0.41259| ppl: 209.04214| %_neg_is_pos: 0.01382| lr: 0.0| temp: 1.98844 | loss: 1.14327| constrast_loss: 4.50204| div_loss: 0.71057| %_mask_idx: 0.37202| ppl: 185.2382| %_neg_is_pos: 0.02915| lr: 0.0| temp: 1.98843 | loss: 1.1514| constrast_loss: 4.53553| div_loss: 0.70067| %_mask_idx: 0.35025| ppl: 191.57129| %_neg_is_pos: 0.02142| lr: 0.0| temp: 1.98843 | loss: 1.14937| constrast_loss: 4.52994| div_loss: 0.67549| %_mask_idx: 0.39912| ppl: 207.68947| %_neg_is_pos: 0.01792| lr: 0.0| temp: 1.98842 | loss: 1.14837| constrast_loss: 4.52112| div_loss: 0.72365| %_mask_idx: 0.42732| ppl: 176.86517| %_neg_is_pos: 0.02203| lr: 0.0| temp: 1.98842 | loss: 1.13642| constrast_loss: 4.47246| div_loss: 0.73202| %_mask_idx: 0.37406| ppl: 171.5058| %_neg_is_pos: 0.03001| lr: 0.0| temp: 1.9884 | loss: 1.14042| constrast_loss: 4.48957| div_loss: 0.72123| %_mask_idx: 0.37375| ppl: 178.41376| %_neg_is_pos: 0.03424| lr: 0.0| temp: 1.9884 | loss: 1.15575| constrast_loss: 4.55597| div_loss: 0.67018| %_mask_idx: 0.34931| ppl: 211.08591| %_neg_is_pos: 0.01053| lr: 0.0| temp: 1.98839 | loss: 1.14485| constrast_loss: 4.50519| div_loss: 0.74215| %_mask_idx: 0.34367| ppl: 165.02597| %_neg_is_pos: 0.04417| lr: 0.0| temp: 1.98839 | loss: 1.13762| constrast_loss: 4.47672| div_loss: 0.73752| %_mask_idx: 0.3891| ppl: 167.98608| %_neg_is_pos: 0.02395| lr: 0.0| temp: 1.98838 | loss: 1.1403| constrast_loss: 4.48767| div_loss: 0.73515| %_mask_idx: 0.38393| ppl: 169.50317| %_neg_is_pos: 0.03078| lr: 0.0| temp: 1.98838 | loss: 1.15078| constrast_loss: 4.53119| div_loss: 0.71946| %_mask_idx: 0.39066| ppl: 179.54391| %_neg_is_pos: 0.02302| lr: 0.0| temp: 1.98837 | loss: 1.14272| constrast_loss: 4.5003| div_loss: 0.70565| %_mask_idx: 0.39301| ppl: 188.38206| %_neg_is_pos: 0.02075| lr: 0.0| temp: 1.98837 | loss: 1.16018| constrast_loss: 4.57142| div_loss: 
0.69307| %_mask_idx: 0.40429| ppl: 196.43658| %_neg_is_pos: 0.02119| lr: 0.0| temp: 1.98835 | loss: 1.13094| constrast_loss: 4.44868| div_loss: 0.75064| %_mask_idx: 0.36294| ppl: 159.58722| %_neg_is_pos: 0.03992| lr: 0.0| temp: 1.98835 | loss: 1.15861| constrast_loss: 4.56446| div_loss: 0.69986| %_mask_idx: 0.40774| ppl: 192.08969| %_neg_is_pos: 0.02383| lr: 0.0| temp: 1.98834 | loss: 1.14199| constrast_loss: 4.49548| div_loss: 0.7246| %_mask_idx: 0.39897| ppl: 176.25822| %_neg_is_pos: 0.03385| lr: 0.0| temp: 1.98834 | loss: 1.15164| constrast_loss: 4.53722| div_loss: 0.69334| %_mask_idx: 0.41087| ppl: 196.26114| %_neg_is_pos: 0.01776| lr: 0.0| temp: 1.98832 | loss: 1.14036| constrast_loss: 4.4912| div_loss: 0.70229| %_mask_idx: 0.37328| ppl: 190.53333| %_neg_is_pos: 0.02227| lr: 0.0| temp: 1.98832 | loss: 1.14447| constrast_loss: 4.50828| div_loss: 0.69592| %_mask_idx: 0.3739| ppl: 194.61038| %_neg_is_pos: 0.02466| lr: 0.0| temp: 1.98831 | loss: 1.13938| constrast_loss: 4.48465| div_loss: 0.72867| %_mask_idx: 0.3537| ppl: 173.65369| %_neg_is_pos: 0.03509| lr: 0.0| temp: 1.98831 | loss: 1.15599| constrast_loss: 4.55206| div_loss: 0.71896| %_mask_idx: 0.388| ppl: 179.86258| %_neg_is_pos: 0.0151| lr: 0.0| temp: 1.9883 | loss: 1.14597| constrast_loss: 4.51462| div_loss: 0.69265| %_mask_idx: 0.34618| ppl: 196.70284| %_neg_is_pos: 0.02199| lr: 0.0| temp: 1.9883 | loss: 1.13753| constrast_loss: 4.47714| div_loss: 0.72989| %_mask_idx: 0.32973| ppl: 172.87335| %_neg_is_pos: 0.02851| lr: 0.0| temp: 1.98829 | loss: 1.14412| constrast_loss: 4.50536| div_loss: 0.71105| %_mask_idx: 0.37187| ppl: 184.92503| %_neg_is_pos: 0.03103| lr: 0.0| temp: 1.98829 | loss: 1.13972| constrast_loss: 4.48791| div_loss: 0.70983| %_mask_idx: 0.36544| ppl: 185.71078| %_neg_is_pos: 0.0305| lr: 0.0| temp: 1.98827 | loss: 1.1515| constrast_loss: 4.53385| div_loss: 0.72129| %_mask_idx: 0.37265| ppl: 178.37488| %_neg_is_pos: 0.02086| lr: 0.0| temp: 1.98827 | loss: 1.15592| constrast_loss: 4.5536| 
div_loss: 0.70083| %_mask_idx: 0.36873| ppl: 191.47116| %_neg_is_pos: 0.0249| lr: 0.0| temp: 1.98826 | loss: 1.14833| constrast_loss: 4.52264| div_loss: 0.70665| %_mask_idx: 0.42434| ppl: 187.74394| %_neg_is_pos: 0.0204| lr: 0.0| temp: 1.98826 | loss: 1.15056| constrast_loss: 4.53207| div_loss: 0.70161| %_mask_idx: 0.33318| ppl: 190.97098| %_neg_is_pos: 0.03315| lr: 0.0| temp: 1.98825 | loss: 1.15209| constrast_loss: 4.53937| div_loss: 0.68976| %_mask_idx: 0.40742| ppl: 198.55658| %_neg_is_pos: 0.0168| lr: 0.0| temp: 1.98825 | loss: 1.14722| constrast_loss: 4.51815| div_loss: 0.7074| %_mask_idx: 0.35229| ppl: 187.26271| %_neg_is_pos: 0.03038| lr: 0.0| temp: 1.98824 | loss: 1.14505| constrast_loss: 4.50726| div_loss: 0.72934| %_mask_idx: 0.37234| ppl: 173.21954| %_neg_is_pos: 0.023| lr: 0.0| temp: 1.98824 | loss: 1.13208| constrast_loss: 4.45636| div_loss: 0.7195| %_mask_idx: 0.38346| ppl: 179.52121| %_neg_is_pos: 0.03627| lr: 0.0| temp: 1.98822 | loss: 1.14147| constrast_loss: 4.49403| div_loss: 0.71866| %_mask_idx: 0.39709| ppl: 180.05716| %_neg_is_pos: 0.02315| lr: 0.0| temp: 1.98822 | loss: 1.14549| constrast_loss: 4.51156| div_loss: 0.70395| %_mask_idx: 0.38424| ppl: 189.47469| %_neg_is_pos: 0.01557| lr: 0.0| temp: 1.98821 | loss: 1.14007| constrast_loss: 4.49015| div_loss: 0.70146| %_mask_idx: 0.3031| ppl: 191.06549| %_neg_is_pos: 0.02521| lr: 0.0| temp: 1.98821 | loss: 1.1467| constrast_loss: 4.51653| div_loss: 0.70279| %_mask_idx: 0.38174| ppl: 190.21457| %_neg_is_pos: 0.02595| lr: 0.0| temp: 1.9882 | loss: 1.15739| constrast_loss: 4.5599| div_loss: 0.69661| %_mask_idx: 0.35276| ppl: 194.16849| %_neg_is_pos: 0.01771| lr: 0.0| temp: 1.9882 | loss: 1.1545| constrast_loss: 4.5474| div_loss: 0.70618| %_mask_idx: 0.38315| ppl: 188.04674| %_neg_is_pos: 0.01403| lr: 0.0| temp: 1.98819 | loss: 1.14195| constrast_loss: 4.49455| div_loss: 0.73255| %_mask_idx: 0.34821| ppl: 171.16989| %_neg_is_pos: 0.04152| lr: 0.0| temp: 1.98819 | loss: 1.14881| constrast_loss: 
4.52499| div_loss: 0.7027| %_mask_idx: 0.375| ppl: 190.27283| %_neg_is_pos: 0.02235| lr: 0.0| temp: 1.98818 | loss: 1.15777| constrast_loss: 4.56262| div_loss: 0.68463| %_mask_idx: 0.3703| ppl: 201.8358| %_neg_is_pos: 0.02196| lr: 0.0| temp: 1.98818 | loss: 1.14682| constrast_loss: 4.51718| div_loss: 0.70101| %_mask_idx: 0.39082| ppl: 191.35571| %_neg_is_pos: 0.02022| lr: 0.0| temp: 1.98817 | loss: 1.14812| constrast_loss: 4.52135| div_loss: 0.71122| %_mask_idx: 0.36341| ppl: 184.81656| %_neg_is_pos: 0.02003| lr: 0.0| temp: 1.98817 | loss: 1.15388| constrast_loss: 4.54528| div_loss: 0.70217| %_mask_idx: 0.38722| ppl: 190.60971| %_neg_is_pos: 0.02796| lr: 0.0| temp: 1.98815 | loss: 1.14243| constrast_loss: 4.49836| div_loss: 0.71358| %_mask_idx: 0.32644| ppl: 183.30966| %_neg_is_pos: 0.03783| lr: 0.0| temp: 1.98815 | loss: 1.14283| constrast_loss: 4.50094| div_loss: 0.70393| %_mask_idx: 0.38158| ppl: 189.4821| %_neg_is_pos: 0.01503| lr: 0.0| temp: 1.98814 | loss: 1.15128| constrast_loss: 4.53368| div_loss: 0.71436| %_mask_idx: 0.414| ppl: 182.80835| %_neg_is_pos: 0.01562| lr: 0.0| temp: 1.98814 | loss: 1.1409| constrast_loss: 4.48941| div_loss: 0.74181| %_mask_idx: 0.39364| ppl: 165.23975| %_neg_is_pos: 0.02517| lr: 0.0| temp: 1.98813 | loss: 1.14808| constrast_loss: 4.52282| div_loss: 0.69502| %_mask_idx: 0.45833| ppl: 195.18799| %_neg_is_pos: 0.01871| lr: 0.0| temp: 1.98813 | loss: 1.14837| constrast_loss: 4.5203| div_loss: 0.73172| %_mask_idx: 0.35887| ppl: 171.69701| %_neg_is_pos: 0.02867| lr: 0.0| temp: 1.98812 | loss: 1.14644| constrast_loss: 4.51536| div_loss: 0.70414| %_mask_idx: 0.40695| ppl: 189.35019| %_neg_is_pos: 0.01952| lr: 0.0| temp: 1.98812 | loss: 1.15135| constrast_loss: 4.53354| div_loss: 0.71883| %_mask_idx: 0.45536| ppl: 179.94661| %_neg_is_pos: 0.01738| lr: 0.0| temp: 1.9881 | loss: 1.14693| constrast_loss: 4.51811| div_loss: 0.69605| %_mask_idx: 0.38565| ppl: 194.5303| %_neg_is_pos: 0.01686| lr: 0.0| temp: 1.9881 | loss: 1.15195| 
constrast_loss: 4.53905| div_loss: 0.68745| %_mask_idx: 0.37218| ppl: 200.02893| %_neg_is_pos: 0.02415| lr: 0.0| temp: 1.98809 | loss: 1.14398| constrast_loss: 4.50349| div_loss: 0.72436| %_mask_idx: 0.40179| ppl: 176.40939| %_neg_is_pos: 0.01665| lr: 0.0| temp: 1.98809 | loss: 1.14781| constrast_loss: 4.51997| div_loss: 0.71259| %_mask_idx: 0.43562| ppl: 183.94467| %_neg_is_pos: 0.01959| lr: 0.0| temp: 1.98808 | loss: 1.1398| constrast_loss: 4.48736| div_loss: 0.71825| %_mask_idx: 0.4068| ppl: 180.32297| %_neg_is_pos: 0.02622| lr: 0.0| temp: 1.98808 | loss: 1.14374| constrast_loss: 4.49952| div_loss: 0.75445| %_mask_idx: 0.37907| ppl: 157.15286| %_neg_is_pos: 0.02722| lr: 0.0| temp: 1.98807 | loss: 1.14998| constrast_loss: 4.53031| div_loss: 0.69597| %_mask_idx: 0.37766| ppl: 194.57788| %_neg_is_pos: 0.0327| lr: 0.0| temp: 1.98807 | loss: 1.14454| constrast_loss: 4.50454| div_loss: 0.7361| %_mask_idx: 0.34947| ppl: 168.89313| %_neg_is_pos: 0.02824| lr: 0.0| temp: 1.98805 | loss: 1.14097| constrast_loss: 4.49181| div_loss: 0.72055| %_mask_idx: 0.3916| ppl: 178.85019| %_neg_is_pos: 0.02925| lr: 0.0| temp: 1.98805 | loss: 1.14903| constrast_loss: 4.52595| div_loss: 0.70159| %_mask_idx: 0.43186| ppl: 190.98192| %_neg_is_pos: 0.03024| lr: 0.0| temp: 1.98804 | loss: 1.14665| constrast_loss: 4.51753| div_loss: 0.69079| %_mask_idx: 0.37672| ppl: 197.89514| %_neg_is_pos: 0.0221| lr: 0.0| temp: 1.98804 | loss: 1.14472| constrast_loss: 4.50731| div_loss: 0.71584| %_mask_idx: 0.40132| ppl: 181.86543| %_neg_is_pos: 0.0214| lr: 0.0| temp: 1.98803 | loss: 1.15366| constrast_loss: 4.54659| div_loss: 0.68059| %_mask_idx: 0.43327| ppl: 204.42557| %_neg_is_pos: 0.00885| lr: 0.0| temp: 1.98803 | loss: 1.1369| constrast_loss: 4.47524| div_loss: 0.7235| %_mask_idx: 0.36529| ppl: 176.96027| %_neg_is_pos: 0.03176| lr: 0.0| temp: 1.98802 | loss: 1.13904| constrast_loss: 4.4843| div_loss: 0.71847| %_mask_idx: 0.39662| ppl: 180.17674| %_neg_is_pos: 0.02928| lr: 0.0| temp: 1.98802 | loss: 
1.15114| constrast_loss: 4.53207| div_loss: 0.72511| %_mask_idx: 0.39568| ppl: 175.93185| %_neg_is_pos: 0.0373| lr: 0.0| temp: 1.988 | loss: 1.16247| constrast_loss: 4.57679| div_loss: 0.73097| %_mask_idx: 0.41432| ppl: 172.17654| %_neg_is_pos: 0.01943| lr: 0.0| temp: 1.988 | loss: 1.15466| constrast_loss: 4.54685| div_loss: 0.718| %_mask_idx: 0.37484| ppl: 180.48093| %_neg_is_pos: 0.0185| lr: 0.0| temp: 1.98799 | loss: 1.1463| constrast_loss: 4.51482| div_loss: 0.70383| %_mask_idx: 0.34868| ppl: 189.54648| %_neg_is_pos: 0.03505| lr: 0.0| temp: 1.98799
[2021-09-01 23:52:44,522] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 128.0, reducing to 64.0
[2021-09-01 23:52:44,522] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 128.0, reducing to 64.0
| loss: 1.14275| constrast_loss: 4.50074| div_loss: 0.70258| %_mask_idx: 0.43296| ppl: 190.34897| %_neg_is_pos: 0.0172| lr: 0.0| temp: 1.98797 | loss: 1.14574| constrast_loss: 4.51063| div_loss: 0.72309| %_mask_idx: 0.38127| ppl: 177.22316| %_neg_is_pos: 0.02413| lr: 0.0| temp: 1.98797 | loss: 1.15417| constrast_loss: 4.54907| div_loss: 0.67598| %_mask_idx: 0.41338| ppl: 207.3718| %_neg_is_pos: 0.01849| lr: 0.0| temp: 1.98796 | loss: 1.13994| constrast_loss: 4.48667| div_loss: 0.73075| %_mask_idx: 0.36216| ppl: 172.31686| %_neg_is_pos: 0.03205| lr: 0.0| temp: 1.98796 | loss: 1.1514| constrast_loss: 4.53719| div_loss: 0.68433| %_mask_idx: 0.42325| ppl: 202.02631| %_neg_is_pos: 0.01451| lr: 0.0| temp: 1.98795 | loss: 1.15001| constrast_loss: 4.52961| div_loss: 0.70431| %_mask_idx: 0.39411| ppl: 189.24318| %_neg_is_pos: 0.01758| lr: 0.0| temp: 1.98795 | loss: 1.13048| constrast_loss: 4.44654| div_loss: 0.75391| %_mask_idx: 0.38706| ppl: 157.49924| %_neg_is_pos: 0.02618| lr: 0.0| temp: 1.98794 | loss: 1.1461| constrast_loss: 4.51054| div_loss: 0.73851| %_mask_idx: 0.39975| ppl: 167.35669| 
%_neg_is_pos: 0.03027| lr: 0.0| temp: 1.98794 | loss: 1.15285| constrast_loss: 4.54261| div_loss: 0.68772| %_mask_idx: 0.40664| ppl: 199.85876| %_neg_is_pos: 0.01139| lr: 0.0| temp: 1.98792 | loss: 1.14682| constrast_loss: 4.51341| div_loss: 0.73874| %_mask_idx: 0.39348| ppl: 167.20688| %_neg_is_pos: 0.03497| lr: 0.0| temp: 1.98792 | loss: 1.12905| constrast_loss: 4.44237| div_loss: 0.73827| %_mask_idx: 0.36638| ppl: 167.50679| %_neg_is_pos: 0.04769| lr: 0.0| temp: 1.98791 | loss: 1.14289| constrast_loss: 4.50235| div_loss: 0.69214| %_mask_idx: 0.37892| ppl: 197.02765| %_neg_is_pos: 0.02564| lr: 0.0| temp: 1.98791 | loss: 1.14671| constrast_loss: 4.51759| div_loss: 0.69228| %_mask_idx: 0.40492| ppl: 196.9391| %_neg_is_pos: 0.01193| lr: 0.0| temp: 1.9879 | loss: 1.13016| constrast_loss: 4.44601| div_loss: 0.74637| %_mask_idx: 0.38315| ppl: 162.32152| %_neg_is_pos: 0.03961| lr: 0.0| temp: 1.9879 | loss: 1.14742| constrast_loss: 4.51796| div_loss: 0.7173| %_mask_idx: 0.41964| ppl: 180.92616| %_neg_is_pos: 0.02314| lr: 0.0| temp: 1.98789 | loss: 1.15202| constrast_loss: 4.53746| div_loss: 0.70614| %_mask_idx: 0.38675| ppl: 188.07019| %_neg_is_pos: 0.01983| lr: 0.0| temp: 1.98789 | loss: 1.15281| constrast_loss: 4.5399| div_loss: 0.7136| %_mask_idx: 0.35182| ppl: 183.29443| %_neg_is_pos: 0.03417| lr: 0.0| temp: 1.98787 | loss: 1.15619| constrast_loss: 4.55832| div_loss: 0.6645| %_mask_idx: 0.41416| ppl: 214.72171| %_neg_is_pos: 0.01152| lr: 0.0| temp: 1.98787 | loss: 1.15415| constrast_loss: 4.54493| div_loss: 0.71665| %_mask_idx: 0.38706| ppl: 181.34135| %_neg_is_pos: 0.02433| lr: 0.0| temp: 1.98786 | loss: 1.14329| constrast_loss: 4.50439| div_loss: 0.68771| %_mask_idx: 0.3985| ppl: 199.86816| %_neg_is_pos: 0.02084| lr: 0.0| temp: 1.98786 | loss: 1.15359| constrast_loss: 4.54677| div_loss: 0.67572| %_mask_idx: 0.37751| ppl: 207.54237| %_neg_is_pos: 0.02123| lr: 0.0| temp: 1.98785 | loss: 1.13402| constrast_loss: 4.46236| div_loss: 0.73718| %_mask_idx: 0.36685| ppl: 
168.20747| %_neg_is_pos: 0.03779| lr: 0.0| temp: 1.98785 | loss: 1.14443| constrast_loss: 4.50832| div_loss: 0.69401| %_mask_idx: 0.42137| ppl: 195.8342| %_neg_is_pos: 0.01818| lr: 0.0| temp: 1.98784 | loss: 1.14729| constrast_loss: 4.5207| div_loss: 0.68451| %_mask_idx: 0.40006| ppl: 201.91251| %_neg_is_pos: 0.01864| lr: 0.0| temp: 1.98784 | loss: 1.14856| constrast_loss: 4.52211| div_loss: 0.7214| %_mask_idx: 0.39693| ppl: 178.30637| %_neg_is_pos: 0.03163| lr: 0.0| temp: 1.98782 | loss: 1.13822| constrast_loss: 4.48078| div_loss: 0.72091| %_mask_idx: 0.37594| ppl: 178.61548| %_neg_is_pos: 0.02729| lr: 0.0| temp: 1.98782 | loss: 1.13717| constrast_loss: 4.47934| div_loss: 0.69359| %_mask_idx: 0.35589| ppl: 196.10339| %_neg_is_pos: 0.02476| lr: 0.0| temp: 1.98781 | loss: 1.1397| constrast_loss: 4.4881| div_loss: 0.70718| %_mask_idx: 0.38001| ppl: 187.40228| %_neg_is_pos: 0.03184| lr: 0.0| temp: 1.98781 | loss: 1.14429| constrast_loss: 4.50785| div_loss: 0.69317| %_mask_idx: 0.33991| ppl: 196.37271| %_neg_is_pos: 0.02457| lr: 0.0| temp: 1.98779 | loss: 1.13841| constrast_loss: 4.48215| div_loss: 0.71472| %_mask_idx: 0.37719| ppl: 182.57788| %_neg_is_pos: 0.03736| lr: 0.0| temp: 1.98779 | loss: 1.14051| constrast_loss: 4.49547| div_loss: 0.66573| %_mask_idx: 0.41463| ppl: 213.93294| %_neg_is_pos: 0.01972| lr: 0.0| temp: 1.98778 | loss: 1.14089| constrast_loss: 4.49297| div_loss: 0.70581| %_mask_idx: 0.4234| ppl: 188.2829| %_neg_is_pos: 0.01894| lr: 0.0| temp: 1.98778 | loss: 1.14033| constrast_loss: 4.49081| div_loss: 0.70524| %_mask_idx: 0.39113| ppl: 188.64957| %_neg_is_pos: 0.02899| lr: 0.0| temp: 1.98777 | loss: 1.13353| constrast_loss: 4.46023| div_loss: 0.73908| %_mask_idx: 0.35824| ppl: 166.98721| %_neg_is_pos: 0.03344| lr: 0.0| temp: 1.98777 | loss: 1.1412| constrast_loss: 4.49437| div_loss: 0.70432| %_mask_idx: 0.39411| ppl: 189.23558| %_neg_is_pos: 0.03848| lr: 0.0| temp: 1.98776 | loss: 1.14907| constrast_loss: 4.5293| div_loss: 0.66992| %_mask_idx: 
0.40523| ppl: 211.25435| %_neg_is_pos: 0.01671| lr: 0.0| temp: 1.98776 | loss: 1.13898| constrast_loss: 4.48624| div_loss: 0.69659| %_mask_idx: 0.34618| ppl: 194.17937| %_neg_is_pos: 0.0281| lr: 0.0| temp: 1.98774 | loss: 1.14478| constrast_loss: 4.51047| div_loss: 0.68649| %_mask_idx: 0.42011| ppl: 200.64716| %_neg_is_pos: 0.02684| lr: 0.0| temp: 1.98774 | loss: 1.13354| constrast_loss: 4.4628| div_loss: 0.71354| %_mask_idx: 0.41165| ppl: 183.33145| %_neg_is_pos: 0.02517| lr: 0.0| temp: 1.98773 | loss: 1.14704| constrast_loss: 4.52097| div_loss: 0.67201| %_mask_idx: 0.4209| ppl: 209.91113| %_neg_is_pos: 0.02416| lr: 0.0| temp: 1.98773 | loss: 1.13604| constrast_loss: 4.47368| div_loss: 0.70471| %_mask_idx: 0.32268| ppl: 188.98398| %_neg_is_pos: 0.0407| lr: 0.0| temp: 1.98772 | loss: 1.14731| constrast_loss: 4.51954| div_loss: 0.69677| %_mask_idx: 0.37218| ppl: 194.06779| %_neg_is_pos: 0.02825| lr: 0.0| temp: 1.98772 | loss: 1.1361| constrast_loss: 4.47504| div_loss: 0.69367| %_mask_idx: 0.34477| ppl: 196.05278| %_neg_is_pos: 0.02578| lr: 0.0| temp: 1.98771 | loss: 1.13802| constrast_loss: 4.48334| div_loss: 0.68739| %_mask_idx: 0.43374| ppl: 200.07291| %_neg_is_pos: 0.02731| lr: 0.0| temp: 1.98771 | loss: 1.14421| constrast_loss: 4.50573| div_loss: 0.71112| %_mask_idx: 0.39677| ppl: 184.88113| %_neg_is_pos: 0.0207| lr: 0.0| temp: 1.98769 | loss: 1.14934| constrast_loss: 4.53031| div_loss: 0.67055| %_mask_idx: 0.41056| ppl: 210.85098| %_neg_is_pos: 0.013| lr: 0.0| temp: 1.98769 | loss: 1.13179| constrast_loss: 4.45747| div_loss: 0.69688| %_mask_idx: 0.37061| ppl: 193.99371| %_neg_is_pos: 0.01911| lr: 0.0| temp: 1.98768 | loss: 1.13439| constrast_loss: 4.465| div_loss: 0.72556| %_mask_idx: 0.43029| ppl: 175.64232| %_neg_is_pos: 0.03012| lr: 0.0| temp: 1.98768 | loss: 1.14938| constrast_loss: 4.52867| div_loss: 0.6885| %_mask_idx: 0.4093| ppl: 199.35709| %_neg_is_pos: 0.02783| lr: 0.0| temp: 1.98767 | loss: 1.14498| constrast_loss: 4.51093| div_loss: 0.68984| 
%_mask_idx: 0.43186| ppl: 198.5043| %_neg_is_pos: 0.02803| lr: 0.0| temp: 1.98767 | loss: 1.14075| constrast_loss: 4.48917| div_loss: 0.73841| %_mask_idx: 0.34853| ppl: 167.4155| %_neg_is_pos: 0.03531| lr: 0.0| temp: 1.98766 | loss: 1.14894| constrast_loss: 4.52527| div_loss: 0.7051| %_mask_idx: 0.39505| ppl: 188.73523| %_neg_is_pos: 0.03485| lr: 0.0| temp: 1.98766 | loss: 1.13504| constrast_loss: 4.46928| div_loss: 0.70893| %_mask_idx: 0.35495| ppl: 186.28458| %_neg_is_pos: 0.03343| lr: 0.0| temp: 1.98764 | loss: 1.13744| constrast_loss: 4.47858| div_loss: 0.71195| %_mask_idx: 0.42027| ppl: 184.35443| %_neg_is_pos: 0.02757| lr: 0.0| temp: 1.98764 | loss: 1.13709| constrast_loss: 4.47775| div_loss: 0.70627| %_mask_idx: 0.3584| ppl: 187.98836| %_neg_is_pos: 0.02497| lr: 0.0| temp: 1.98763 | loss: 1.15043| constrast_loss: 4.5311| div_loss: 0.70621| %_mask_idx: 0.39192| ppl: 188.02658| %_neg_is_pos: 0.02413| lr: 0.0| temp: 1.98763 | loss: 1.14722| constrast_loss: 4.51652| div_loss: 0.72356| %_mask_idx: 0.39724| ppl: 176.92238| %_neg_is_pos: 0.03528| lr: 0.0| temp: 1.98761 | loss: 1.13431| constrast_loss: 4.46699| div_loss: 0.70235| %_mask_idx: 0.32268| ppl: 190.49811| %_neg_is_pos: 0.0351| lr: 0.0| temp: 1.98761 | loss: 1.13569| constrast_loss: 4.47067| div_loss: 0.72082| %_mask_idx: 0.3891| ppl: 178.67212| %_neg_is_pos: 0.04134| lr: 0.0| temp: 1.9876 | loss: 1.14949| constrast_loss: 4.52988| div_loss: 0.68061| %_mask_idx: 0.43562| ppl: 204.40741| %_neg_is_pos: 0.01564| lr: 0.0| temp: 1.9876 | loss: 1.14529| constrast_loss: 4.51172| div_loss: 0.69433| %_mask_idx: 0.42325| ppl: 195.62985| %_neg_is_pos: 0.01694| lr: 0.0| temp: 1.98759 | loss: 1.1446| constrast_loss: 4.50747| div_loss: 0.70912| %_mask_idx: 0.37798| ppl: 186.16013| %_neg_is_pos: 0.02553| lr: 0.0| temp: 1.98759 | loss: 1.12899| constrast_loss: 4.44456| div_loss: 0.71411| %_mask_idx: 0.32738| ppl: 182.96828| %_neg_is_pos: 0.03306| lr: 0.0| temp: 1.98758 | loss: 1.13966| constrast_loss: 4.4863| div_loss: 
0.72344| %_mask_idx: 0.4057| ppl: 177.00052| %_neg_is_pos: 0.02546| lr: 0.0| temp: 1.98758 | loss: 1.13921| constrast_loss: 4.48774| div_loss: 0.69115| %_mask_idx: 0.39834| ppl: 197.66383| %_neg_is_pos: 0.02063| lr: 0.0| temp: 1.98756 | loss: 1.12516| constrast_loss: 4.42773| div_loss: 0.72897| %_mask_idx: 0.37046| ppl: 173.46017| %_neg_is_pos: 0.02972| lr: 0.0| temp: 1.98756 | loss: 1.1438| constrast_loss: 4.50636| div_loss: 0.68828| %_mask_idx: 0.39881| ppl: 199.5007| %_neg_is_pos: 0.01807| lr: 0.0| temp: 1.98755 | loss: 1.13418| constrast_loss: 4.46677| div_loss: 0.69957| %_mask_idx: 0.40351| ppl: 192.27548| %_neg_is_pos: 0.02675| lr: 0.0| temp: 1.98755 | loss: 1.13879| constrast_loss: 4.48319| div_loss: 0.71957| %_mask_idx: 0.39521| ppl: 179.47415| %_neg_is_pos: 0.04075| lr: 0.0| temp: 1.98754 | loss: 1.1403| constrast_loss: 4.49331| div_loss: 0.67886| %_mask_idx: 0.38784| ppl: 205.52888| %_neg_is_pos: 0.02589| lr: 0.0| temp: 1.98754 | loss: 1.1324| constrast_loss: 4.45747| div_loss: 0.72143| %_mask_idx: 0.3808| ppl: 178.28473| %_neg_is_pos: 0.03306| lr: 0.0| temp: 1.98753 | loss: 1.14939| constrast_loss: 4.52837| div_loss: 0.6918| %_mask_idx: 0.38643| ppl: 197.24622| %_neg_is_pos: 0.02368| lr: 0.0| temp: 1.98753 | loss: 1.14636| constrast_loss: 4.51771| div_loss: 0.67714| %_mask_idx: 0.41573| ppl: 206.62743| %_neg_is_pos: 0.02525| lr: 0.0| temp: 1.98751 | loss: 1.14811| constrast_loss: 4.52478| div_loss: 0.67654| %_mask_idx: 0.39364| ppl: 207.01274| %_neg_is_pos: 0.0159| lr: 0.0| temp: 1.98751 | loss: 1.1386| constrast_loss: 4.48267| div_loss: 0.71735| %_mask_idx: 0.35636| ppl: 180.89499| %_neg_is_pos: 0.03631| lr: 0.0| temp: 1.9875 | loss: 1.14666| constrast_loss: 4.51849| div_loss: 0.68141| %_mask_idx: 0.38534| ppl: 203.89841| %_neg_is_pos: 0.01299| lr: 0.0| temp: 1.9875 | loss: 1.13266| constrast_loss: 4.46026| div_loss: 0.70393| %_mask_idx: 0.42293| ppl: 189.48798| %_neg_is_pos: 0.02278| lr: 0.0| temp: 1.98749 | loss: 1.1429| constrast_loss: 4.50283| 
div_loss: 0.68764| %_mask_idx: 0.36435| ppl: 199.90736| %_neg_is_pos: 0.03631| lr: 0.0| temp: 1.98749 | loss: 1.15027| constrast_loss: 4.53414| div_loss: 0.66945| %_mask_idx: 0.39129| ppl: 211.54904| %_neg_is_pos: 0.03353| lr: 0.0| temp: 1.98748 | loss: 1.14006| constrast_loss: 4.49184| div_loss: 0.68384| %_mask_idx: 0.38957| ppl: 202.34079| %_neg_is_pos: 0.02731| lr: 0.0| temp: 1.98748 | loss: 1.14018| constrast_loss: 4.48864| div_loss: 0.72064| %_mask_idx: 0.35432| ppl: 178.79114| %_neg_is_pos: 0.05079| lr: 0.0| temp: 1.98746 | loss: 1.13148| constrast_loss: 4.45486| div_loss: 0.7105| %_mask_idx: 0.38362| ppl: 185.27911| %_neg_is_pos: 0.04234| lr: 0.0| temp: 1.98746 | loss: 1.14339| constrast_loss: 4.50618| div_loss: 0.67373| %_mask_idx: 0.37202| ppl: 208.81247| %_neg_is_pos: 0.02172| lr: 0.0| temp: 1.98745 | loss: 1.14582| constrast_loss: 4.51114| div_loss: 0.72141| %_mask_idx: 0.42967| ppl: 178.29916| %_neg_is_pos: 0.02699| lr: 0.0| temp: 1.98745 | loss: 1.13251| constrast_loss: 4.45909| div_loss: 0.70939| %_mask_idx: 0.35589| ppl: 185.98837| %_neg_is_pos: 0.04048| lr: 0.0| temp: 1.98743 | loss: 1.14407| constrast_loss: 4.50862| div_loss: 0.67649| %_mask_idx: 0.39427| ppl: 207.04694| %_neg_is_pos: 0.02011| lr: 0.0| temp: 1.98743 | loss: 1.13408| constrast_loss: 4.46809| div_loss: 0.68227| %_mask_idx: 0.37531| ppl: 203.34808| %_neg_is_pos: 0.02372| lr: 0.0| temp: 1.98742 | loss: 1.14376| constrast_loss: 4.50496| div_loss: 0.70081| %_mask_idx: 0.39098| ppl: 191.48122| %_neg_is_pos: 0.02718| lr: 0.0| temp: 1.98742 | loss: 1.13928| constrast_loss: 4.48652| div_loss: 0.70604| %_mask_idx: 0.36263| ppl: 188.13713| %_neg_is_pos: 0.02665| lr: 0.0| temp: 1.98741 | loss: 1.13433| constrast_loss: 4.46567| div_loss: 0.71634| %_mask_idx: 0.30232| ppl: 181.53986| %_neg_is_pos: 0.03837| lr: 0.0| temp: 1.98741 | loss: 1.14603| constrast_loss: 4.51547| div_loss: 0.68642| %_mask_idx: 0.36169| ppl: 200.68808| %_neg_is_pos: 0.01992| lr: 0.0| temp: 1.9874 | loss: 1.14038| 
constrast_loss: 4.48871| div_loss: 0.72791| %_mask_idx: 0.38142| ppl: 174.13818| %_neg_is_pos: 0.04091| lr: 0.0| temp: 1.9874 | loss: 1.14051| constrast_loss: 4.49316| div_loss: 0.68876| %_mask_idx: 0.44846| ppl: 199.19121| %_neg_is_pos: 0.01202| lr: 0.0| temp: 1.98738 | loss: 1.14216| constrast_loss: 4.4988| div_loss: 0.69843| %_mask_idx: 0.36294| ppl: 193.00748| %_neg_is_pos: 0.02624| lr: 0.0| temp: 1.98738 | loss: 1.13375| constrast_loss: 4.46182| div_loss: 0.73182| %_mask_idx: 0.40648| ppl: 171.63458| %_neg_is_pos: 0.02839| lr: 0.0| temp: 1.98737 | loss: 1.14846| constrast_loss: 4.52603| div_loss: 0.678| %_mask_idx: 0.39411| ppl: 206.07919| %_neg_is_pos: 0.02236| lr: 0.0| temp: 1.98737 | loss: 1.12504| constrast_loss: 4.42692| div_loss: 0.73254| %_mask_idx: 0.36544| ppl: 171.17508| %_neg_is_pos: 0.02972| lr: 0.0| temp: 1.98736 | loss: 1.12558| constrast_loss: 4.42883| div_loss: 0.73502| %_mask_idx: 0.38221| ppl: 169.58939| %_neg_is_pos: 0.04329| lr: 0.0| temp: 1.98736 | loss: 1.15292| constrast_loss: 4.54493| div_loss: 0.66743| %_mask_idx: 0.42716| ppl: 212.84529| %_neg_is_pos: 0.0183| lr: 0.0| temp: 1.98735 | loss: 1.13533| constrast_loss: 4.47013| div_loss: 0.71193| %_mask_idx: 0.38831| ppl: 184.36501| %_neg_is_pos: 0.02593| lr: 0.0| temp: 1.98735 | loss: 1.13448| constrast_loss: 4.46306| div_loss: 0.74846| %_mask_idx: 0.39427| ppl: 160.98457| %_neg_is_pos: 0.0239| lr: 0.0| temp: 1.98733 | loss: 1.13707| constrast_loss: 4.47852| div_loss: 0.69763| %_mask_idx: 0.47039| ppl: 193.51952| %_neg_is_pos: 0.01794| lr: 0.0| temp: 1.98733 | loss: 1.13756| constrast_loss: 4.48021| div_loss: 0.70048| %_mask_idx: 0.38095| ppl: 191.6947| %_neg_is_pos: 0.03324| lr: 0.0| temp: 1.98732 | loss: 1.13816| constrast_loss: 4.47806| div_loss: 0.74574| %_mask_idx: 0.34931| ppl: 162.72874| %_neg_is_pos: 0.04996| lr: 0.0| temp: 1.98732 | loss: 1.14121| constrast_loss: 4.49624| div_loss: 0.68591| %_mask_idx: 0.41667| ppl: 201.01836| %_neg_is_pos: 0.02176| lr: 0.0| temp: 1.98731 | loss: 
1.13022| constrast_loss: 4.44919| div_loss: 0.71673| %_mask_idx: 0.39364| ppl: 181.29184| %_neg_is_pos: 0.03124| lr: 0.0| temp: 1.98731 | loss: 1.13957| constrast_loss: 4.48527| div_loss: 0.73002| %_mask_idx: 0.41526| ppl: 172.78738| %_neg_is_pos: 0.032| lr: 0.0| temp: 1.9873 | loss: 1.14267| constrast_loss: 4.50311| div_loss: 0.67581| %_mask_idx: 0.40664| ppl: 207.48468| %_neg_is_pos: 0.02324| lr: 0.0| temp: 1.9873 | loss: 1.15237| constrast_loss: 4.54361| div_loss: 0.65878| %_mask_idx: 0.40555| ppl: 218.37886| %_neg_is_pos: 0.02208| lr: 0.0| temp: 1.98728 | loss: 1.14059| constrast_loss: 4.49418| div_loss: 0.68176| %_mask_idx: 0.39489| ppl: 203.6747| %_neg_is_pos: 0.01914| lr: 0.0| temp: 1.98728 | loss: 1.14614| constrast_loss: 4.5165| div_loss: 0.68045| %_mask_idx: 0.38283| ppl: 204.51221| %_neg_is_pos: 0.01316| lr: 0.0| temp: 1.98727 | loss: 1.14039| constrast_loss: 4.49423| div_loss: 0.67331| %_mask_idx: 0.41776| ppl: 209.07925| %_neg_is_pos: 0.02386| lr: 0.0| temp: 1.98727 [2021-09-02 00:02:00,688] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 64.0, reducing to 32.0 [2021-09-02 00:02:00,688] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 64.0, reducing to 32.0 | loss: 1.14314| constrast_loss: 4.50396| div_loss: 0.68611| %_mask_idx: 0.39145| ppl: 200.88962| %_neg_is_pos: 0.02784| lr: 0.0| temp: 1.98725 | loss: 1.15186| constrast_loss: 4.54039| div_loss: 0.6705| %_mask_idx: 0.43954| ppl: 210.87924| %_neg_is_pos: 0.01646| lr: 0.0| temp: 1.98725 | loss: 1.14654| constrast_loss: 4.51936| div_loss: 0.66808| %_mask_idx: 0.38174| ppl: 212.42593| %_neg_is_pos: 0.01516| lr: 0.0| temp: 1.98724 | loss: 1.14944| constrast_loss: 4.52929| div_loss: 0.68465| %_mask_idx: 0.388| ppl: 201.8268| %_neg_is_pos: 0.02978| lr: 0.0| temp: 1.98724 | loss: 1.13593| constrast_loss: 4.4719| div_loss: 0.7181| %_mask_idx: 0.38941| ppl: 180.41721| %_neg_is_pos: 0.02316| lr: 0.0| temp: 1.98723 | loss: 1.13789| constrast_loss: 4.48075| div_loss: 0.70822| %_mask_idx: 0.40367| ppl: 186.742| %_neg_is_pos: 0.02709| lr: 0.0| temp: 1.98723 | loss: 1.13301| constrast_loss: 4.45936| div_loss: 0.7267| %_mask_idx: 0.35401| ppl: 174.91385| %_neg_is_pos: 0.04178| lr: 0.0| temp: 1.98722 | loss: 1.13278| constrast_loss: 4.45911| div_loss: 0.71999| %_mask_idx: 0.37077| ppl: 179.20663| %_neg_is_pos: 0.02233| lr: 0.0| temp: 1.98722 | loss: 1.13176| constrast_loss: 4.45623| div_loss: 0.70822| %_mask_idx: 0.37296| ppl: 186.73672| %_neg_is_pos: 0.04065| lr: 0.0| temp: 1.9872 | loss: 1.1463| constrast_loss: 4.5167| div_loss: 0.68503| %_mask_idx: 0.41792| ppl: 201.5791| %_neg_is_pos: 0.02108| lr: 0.0| temp: 1.9872 | loss: 1.1371| constrast_loss: 4.4787| div_loss: 0.69715| %_mask_idx: 0.39223| ppl: 193.82526| %_neg_is_pos: 0.03277| lr: 0.0| temp: 1.98719 | loss: 1.13723| constrast_loss: 4.48029| div_loss: 0.68641| %_mask_idx: 0.388| ppl: 200.70004| %_neg_is_pos: 0.03212| lr: 0.0| temp: 1.98719 | loss: 1.13694| constrast_loss: 4.47986| div_loss: 0.67894| %_mask_idx: 0.39771| ppl: 205.48035| %_neg_is_pos: 0.02699| lr: 0.0| temp: 1.98718 | loss: 1.14602| constrast_loss: 4.51572| div_loss: 0.68347| %_mask_idx: 0.35072| ppl: 202.57887| 
%_neg_is_pos: 0.02468| lr: 0.0| temp: 1.98718 | loss: 1.13732| constrast_loss: 4.48116| div_loss: 0.68136| %_mask_idx: 0.41698| ppl: 203.9296| %_neg_is_pos: 0.02373| lr: 0.0| temp: 1.98717 | loss: 1.13379| constrast_loss: 4.46551| div_loss: 0.69632| %_mask_idx: 0.33318| ppl: 194.35376| %_neg_is_pos: 0.03089| lr: 0.0| temp: 1.98717 | loss: 1.12624| constrast_loss: 4.43293| div_loss: 0.72027| %_mask_idx: 0.3407| ppl: 179.02466| %_neg_is_pos: 0.03339| lr: 0.0| temp: 1.98715 | loss: 1.14027| constrast_loss: 4.49122| div_loss: 0.69861| %_mask_idx: 0.38111| ppl: 192.88728| %_neg_is_pos: 0.0178| lr: 0.0| temp: 1.98715 | loss: 1.1413| constrast_loss: 4.49475| div_loss: 0.70443| %_mask_idx: 0.36764| ppl: 189.16766| %_neg_is_pos: 0.02833| lr: 0.0| temp: 1.98714 | loss: 1.13216| constrast_loss: 4.45769| div_loss: 0.70964| %_mask_idx: 0.4093| ppl: 185.82777| %_neg_is_pos: 0.02508| lr: 0.0| temp: 1.98714 | loss: 1.13706| constrast_loss: 4.48046| div_loss: 0.67771| %_mask_idx: 0.37014| ppl: 206.26706| %_neg_is_pos: 0.03333| lr: 0.0| temp: 1.98713 | loss: 1.13506| constrast_loss: 4.46937| div_loss: 0.70884| %_mask_idx: 0.39803| ppl: 186.34106| %_neg_is_pos: 0.02132| lr: 0.0| temp: 1.98713 | loss: 1.14019| constrast_loss: 4.49459| div_loss: 0.6616| %_mask_idx: 0.41479| ppl: 216.5733| %_neg_is_pos: 0.01076| lr: 0.0| temp: 1.98712 | loss: 1.13569| constrast_loss: 4.47349| div_loss: 0.69252| %_mask_idx: 0.36858| ppl: 196.78714| %_neg_is_pos: 0.02945| lr: 0.0| temp: 1.98712 | loss: 1.13049| constrast_loss: 4.45234| div_loss: 0.6962| %_mask_idx: 0.3249| ppl: 194.4296| %_neg_is_pos: 0.05083| lr: 0.0| temp: 1.9871 | loss: 1.12962| constrast_loss: 4.44653| div_loss: 0.71955| %_mask_idx: 0.43014| ppl: 179.48685| %_neg_is_pos: 0.04229| lr: 0.0| temp: 1.9871 | loss: 1.12711| constrast_loss: 4.43909| div_loss: 0.6936| %_mask_idx: 0.42246| ppl: 196.09335| %_neg_is_pos: 0.02625| lr: 0.0| temp: 1.98709 | loss: 1.13494| constrast_loss: 4.47006| div_loss: 0.69714| %_mask_idx: 0.40241| ppl: 
193.83112| %_neg_is_pos: 0.01284| lr: 0.0| temp: 1.98709 | loss: 1.1263| constrast_loss: 4.43679| div_loss: 0.68408| %_mask_idx: 0.43343| ppl: 202.1913| %_neg_is_pos: 0.0296| lr: 0.0| temp: 1.98707 | loss: 1.12702| constrast_loss: 4.4373| div_loss: 0.70796| %_mask_idx: 0.38377| ppl: 186.90814| %_neg_is_pos: 0.03386| lr: 0.0| temp: 1.98707 | loss: 1.13754| constrast_loss: 4.48231| div_loss: 0.67859| %_mask_idx: 0.37359| ppl: 205.70445| %_neg_is_pos: 0.03233| lr: 0.0| temp: 1.98706 | loss: 1.12358| constrast_loss: 4.42506| div_loss: 0.6927| %_mask_idx: 0.39505| ppl: 196.67023| %_neg_is_pos: 0.05949| lr: 0.0| temp: 1.98706 | loss: 1.13945| constrast_loss: 4.48945| div_loss: 0.68355| %_mask_idx: 0.40147| ppl: 202.52794| %_neg_is_pos: 0.02302| lr: 0.0| temp: 1.98705 | loss: 1.14951| constrast_loss: 4.53224| div_loss: 0.65786| %_mask_idx: 0.41071| ppl: 218.96689| %_neg_is_pos: 0.0141| lr: 0.0| temp: 1.98705 | loss: 1.13704| constrast_loss: 4.4781| div_loss: 0.7008| %_mask_idx: 0.40476| ppl: 191.4892| %_neg_is_pos: 0.04421| lr: 0.0| temp: 1.98704 | loss: 1.13298| constrast_loss: 4.46386| div_loss: 0.68045| %_mask_idx: 0.37312| ppl: 204.51501| %_neg_is_pos: 0.02197| lr: 0.0| temp: 1.98704 | loss: 1.1371| constrast_loss: 4.47952| div_loss: 0.68901| %_mask_idx: 0.35965| ppl: 199.03345| %_neg_is_pos: 0.02357| lr: 0.0| temp: 1.98702 | loss: 1.14155| constrast_loss: 4.4975| div_loss: 0.68709| %_mask_idx: 0.37704| ppl: 200.26392| %_neg_is_pos: 0.01904| lr: 0.0| temp: 1.98702 | loss: 1.13618| constrast_loss: 4.47752| div_loss: 0.67213| %_mask_idx: 0.34884| ppl: 209.83704| %_neg_is_pos: 0.03098| lr: 0.0| temp: 1.98701 | loss: 1.12064| constrast_loss: 4.41297| div_loss: 0.69588| %_mask_idx: 0.39051| ppl: 194.63535| %_neg_is_pos: 0.04036| lr: 0.0| temp: 1.98701 | loss: 1.12629| constrast_loss: 4.43488| div_loss: 0.70267| %_mask_idx: 0.39818| ppl: 190.29305| %_neg_is_pos: 0.03947| lr: 0.0| temp: 1.987 | loss: 1.12293| constrast_loss: 4.419| div_loss: 0.72725| %_mask_idx: 0.40414| 
ppl: 174.56293| %_neg_is_pos: 0.02691| lr: 0.0| temp: 1.987 | loss: 1.12787| constrast_loss: 4.44344| div_loss: 0.68047| %_mask_idx: 0.40836| ppl: 204.49796| %_neg_is_pos: 0.02476| lr: 0.0| temp: 1.98699 | loss: 1.14074| constrast_loss: 4.49523| div_loss: 0.6775| %_mask_idx: 0.40836| ppl: 206.40186| %_neg_is_pos: 0.02404| lr: 0.0| temp: 1.98699 | loss: 1.13296| constrast_loss: 4.46092| div_loss: 0.70933| %_mask_idx: 0.3739| ppl: 186.02771| %_neg_is_pos: 0.04499| lr: 0.0| temp: 1.98697 | loss: 1.13367| constrast_loss: 4.46636| div_loss: 0.68324| %_mask_idx: 0.38581| ppl: 202.72855| %_neg_is_pos: 0.01828| lr: 0.0| temp: 1.98697 | loss: 1.14743| constrast_loss: 4.52141| div_loss: 0.68301| %_mask_idx: 0.36654| ppl: 202.87436| %_neg_is_pos: 0.02837| lr: 0.0| temp: 1.98696 | loss: 1.12267| constrast_loss: 4.42005| div_loss: 0.70654| %_mask_idx: 0.37798| ppl: 187.81543| %_neg_is_pos: 0.04682| lr: 0.0| temp: 1.98696 | loss: 1.13713| constrast_loss: 4.4806| div_loss: 0.67909| %_mask_idx: 0.38972| ppl: 205.3837| %_neg_is_pos: 0.02177| lr: 0.0| temp: 1.98695 | loss: 1.13097| constrast_loss: 4.45394| div_loss: 0.69949| %_mask_idx: 0.40022| ppl: 192.3259| %_neg_is_pos: 0.04061| lr: 0.0| temp: 1.98695 | loss: 1.11924| constrast_loss: 4.40678| div_loss: 0.70163| %_mask_idx: 0.39568| ppl: 190.95432| %_neg_is_pos: 0.03799| lr: 0.0| temp: 1.98694 | loss: 1.14041| constrast_loss: 4.49163| div_loss: 0.69993| %_mask_idx: 0.42935| ppl: 192.04465| %_neg_is_pos: 0.02156| lr: 0.0| temp: 1.98694 | loss: 1.1309| constrast_loss: 4.45238| div_loss: 0.71226| %_mask_idx: 0.43061| ppl: 184.15063| %_neg_is_pos: 0.02329| lr: 0.0| temp: 1.98692 | loss: 1.13877| constrast_loss: 4.48559| div_loss: 0.69482| %_mask_idx: 0.43264| ppl: 195.31363| %_neg_is_pos: 0.02573| lr: 0.0| temp: 1.98692 | loss: 1.12963| constrast_loss: 4.44577| div_loss: 0.72748| %_mask_idx: 0.40351| ppl: 174.41064| %_neg_is_pos: 0.04682| lr: 0.0| temp: 1.98691 | loss: 1.1277| constrast_loss: 4.44039| div_loss: 0.70418| %_mask_idx: 
0.37234| ppl: 189.32544| %_neg_is_pos: 0.03796| lr: 0.0| temp: 1.98691 | loss: 1.13419| constrast_loss: 4.46582| div_loss: 0.70961| %_mask_idx: 0.36544| ppl: 185.85123| %_neg_is_pos: 0.03026| lr: 0.0| temp: 1.98689 | loss: 1.12859| constrast_loss: 4.44395| div_loss: 0.70404| %_mask_idx: 0.3739| ppl: 189.41245| %_neg_is_pos: 0.0285| lr: 0.0| temp: 1.98689 | loss: 1.13207| constrast_loss: 4.45886| div_loss: 0.69411| %_mask_idx: 0.36591| ppl: 195.76794| %_neg_is_pos: 0.02306| lr: 0.0| temp: 1.98688 | loss: 1.14786| constrast_loss: 4.52577| div_loss: 0.65658| %_mask_idx: 0.40288| ppl: 219.78864| %_neg_is_pos: 0.02705| lr: 0.0| temp: 1.98688 | loss: 1.14407| constrast_loss: 4.50748| div_loss: 0.68817| %_mask_idx: 0.4032| ppl: 199.57065| %_neg_is_pos: 0.03285| lr: 0.0| temp: 1.98687 | loss: 1.13515| constrast_loss: 4.47284| div_loss: 0.67768| %_mask_idx: 0.44236| ppl: 206.28465| %_neg_is_pos: 0.01757| lr: 0.0| temp: 1.98687 | loss: 1.15013| constrast_loss: 4.53555| div_loss: 0.64971| %_mask_idx: 0.39333| ppl: 224.1871| %_neg_is_pos: 0.02666| lr: 0.0| temp: 1.98686 | loss: 1.14439| constrast_loss: 4.51003| div_loss: 0.67531| %_mask_idx: 0.44643| ppl: 207.79929| %_neg_is_pos: 0.0252| lr: 0.0| temp: 1.98686 | loss: 1.12766| constrast_loss: 4.44185| div_loss: 0.68797| %_mask_idx: 0.3385| ppl: 199.69794| %_neg_is_pos: 0.03865| lr: 0.0| temp: 1.98684 | loss: 1.13332| constrast_loss: 4.4636| div_loss: 0.69675| %_mask_idx: 0.37672| ppl: 194.07803| %_neg_is_pos: 0.0397| lr: 0.0| temp: 1.98684 | loss: 1.13391| constrast_loss: 4.46541| div_loss: 0.70245| %_mask_idx: 0.37171| ppl: 190.43234| %_neg_is_pos: 0.04917| lr: 0.0| temp: 1.98683 | loss: 1.14416| constrast_loss: 4.50932| div_loss: 0.67331| %_mask_idx: 0.45128| ppl: 209.0802| %_neg_is_pos: 0.01447| lr: 0.0| temp: 1.98683 | loss: 1.12312| constrast_loss: 4.42272| div_loss: 0.69756| %_mask_idx: 0.34383| ppl: 193.56131| %_neg_is_pos: 0.05023| lr: 0.0| temp: 1.98682 | loss: 1.13199| constrast_loss: 4.45651| div_loss: 0.71457| 
%_mask_idx: 0.4104| ppl: 182.67267| %_neg_is_pos: 0.02833| lr: 0.0| temp: 1.98682 | loss: 1.13936| constrast_loss: 4.48573| div_loss: 0.71719| %_mask_idx: 0.39411| ppl: 181.0016| %_neg_is_pos: 0.03516| lr: 0.0| temp: 1.98681 | loss: 1.13372| constrast_loss: 4.4653| div_loss: 0.69571| %_mask_idx: 0.41573| ppl: 194.74591| %_neg_is_pos: 0.02356| lr: 0.0| temp: 1.98681 | loss: 1.12967| constrast_loss: 4.449| div_loss: 0.69658| %_mask_idx: 0.34117| ppl: 194.18907| %_neg_is_pos: 0.04036| lr: 0.0| temp: 1.98679 | loss: 1.14096| constrast_loss: 4.49419| div_loss: 0.69637| %_mask_idx: 0.40758| ppl: 194.32053| %_neg_is_pos: 0.03488| lr: 0.0| temp: 1.98679 | loss: 1.13919| constrast_loss: 4.4894| div_loss: 0.67378| %_mask_idx: 0.39615| ppl: 208.77963| %_neg_is_pos: 0.02537| lr: 0.0| temp: 1.98678 | loss: 1.11744| constrast_loss: 4.39757| div_loss: 0.72175| %_mask_idx: 0.40194| ppl: 178.08211| %_neg_is_pos: 0.03648| lr: 0.0| temp: 1.98678 | loss: 1.11973| constrast_loss: 4.40856| div_loss: 0.70359| %_mask_idx: 0.41714| ppl: 189.70256| %_neg_is_pos: 0.04102| lr: 0.0| temp: 1.98677 | loss: 1.13572| constrast_loss: 4.47448| div_loss: 0.68403| %_mask_idx: 0.34398| ppl: 202.22377| %_neg_is_pos: 0.03142| lr: 0.0| temp: 1.98677 | loss: 1.14217| constrast_loss: 4.50241| div_loss: 0.66276| %_mask_idx: 0.41635| ppl: 215.83304| %_neg_is_pos: 0.02539| lr: 0.0| temp: 1.98676 | loss: 1.13766| constrast_loss: 4.48255| div_loss: 0.68089| %_mask_idx: 0.388| ppl: 204.22757| %_neg_is_pos: 0.02234| lr: 0.0| temp: 1.98676 | loss: 1.12849| constrast_loss: 4.44447| div_loss: 0.69476| %_mask_idx: 0.42434| ppl: 195.35529| %_neg_is_pos: 0.02339| lr: 0.0| temp: 1.98674 | loss: 1.13276| constrast_loss: 4.46197| div_loss: 0.69051| %_mask_idx: 0.40085| ppl: 198.07076| %_neg_is_pos: 0.04126| lr: 0.0| temp: 1.98674 | loss: 1.12881| constrast_loss: 4.4443| div_loss: 0.70932| %_mask_idx: 0.39082| ppl: 186.03671| %_neg_is_pos: 0.03922| lr: 0.0| temp: 1.98673 | loss: 1.1168| constrast_loss: 4.39364| div_loss: 
0.73578| %_mask_idx: 0.35354| ppl: 169.09903| %_neg_is_pos: 0.04019| lr: 0.0| temp: 1.98673 | loss: 1.15083| constrast_loss: 4.53244| div_loss: 0.70863| %_mask_idx: 0.39693| ppl: 186.47617| %_neg_is_pos: 0.03072| lr: 0.0| temp: 1.98671 | loss: 1.13321| constrast_loss: 4.46557| div_loss: 0.67271| %_mask_idx: 0.41808| ppl: 209.46704| %_neg_is_pos: 0.01869| lr: 0.0| temp: 1.98671 | loss: 1.14682| constrast_loss: 4.519| div_loss: 0.68287| %_mask_idx: 0.36278| ppl: 202.96271| %_neg_is_pos: 0.01799| lr: 0.0| temp: 1.9867 | loss: 1.12249| constrast_loss: 4.41525| div_loss: 0.74708| %_mask_idx: 0.35323| ppl: 161.87154| %_neg_is_pos: 0.04446| lr: 0.0| temp: 1.9867 | loss: 1.10528| constrast_loss: 4.34775| div_loss: 0.73376| %_mask_idx: 0.31093| ppl: 170.3925| %_neg_is_pos: 0.06846| lr: 0.0| temp: 1.98669 | loss: 1.13396| constrast_loss: 4.46495| div_loss: 0.70886| %_mask_idx: 0.41745| ppl: 186.32809| %_neg_is_pos: 0.02562| lr: 0.0| temp: 1.98669 | loss: 1.13108| constrast_loss: 4.45585| div_loss: 0.68476| %_mask_idx: 0.37923| ppl: 201.75558| %_neg_is_pos: 0.02389| lr: 0.0| temp: 1.98668 | loss: 1.13578| constrast_loss: 4.4762| div_loss: 0.66899| %_mask_idx: 0.4162| ppl: 211.84586| %_neg_is_pos: 0.0258| lr: 0.0| temp: 1.98668 | loss: 1.13387| constrast_loss: 4.4669| div_loss: 0.68587| %_mask_idx: 0.3573| ppl: 201.04147| %_neg_is_pos: 0.03539| lr: 0.0| temp: 1.98666 | loss: 1.13527| constrast_loss: 4.47295| div_loss: 0.68114| %_mask_idx: 0.41118| ppl: 204.0683| %_neg_is_pos: 0.02537| lr: 0.0| temp: 1.98666 | loss: 1.14808| constrast_loss: 4.5224| div_loss: 0.69927| %_mask_idx: 0.39489| ppl: 192.46802| %_neg_is_pos: 0.03005| lr: 0.0| temp: 1.98665 | loss: 1.14533| constrast_loss: 4.51355| div_loss: 0.67764| %_mask_idx: 0.4032| ppl: 206.31049| %_neg_is_pos: 0.02355| lr: 0.0| temp: 1.98665 | loss: 1.14183| constrast_loss: 4.49944| div_loss: 0.67876| %_mask_idx: 0.40523| ppl: 205.59366| %_neg_is_pos: 0.03104| lr: 0.0| temp: 1.98664 | loss: 1.13257| constrast_loss: 4.46114| 
div_loss: 0.69123| %_mask_idx: 0.45724| ppl: 197.61264| %_neg_is_pos: 0.02479| lr: 0.0| temp: 1.98664 | loss: 1.13284| constrast_loss: 4.46201| div_loss: 0.69355| %_mask_idx: 0.41244| ppl: 196.1265| %_neg_is_pos: 0.0237| lr: 0.0| temp: 1.98663 | loss: 1.14192| constrast_loss: 4.50123| div_loss: 0.66466| %_mask_idx: 0.44236| ppl: 214.61546| %_neg_is_pos: 0.01978| lr: 0.0| temp: 1.98663 | loss: 1.1351| constrast_loss: 4.47068| div_loss: 0.69713| %_mask_idx: 0.44533| ppl: 193.83838| %_neg_is_pos: 0.01963| lr: 0.0| temp: 1.98661 | loss: 1.13492| constrast_loss: 4.47021| div_loss: 0.69453| %_mask_idx: 0.35699| ppl: 195.50162| %_neg_is_pos: 0.03557| lr: 0.0| temp: 1.98661 | loss: 1.12181| constrast_loss: 4.41789| div_loss: 0.69371| %_mask_idx: 0.34586| ppl: 196.02443| %_neg_is_pos: 0.05372| lr: 0.0| temp: 1.98661 | loss: 1.1319| constrast_loss: 4.45768| div_loss: 0.69929| %_mask_idx: 0.39192| ppl: 192.45267| %_neg_is_pos: 0.01921| lr: 0.0| temp: 1.98661 | loss: 1.13987| constrast_loss: 4.48891| div_loss: 0.70576| %_mask_idx: 0.42278| ppl: 188.31476| %_neg_is_pos: 0.01559| lr: 0.0| temp: 1.9866 | loss: 1.12436| constrast_loss: 4.42695| div_loss: 0.705| %_mask_idx: 0.42372| ppl: 188.80109| %_neg_is_pos: 0.02141| lr: 0.0| temp: 1.9866 | loss: 1.13308| constrast_loss: 4.46344| div_loss: 0.68879| %_mask_idx: 0.37625| ppl: 199.172| %_neg_is_pos: 0.03281| lr: 0.0| temp: 1.98659 | loss: 1.1417| constrast_loss: 4.49858| div_loss: 0.68222| %_mask_idx: 0.37594| ppl: 203.38153| %_neg_is_pos: 0.02526| lr: 0.0| temp: 1.98659 | loss: 1.13621| constrast_loss: 4.47668| div_loss: 0.68161| %_mask_idx: 0.37594| ppl: 203.77054| %_neg_is_pos: 0.03764| lr: 0.0| temp: 1.98657 | loss: 1.14492| constrast_loss: 4.51225| div_loss: 0.67436| %_mask_idx: 0.40617| ppl: 208.4117| %_neg_is_pos: 0.01918| lr: 0.0| temp: 1.98657 | loss: 1.16046| constrast_loss: 4.57791| div_loss: 0.63917| %_mask_idx: 0.40523| ppl: 230.92987| %_neg_is_pos: 0.00927| lr: 0.0| temp: 1.98656 | loss: 1.14105| constrast_loss: 
4.49392| div_loss: 0.70295| %_mask_idx: 0.40523| ppl: 190.11081| %_neg_is_pos: 0.02991| lr: 0.0| temp: 1.98656 [2021-09-02 00:11:15,617] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 32.0, reducing to 16.0 [2021-09-02 00:11:15,617] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 32.0, reducing to 16.0 | loss: 1.13219| constrast_loss: 4.45993| div_loss: 0.68835| %_mask_idx: 0.34853| ppl: 199.45534| %_neg_is_pos: 0.02887| lr: 0.0| temp: 1.98654 | loss: 1.1346| constrast_loss: 4.47136| div_loss: 0.67032| %_mask_idx: 0.41056| ppl: 210.99829| %_neg_is_pos: 0.01823| lr: 0.0| temp: 1.98654 | loss: 1.13067| constrast_loss: 4.45249| div_loss: 0.70198| %_mask_idx: 0.4422| ppl: 190.73079| %_neg_is_pos: 0.03497| lr: 0.0| temp: 1.98653 | loss: 1.13155| constrast_loss: 4.45534| div_loss: 0.70845| %_mask_idx: 0.37625| ppl: 186.59447| %_neg_is_pos: 0.0453| lr: 0.0| temp: 1.98653 | loss: 1.10817| constrast_loss: 4.36137| div_loss: 0.71318| %_mask_idx: 0.33036| ppl: 183.56317| %_neg_is_pos: 0.04301| lr: 0.0| temp: 1.98652 | loss: 1.13116| constrast_loss: 4.45126| div_loss: 0.73384| %_mask_idx: 0.37249| ppl: 170.33971| %_neg_is_pos: 0.04045| lr: 0.0| temp: 1.98652 | loss: 1.13326| constrast_loss: 4.46602| div_loss: 0.67029| %_mask_idx: 0.43327| ppl: 211.01404| %_neg_is_pos: 0.02681| lr: 0.0| temp: 1.98651 | loss: 1.13057| constrast_loss: 4.45296| div_loss: 0.69323| %_mask_idx: 0.39051| ppl: 196.3334| %_neg_is_pos: 0.03839| lr: 0.0| temp: 1.98651 | loss: 1.12632| constrast_loss: 4.43643| div_loss: 0.68847| %_mask_idx: 0.35777| ppl: 199.37674| %_neg_is_pos: 0.04149| lr: 0.0| temp: 1.98649 | loss: 1.1307| constrast_loss: 4.45393| div_loss: 0.68862| %_mask_idx: 0.38424| ppl: 199.28555| %_neg_is_pos: 0.03948| lr: 0.0| temp: 1.98649 | loss: 1.12725| constrast_loss: 4.43858| div_loss: 0.70432| %_mask_idx: 0.32049| ppl: 189.23297| %_neg_is_pos: 
0.04766| lr: 0.0| temp: 1.98648 | loss: 1.12949| constrast_loss: 4.44841| div_loss: 0.69562| %_mask_idx: 0.39552| ppl: 194.80508| %_neg_is_pos: 0.04386| lr: 0.0| temp: 1.98648 | loss: 1.13048| constrast_loss: 4.45482| div_loss: 0.67111| %_mask_idx: 0.38878| ppl: 210.48746| %_neg_is_pos: 0.04533| lr: 0.0| temp: 1.98647 | loss: 1.12075| constrast_loss: 4.41254| div_loss: 0.7044| %_mask_idx: 0.43374| ppl: 189.18318| %_neg_is_pos: 0.03687| lr: 0.0| temp: 1.98647 | loss: 1.12369| constrast_loss: 4.42461| div_loss: 0.70155| %_mask_idx: 0.42873| ppl: 191.01031| %_neg_is_pos: 0.02661| lr: 0.0| temp: 1.98646 | loss: 1.12265| constrast_loss: 4.4171| div_loss: 0.73515| %_mask_idx: 0.38863| ppl: 169.50093| %_neg_is_pos: 0.05399| lr: 0.0| temp: 1.98646 | loss: 1.12821| constrast_loss: 4.44529| div_loss: 0.67539| %_mask_idx: 0.38894| ppl: 207.75253| %_neg_is_pos: 0.03813| lr: 0.0| temp: 1.98644 | loss: 1.13284| constrast_loss: 4.46199| div_loss: 0.69393| %_mask_idx: 0.37719| ppl: 195.88412| %_neg_is_pos: 0.04359| lr: 0.0| temp: 1.98644 | loss: 1.10528| constrast_loss: 4.34449| div_loss: 0.76613| %_mask_idx: 0.32206| ppl: 149.67984| %_neg_is_pos: 0.07322| lr: 0.0| temp: 1.98643 | loss: 1.13791| constrast_loss: 4.48051| div_loss: 0.71145| %_mask_idx: 0.41776| ppl: 184.6745| %_neg_is_pos: 0.04437| lr: 0.0| temp: 1.98643 | loss: 1.11354| constrast_loss: 4.381| div_loss: 0.73155| %_mask_idx: 0.30404| ppl: 171.80759| %_neg_is_pos: 0.07128| lr: 0.0| temp: 1.98642 | loss: 1.11518| constrast_loss: 4.38713| div_loss: 0.73601| %_mask_idx: 0.4151| ppl: 168.95416| %_neg_is_pos: 0.03758| lr: 0.0| temp: 1.98642 | loss: 1.11554| constrast_loss: 4.39319| div_loss: 0.68977| %_mask_idx: 0.4505| ppl: 198.54926| %_neg_is_pos: 0.03568| lr: 0.0| temp: 1.98641 | loss: 1.13547| constrast_loss: 4.47087| div_loss: 0.7102| %_mask_idx: 0.35981| ppl: 185.4689| %_neg_is_pos: 0.04176| lr: 0.0| temp: 1.98641 | loss: 1.14305| constrast_loss: 4.50574| div_loss: 0.6648| %_mask_idx: 0.45363| ppl: 214.53033| 
%_neg_is_pos: 0.01503| lr: 0.0| temp: 1.98639 | loss: 1.1414| constrast_loss: 4.4985| div_loss: 0.67084| %_mask_idx: 0.44424| ppl: 210.66292| %_neg_is_pos: 0.01848| lr: 0.0| temp: 1.98639 | loss: 1.13606| constrast_loss: 4.47687| div_loss: 0.67355| %_mask_idx: 0.39145| ppl: 208.93063| %_neg_is_pos: 0.02744| lr: 0.0| temp: 1.98638 | loss: 1.13681| constrast_loss: 4.48039| div_loss: 0.6687| %_mask_idx: 0.40132| ppl: 212.03333| %_neg_is_pos: 0.02568| lr: 0.0| temp: 1.98638 | loss: 1.13089| constrast_loss: 4.45287| div_loss: 0.70671| %_mask_idx: 0.39035| ppl: 187.70554| %_neg_is_pos: 0.05612| lr: 0.0| temp: 1.98636 | loss: 1.12123| constrast_loss: 4.41289| div_loss: 0.72022| %_mask_idx: 0.35135| ppl: 179.0582| %_neg_is_pos: 0.0638| lr: 0.0| temp: 1.98636 | loss: 1.11319| constrast_loss: 4.38098| div_loss: 0.71764| %_mask_idx: 0.38001| ppl: 180.70795| %_neg_is_pos: 0.05959| lr: 0.0| temp: 1.98635 | loss: 1.13776| constrast_loss: 4.48306| div_loss: 0.67963| %_mask_idx: 0.40868| ppl: 205.03537| %_neg_is_pos: 0.04118| lr: 0.0| temp: 1.98635 | loss: 1.12498| constrast_loss: 4.43309| div_loss: 0.66819| %_mask_idx: 0.44878| ppl: 212.35585| %_neg_is_pos: 0.01631| lr: 0.0| temp: 1.98634 | loss: 1.12528| constrast_loss: 4.43309| div_loss: 0.68011| %_mask_idx: 0.40555| ppl: 204.7299| %_neg_is_pos: 0.04608| lr: 0.0| temp: 1.98634 | loss: 1.1149| constrast_loss: 4.3865| div_loss: 0.73089| %_mask_idx: 0.39928| ppl: 172.23325| %_neg_is_pos: 0.07079| lr: 0.0| temp: 1.98633 | loss: 1.12385| constrast_loss: 4.42497| div_loss: 0.70419| %_mask_idx: 0.40367| ppl: 189.31712| %_neg_is_pos: 0.04419| lr: 0.0| temp: 1.98633 | loss: 1.14312| constrast_loss: 4.50171| div_loss: 0.70756| %_mask_idx: 0.41604| ppl: 187.16061| %_neg_is_pos: 0.03026| lr: 0.0| temp: 1.98631 | loss: 1.11465| constrast_loss: 4.3878| div_loss: 0.70782| %_mask_idx: 0.37798| ppl: 186.99423| %_neg_is_pos: 0.05086| lr: 0.0| temp: 1.98631 | loss: 1.11198| constrast_loss: 4.37661| div_loss: 0.71315| %_mask_idx: 0.37296| ppl: 
183.58118| %_neg_is_pos: 0.05879| lr: 0.0| temp: 1.9863 | loss: 1.11973| constrast_loss: 4.40834| div_loss: 0.70594| %_mask_idx: 0.37704| ppl: 188.19914| %_neg_is_pos: 0.04809| lr: 0.0| temp: 1.9863 | loss: 1.13087| constrast_loss: 4.45671| div_loss: 0.66773| %_mask_idx: 0.40273| ppl: 212.65021| %_neg_is_pos: 0.03154| lr: 0.0| temp: 1.98629 | loss: 1.12601| constrast_loss: 4.43608| div_loss: 0.6797| %_mask_idx: 0.36137| ppl: 204.99414| %_neg_is_pos: 0.02599| lr: 0.0| temp: 1.98629 | loss: 1.13557| constrast_loss: 4.47241| div_loss: 0.6986| %_mask_idx: 0.44095| ppl: 192.89789| %_neg_is_pos: 0.03532| lr: 0.0| temp: 1.98628 | loss: 1.14066| constrast_loss: 4.49591| div_loss: 0.66733| %_mask_idx: 0.33709| ppl: 212.91003| %_neg_is_pos: 0.0462| lr: 0.0| temp: 1.98628 | loss: 1.13147| constrast_loss: 4.45571| div_loss: 0.70177| %_mask_idx: 0.39803| ppl: 190.86914| %_neg_is_pos: 0.04291| lr: 0.0| temp: 1.98626 | loss: 1.11338| constrast_loss: 4.38514| div_loss: 0.6838| %_mask_idx: 0.36419| ppl: 202.37057| %_neg_is_pos: 0.04232| lr: 0.0| temp: 1.98626 | loss: 1.12085| constrast_loss: 4.4167| div_loss: 0.66684| %_mask_idx: 0.3927| ppl: 213.22458| %_neg_is_pos: 0.02759| lr: 0.0| temp: 1.98625 | loss: 1.12171| constrast_loss: 4.41775| div_loss: 0.69101| %_mask_idx: 0.40163| ppl: 197.75446| %_neg_is_pos: 0.0256| lr: 0.0| temp: 1.98625 | loss: 1.11463| constrast_loss: 4.3878| div_loss: 0.707| %_mask_idx: 0.41886| ppl: 187.51825| %_neg_is_pos: 0.03406| lr: 0.0| temp: 1.98624 | loss: 1.13253| constrast_loss: 4.4617| div_loss: 0.6841| %_mask_idx: 0.39881| ppl: 202.17421| %_neg_is_pos: 0.02914| lr: 0.0| temp: 1.98624 | loss: 1.13811| constrast_loss: 4.48551| div_loss: 0.66938| %_mask_idx: 0.4057| ppl: 211.59648| %_neg_is_pos: 0.02092| lr: 0.0| temp: 1.98623 | loss: 1.13454| constrast_loss: 4.47006| div_loss: 0.68112| %_mask_idx: 0.38628| ppl: 204.0834| %_neg_is_pos: 0.04027| lr: 0.0| temp: 1.98623 | loss: 1.13026| constrast_loss: 4.45324| div_loss: 0.67809| %_mask_idx: 0.41306| ppl: 
206.02083| %_neg_is_pos: 0.02492| lr: 0.0| temp: 1.98621 | loss: 1.11799| constrast_loss: 4.40235| div_loss: 0.69612| %_mask_idx: 0.42137| ppl: 194.48569| %_neg_is_pos: 0.03611| lr: 0.0| temp: 1.98621 | loss: 1.11204| constrast_loss: 4.37798| div_loss: 0.70157| %_mask_idx: 0.3891| ppl: 190.99789| %_neg_is_pos: 0.05018| lr: 0.0| temp: 1.9862 | loss: 1.14572| constrast_loss: 4.5161| div_loss: 0.66769| %_mask_idx: 0.42591| ppl: 212.68076| %_neg_is_pos: 0.02552| lr: 0.0| temp: 1.9862 | loss: 1.12834| constrast_loss: 4.44526| div_loss: 0.68096| %_mask_idx: 0.40492| ppl: 204.18454| %_neg_is_pos: 0.03495| lr: 0.0| temp: 1.98618 | loss: 1.12581| constrast_loss: 4.43454| div_loss: 0.68703| %_mask_idx: 0.41306| ppl: 200.30339| %_neg_is_pos: 0.04922| lr: 0.0| temp: 1.98618 | loss: 1.13271| constrast_loss: 4.46474| div_loss: 0.66109| %_mask_idx: 0.39035| ppl: 216.90097| %_neg_is_pos: 0.04468| lr: 0.0| temp: 1.98617 | loss: 1.1288| constrast_loss: 4.44636| div_loss: 0.68838| %_mask_idx: 0.40179| ppl: 199.4339| %_neg_is_pos: 0.03523| lr: 0.0| temp: 1.98617 | loss: 1.11074| constrast_loss: 4.37338| div_loss: 0.69566| %_mask_idx: 0.4021| ppl: 194.77867| %_neg_is_pos: 0.04043| lr: 0.0| temp: 1.98616 | loss: 1.13054| constrast_loss: 4.45371| div_loss: 0.68436| %_mask_idx: 0.39066| ppl: 202.00983| %_neg_is_pos: 0.03038| lr: 0.0| temp: 1.98616 | loss: 1.13336| constrast_loss: 4.46503| div_loss: 0.68401| %_mask_idx: 0.36294| ppl: 202.23181| %_neg_is_pos: 0.04135| lr: 0.0| temp: 1.98615 | loss: 1.12799| constrast_loss: 4.44122| div_loss: 0.70754| %_mask_idx: 0.34555| ppl: 187.17249| %_neg_is_pos: 0.042| lr: 0.0| temp: 1.98615 | loss: 1.12795| constrast_loss: 4.44436| div_loss: 0.67442| %_mask_idx: 0.40664| ppl: 208.37389| %_neg_is_pos: 0.04598| lr: 0.0| temp: 1.98613 | loss: 1.13993| constrast_loss: 4.49412| div_loss: 0.65609| %_mask_idx: 0.35558| ppl: 220.10449| %_neg_is_pos: 0.03527| lr: 0.0| temp: 1.98613 | loss: 1.10304| constrast_loss: 4.33869| div_loss: 0.73466| %_mask_idx: 
0.34994| ppl: 169.81807| %_neg_is_pos: 0.07748| lr: 0.0| temp: 1.98612 | loss: 1.11347| constrast_loss: 4.38207| div_loss: 0.71797| %_mask_idx: 0.34383| ppl: 180.49641| %_neg_is_pos: 0.0541| lr: 0.0| temp: 1.98612 | loss: 1.11348| constrast_loss: 4.38483| div_loss: 0.69085| %_mask_idx: 0.39693| ppl: 197.85483| %_neg_is_pos: 0.05122| lr: 0.0| temp: 1.98611 | loss: 1.13418| constrast_loss: 4.46959| div_loss: 0.67113| %_mask_idx: 0.3562| ppl: 210.47704| %_neg_is_pos: 0.02615| lr: 0.0| temp: 1.98611 | loss: 1.1303| constrast_loss: 4.4545| div_loss: 0.66689| %_mask_idx: 0.40194| ppl: 213.18994| %_neg_is_pos: 0.02761| lr: 0.0| temp: 1.9861 | loss: 1.11085| constrast_loss: 4.37152| div_loss: 0.71867| %_mask_idx: 0.38252| ppl: 180.0531| %_neg_is_pos: 0.0494| lr: 0.0| temp: 1.9861 | loss: 1.13164| constrast_loss: 4.45878| div_loss: 0.67791| %_mask_idx: 0.36028| ppl: 206.13562| %_neg_is_pos: 0.03152| lr: 0.0| temp: 1.98608 | loss: 1.11761| constrast_loss: 4.40056| div_loss: 0.69864| %_mask_idx: 0.42513| ppl: 192.87187| %_neg_is_pos: 0.04806| lr: 0.0| temp: 1.98608 | loss: 1.12439| constrast_loss: 4.42902| div_loss: 0.68557| %_mask_idx: 0.36623| ppl: 201.23407| %_neg_is_pos: 0.02844| lr: 0.0| temp: 1.98607 | loss: 1.1153| constrast_loss: 4.39162| div_loss: 0.69599| %_mask_idx: 0.38268| ppl: 194.56384| %_neg_is_pos: 0.04773| lr: 0.0| temp: 1.98607 | loss: 1.09511| constrast_loss: 4.30695| div_loss: 0.73492| %_mask_idx: 0.40241| ppl: 169.65366| %_neg_is_pos: 0.06011| lr: 0.0| temp: 1.98606 | loss: 1.11042| constrast_loss: 4.37207| div_loss: 0.69621| %_mask_idx: 0.40398| ppl: 194.42296| %_neg_is_pos: 0.04484| lr: 0.0| temp: 1.98606 | loss: 1.14538| constrast_loss: 4.5149| div_loss: 0.66611| %_mask_idx: 0.40852| ppl: 213.68822| %_neg_is_pos: 0.01826| lr: 0.0| temp: 1.98605 | loss: 1.12053| constrast_loss: 4.4088| div_loss: 0.73329| %_mask_idx: 0.37516| ppl: 170.69452| %_neg_is_pos: 0.04334| lr: 0.0| temp: 1.98605 | loss: 1.12618| constrast_loss: 4.43742| div_loss: 0.67286| 
%_mask_idx: 0.41714| ppl: 209.36839| %_neg_is_pos: 0.03458| lr: 0.0| temp: 1.98603 | loss: 1.13618| constrast_loss: 4.4751| div_loss: 0.69607| %_mask_idx: 0.37704| ppl: 194.51511| %_neg_is_pos: 0.03025| lr: 0.0| temp: 1.98603 | loss: 1.10812| constrast_loss: 4.36071| div_loss: 0.71771| %_mask_idx: 0.35981| ppl: 180.66455| %_neg_is_pos: 0.04777| lr: 0.0| temp: 1.98602 | loss: 1.12276| constrast_loss: 4.42133| div_loss: 0.69697| %_mask_idx: 0.37249| ppl: 193.93985| %_neg_is_pos: 0.03644| lr: 0.0| temp: 1.98602 | loss: 1.13391| constrast_loss: 4.46654| div_loss: 0.69098| %_mask_idx: 0.38941| ppl: 197.77306| %_neg_is_pos: 0.0334| lr: 0.0| temp: 1.986 | loss: 1.1302| constrast_loss: 4.45285| div_loss: 0.6794| %_mask_idx: 0.41761| ppl: 205.18494| %_neg_is_pos: 0.03308| lr: 0.0| temp: 1.986 | loss: 1.14004| constrast_loss: 4.49077| div_loss: 0.69378| %_mask_idx: 0.38925| ppl: 195.98175| %_neg_is_pos: 0.03655| lr: 0.0| temp: 1.98599 | loss: 1.11529| constrast_loss: 4.39153| div_loss: 0.69648| %_mask_idx: 0.38033| ppl: 194.25201| %_neg_is_pos: 0.05464| lr: 0.0| temp: 1.98599 | loss: 1.10349| constrast_loss: 4.34283| div_loss: 0.7114| %_mask_idx: 0.30905| ppl: 184.70352| %_neg_is_pos: 0.07496| lr: 0.0| temp: 1.98598 | loss: 1.13752| constrast_loss: 4.48335| div_loss: 0.66745| %_mask_idx: 0.37892| ppl: 212.82968| %_neg_is_pos: 0.02212| lr: 0.0| temp: 1.98598 | loss: 1.11151| constrast_loss: 4.37345| div_loss: 0.72579| %_mask_idx: 0.40085| ppl: 175.49136| %_neg_is_pos: 0.06256| lr: 0.0| temp: 1.98597 | loss: 1.13071| constrast_loss: 4.45222| div_loss: 0.70628| %_mask_idx: 0.39505| ppl: 187.98132| %_neg_is_pos: 0.04437| lr: 0.0| temp: 1.98597 | loss: 1.1232| constrast_loss: 4.4218| div_loss: 0.71011| %_mask_idx: 0.3891| ppl: 185.53256| %_neg_is_pos: 0.06169| lr: 0.0| temp: 1.98595 | loss: 1.11871| constrast_loss: 4.40552| div_loss: 0.69308| %_mask_idx: 0.41244| ppl: 196.42574| %_neg_is_pos: 0.03229| lr: 0.0| temp: 1.98595 | loss: 1.13| constrast_loss: 4.45272| div_loss: 
0.67265| %_mask_idx: 0.39693| ppl: 209.50131| %_neg_is_pos: 0.02458| lr: 0.0| temp: 1.98594 | loss: 1.1006| constrast_loss: 4.3279| div_loss: 0.74518| %_mask_idx: 0.29715| ppl: 163.08679| %_neg_is_pos: 0.08325| lr: 0.0| temp: 1.98594 | loss: 1.12096| constrast_loss: 4.41247| div_loss: 0.71374| %_mask_idx: 0.38549| ppl: 183.20341| %_neg_is_pos: 0.04484| lr: 0.0| temp: 1.98593 | loss: 1.13622| constrast_loss: 4.47672| div_loss: 0.6815| %_mask_idx: 0.44377| ppl: 203.84048| %_neg_is_pos: 0.02476| lr: 0.0| temp: 1.98593 | loss: 1.10199| constrast_loss: 4.33536| div_loss: 0.72581| %_mask_idx: 0.38471| ppl: 175.47891| %_neg_is_pos: 0.04712| lr: 0.0| temp: 1.98592 | loss: 1.11119| constrast_loss: 4.37294| div_loss: 0.71838| %_mask_idx: 0.38424| ppl: 180.23709| %_neg_is_pos: 0.0465| lr: 0.0| temp: 1.98592 | loss: 1.12338| constrast_loss: 4.42359| div_loss: 0.69918| %_mask_idx: 0.38831| ppl: 192.5249| %_neg_is_pos: 0.04221| lr: 0.0| temp: 1.9859 | loss: 1.12595| constrast_loss: 4.43449| div_loss: 0.69295| %_mask_idx: 0.36059| ppl: 196.50888| %_neg_is_pos: 0.03232| lr: 0.0| temp: 1.9859 | loss: 1.12373| constrast_loss: 4.42526| div_loss: 0.69666| %_mask_idx: 0.43499| ppl: 194.1391| %_neg_is_pos: 0.02434| lr: 0.0| temp: 1.98589 | loss: 1.12937| constrast_loss: 4.44858| div_loss: 0.68921| %_mask_idx: 0.37531| ppl: 198.90407| %_neg_is_pos: 0.03791| lr: 0.0| temp: 1.98589 | loss: 1.12633| constrast_loss: 4.43776| div_loss: 0.67565| %_mask_idx: 0.44721| ppl: 207.58521| %_neg_is_pos: 0.01765| lr: 0.0| temp: 1.98588 | loss: 1.12429| constrast_loss: 4.42892| div_loss: 0.68244| %_mask_idx: 0.38017| ppl: 203.24031| %_neg_is_pos: 0.03888| lr: 0.0| temp: 1.98588 | loss: 1.11006| constrast_loss: 4.36752| div_loss: 0.7271| %_mask_idx: 0.34743| ppl: 174.65741| %_neg_is_pos: 0.04297| lr: 0.0| temp: 1.98587 | loss: 1.12471| constrast_loss: 4.42867| div_loss: 0.70165| %_mask_idx: 0.41244| ppl: 190.94269| %_neg_is_pos: 0.03563| lr: 0.0| temp: 1.98587 | loss: 1.11465| constrast_loss: 4.39108| 
div_loss: 0.67532| %_mask_idx: 0.33553| ppl: 207.79355| %_neg_is_pos: 0.05359| lr: 0.0| temp: 1.98585 | loss: 1.11102| constrast_loss: 4.37426| div_loss: 0.69835| %_mask_idx: 0.39113| ppl: 193.05823| %_neg_is_pos: 0.04193| lr: 0.0| temp: 1.98585 | loss: 1.10991| constrast_loss: 4.36442| div_loss: 0.75225| %_mask_idx: 0.35088| ppl: 158.55847| %_neg_is_pos: 0.06373| lr: 0.0| temp: 1.98584 | loss: 1.10578| constrast_loss: 4.35252| div_loss: 0.70588| %_mask_idx: 0.32534| ppl: 188.23842| %_neg_is_pos: 0.06305| lr: 0.0| temp: 1.98584 [2021-09-02 00:20:31,530] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 16.0, reducing to 8.0 [2021-09-02 00:20:31,530] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 16.0, reducing to 8.0 | loss: 1.13168| constrast_loss: 4.4581| div_loss: 0.68629| %_mask_idx: 0.40633| ppl: 200.77597| %_neg_is_pos: 0.03494| lr: 0.0| temp: 1.98582 | loss: 1.12175| constrast_loss: 4.41446| div_loss: 0.72561| %_mask_idx: 0.38894| ppl: 175.6105| %_neg_is_pos: 0.04388| lr: 0.0| temp: 1.98582 | loss: 1.1339| constrast_loss: 4.46755| div_loss: 0.68056| %_mask_idx: 0.36075| ppl: 204.43921| %_neg_is_pos: 0.03775| lr: 0.0| temp: 1.98581 | loss: 1.13073| constrast_loss: 4.45487| div_loss: 0.68041| %_mask_idx: 0.41808| ppl: 204.53888| %_neg_is_pos: 0.03321| lr: 0.0| temp: 1.98581 | loss: 1.11733| constrast_loss: 4.39766| div_loss: 0.71649| %_mask_idx: 0.40085| ppl: 181.44788| %_neg_is_pos: 0.0681| lr: 0.0| temp: 1.9858 | loss: 1.13538| constrast_loss: 4.47492| div_loss: 0.66594| %_mask_idx: 0.39411| ppl: 213.79611| %_neg_is_pos: 0.02844| lr: 0.0| temp: 1.9858 | loss: 1.11351| constrast_loss: 4.38388| div_loss: 0.70155| %_mask_idx: 0.40429| ppl: 191.00607| %_neg_is_pos: 0.05176| lr: 0.0| temp: 1.98579 | loss: 1.13736| constrast_loss: 4.48114| div_loss: 0.68291| %_mask_idx: 0.40241| ppl: 202.93465| %_neg_is_pos: 0.05129| lr: 0.0| 
temp: 1.98579 | loss: 1.12215| constrast_loss: 4.41943| div_loss: 0.69176| %_mask_idx: 0.33286| ppl: 197.27216| %_neg_is_pos: 0.0334| lr: 0.0| temp: 1.98577 | loss: 1.1126| constrast_loss: 4.37893| div_loss: 0.71449| %_mask_idx: 0.4093| ppl: 182.72498| %_neg_is_pos: 0.03966| lr: 0.0| temp: 1.98577 | loss: 1.10981| constrast_loss: 4.37051| div_loss: 0.68723| %_mask_idx: 0.35511| ppl: 200.17224| %_neg_is_pos: 0.05421| lr: 0.0| temp: 1.98576 | loss: 1.10713| constrast_loss: 4.3569| div_loss: 0.71637| %_mask_idx: 0.37892| ppl: 181.52365| %_neg_is_pos: 0.04975| lr: 0.0| temp: 1.98576 | loss: 1.12854| constrast_loss: 4.44322| div_loss: 0.70943| %_mask_idx: 0.35119| ppl: 185.96664| %_neg_is_pos: 0.0736| lr: 0.0| temp: 1.98575 | loss: 1.13088| constrast_loss: 4.45778| div_loss: 0.65746| %_mask_idx: 0.42732| ppl: 219.22476| %_neg_is_pos: 0.02612| lr: 0.0| temp: 1.98575 | loss: 1.11765| constrast_loss: 4.40219| div_loss: 0.68412| %_mask_idx: 0.38017| ppl: 202.16245| %_neg_is_pos: 0.03787| lr: 0.0| temp: 1.98574 | loss: 1.12559| constrast_loss: 4.4359| div_loss: 0.66474| %_mask_idx: 0.36999| ppl: 214.56891| %_neg_is_pos: 0.02752| lr: 0.0| temp: 1.98574 | loss: 1.09855| constrast_loss: 4.32424| div_loss: 0.69967| %_mask_idx: 0.36028| ppl: 192.21074| %_neg_is_pos: 0.04157| lr: 0.0| temp: 1.98572 | loss: 1.12162| constrast_loss: 4.41758| div_loss: 0.68898| %_mask_idx: 0.38158| ppl: 199.05106| %_neg_is_pos: 0.05032| lr: 0.0| temp: 1.98572 | loss: 1.12305| constrast_loss: 4.42365| div_loss: 0.68554| %_mask_idx: 0.3786| ppl: 201.25487| %_neg_is_pos: 0.03224| lr: 0.0| temp: 1.98571 | loss: 1.13011| constrast_loss: 4.45218| div_loss: 0.68249| %_mask_idx: 0.37531| ppl: 203.20946| %_neg_is_pos: 0.06737| lr: 0.0| temp: 1.98571 | loss: 1.12805| constrast_loss: 4.44246| div_loss: 0.69744| %_mask_idx: 0.36732| ppl: 193.63589| %_neg_is_pos: 0.03255| lr: 0.0| temp: 1.9857 | loss: 1.12744| constrast_loss: 4.44098| div_loss: 0.68775| %_mask_idx: 0.43045| ppl: 199.8421| %_neg_is_pos: 0.02572| 
lr: 0.0| temp: 1.9857 | loss: 1.13537| constrast_loss: 4.47655| div_loss: 0.64922| %_mask_idx: 0.4057| ppl: 224.5015| %_neg_is_pos: 0.02438| lr: 0.0| temp: 1.98569 | loss: 1.11044| constrast_loss: 4.37096| div_loss: 0.70786| %_mask_idx: 0.45348| ppl: 186.96809| %_neg_is_pos: 0.03368| lr: 0.0| temp: 1.98569 | loss: 1.12804| constrast_loss: 4.44207| div_loss: 0.70106| %_mask_idx: 0.39646| ppl: 191.32364| %_neg_is_pos: 0.03869| lr: 0.0| temp: 1.98567| loss: 1.11865| constrast_loss: 4.40579| div_loss: 0.68798| %_mask_idx: 0.35103| ppl: 199.69258| %_neg_is_pos: 0.04798| lr: 0.0| temp: 1.98567 | loss: 1.1374| constrast_loss: 4.48335| div_loss: 0.66244| %_mask_idx: 0.39693| ppl: 216.03656| %_neg_is_pos: 0.0297| lr: 0.0| temp: 1.98566 | loss: 1.13759| constrast_loss: 4.48265| div_loss: 0.67713| %_mask_idx: 0.37155| ppl: 206.63483| %_neg_is_pos: 0.04098| lr: 0.0| temp: 1.98566 | loss: 1.14926| constrast_loss: 4.53289| div_loss: 0.64165| %_mask_idx: 0.42528| ppl: 229.34564| %_neg_is_pos: 0.02029| lr: 0.0| temp: 1.98564 | loss: 1.13561| constrast_loss: 4.47392| div_loss: 0.68508| %_mask_idx: 0.38142| ppl: 201.55084| %_neg_is_pos: 0.02489| lr: 0.0| temp: 1.98564 | loss: 1.12147| constrast_loss: 4.41631| div_loss: 0.69569| %_mask_idx: 0.40993| ppl: 194.75653| %_neg_is_pos: 0.03045| lr: 0.0| temp: 1.98563 | loss: 1.15464| constrast_loss: 4.55345| div_loss: 0.65126| %_mask_idx: 0.35229| ppl: 223.19339| %_neg_is_pos: 0.03964| lr: 0.0| temp: 1.98563 | loss: 1.13672| constrast_loss: 4.48123| div_loss: 0.65636| %_mask_idx: 0.32018| ppl: 219.92976| %_neg_is_pos: 0.03905| lr: 0.0| temp: 1.98562 | loss: 1.13005| constrast_loss: 4.45278| div_loss: 0.67416| %_mask_idx: 0.38252| ppl: 208.53812| %_neg_is_pos: 0.02745| lr: 0.0| temp: 1.98562 | loss: 1.11521| constrast_loss: 4.39219| div_loss: 0.68663| %_mask_idx: 0.39552| ppl: 200.55362| %_neg_is_pos: 0.03063| lr: 0.0| temp: 1.98561 | loss: 1.1212| constrast_loss: 4.4141| div_loss: 0.70688| %_mask_idx: 0.35291| ppl: 187.59665| %_neg_is_pos: 
0.03888| lr: 0.0| temp: 1.98561 | loss: 1.11602| constrast_loss: 4.39442| div_loss: 0.69669| %_mask_idx: 0.36404| ppl: 194.1189| %_neg_is_pos: 0.04594| lr: 0.0| temp: 1.98559 | loss: 1.11628| constrast_loss: 4.39705| div_loss: 0.68087| %_mask_idx: 0.3808| ppl: 204.24338| %_neg_is_pos: 0.04918| lr: 0.0| temp: 1.98559 | loss: 1.09992| constrast_loss: 4.32908| div_loss: 0.7059| %_mask_idx: 0.34821| ppl: 188.22412| %_neg_is_pos: 0.03613| lr: 0.0| temp: 1.98558 | loss: 1.1236| constrast_loss: 4.42731| div_loss: 0.67096| %_mask_idx: 0.37014| ppl: 210.58493| %_neg_is_pos: 0.03172| lr: 0.0| temp: 1.98558 | loss: 1.11694| constrast_loss: 4.39834| div_loss: 0.69405| %_mask_idx: 0.36435| ppl: 195.80719| %_neg_is_pos: 0.04321| lr: 0.0| temp: 1.98557 | loss: 1.13863| constrast_loss: 4.48556| div_loss: 0.68973| %_mask_idx: 0.38064| ppl: 198.57086| %_neg_is_pos: 0.03109| lr: 0.0| temp: 1.98557 | loss: 1.1294| constrast_loss: 4.4493| div_loss: 0.68305| %_mask_idx: 0.36372| ppl: 202.84674| %_neg_is_pos: 0.03565| lr: 0.0| temp: 1.98556 | loss: 1.12879| constrast_loss: 4.44993| div_loss: 0.6524| %_mask_idx: 0.4433| ppl: 222.46365| %_neg_is_pos: 0.02986| lr: 0.0| temp: 1.98556 | loss: 1.1136| constrast_loss: 4.38435| div_loss: 0.70067| %_mask_idx: 0.40523| ppl: 191.57178| %_neg_is_pos: 0.04132| lr: 0.0| temp: 1.98554 | loss: 1.13403| constrast_loss: 4.46862| div_loss: 0.67508| %_mask_idx: 0.40883| ppl: 207.9516| %_neg_is_pos: 0.02138| lr: 0.0| temp: 1.98554 | loss: 1.12123| constrast_loss: 4.4134| div_loss: 0.71505| %_mask_idx: 0.38456| ppl: 182.3689| %_neg_is_pos: 0.05362| lr: 0.0| temp: 1.98553 | loss: 1.13294| constrast_loss: 4.462| div_loss: 0.69742| %_mask_idx: 0.41338| ppl: 193.65262| %_neg_is_pos: 0.02541| lr: 0.0| temp: 1.98553 | loss: 1.1283| constrast_loss: 4.44492| div_loss: 0.68287| %_mask_idx: 0.4458| ppl: 202.96399| %_neg_is_pos: 0.03526| lr: 0.0| temp: 1.98552 | loss: 1.13409| constrast_loss: 4.47013| div_loss: 0.66232| %_mask_idx: 0.34007| ppl: 216.11227| %_neg_is_pos: 
0.02939| lr: 0.0| temp: 1.98552 | loss: 1.14251| constrast_loss: 4.50317| div_loss: 0.66852| %_mask_idx: 0.37986| ppl: 212.14456| %_neg_is_pos: 0.03149| lr: 0.0| temp: 1.98551 | loss: 1.13101| constrast_loss: 4.45603| div_loss: 0.67997| %_mask_idx: 0.37234| ppl: 204.82184| %_neg_is_pos: 0.04156| lr: 0.0| temp: 1.98551 | loss: 1.11331| constrast_loss: 4.38253| div_loss: 0.70713| %_mask_idx: 0.35542| ppl: 187.4366| %_neg_is_pos: 0.05264| lr: 0.0| temp: 1.98549 | loss: 1.14079| constrast_loss: 4.49785| div_loss: 0.65333| %_mask_idx: 0.36623| ppl: 221.86737| %_neg_is_pos: 0.03872| lr: 0.0| temp: 1.98549 | loss: 1.1389| constrast_loss: 4.49271| div_loss: 0.62908| %_mask_idx: 0.42951| ppl: 237.38744| %_neg_is_pos: 0.01223| lr: 0.0| temp: 1.98548 | loss: 1.12607| constrast_loss: 4.43629| div_loss: 0.67992| %_mask_idx: 0.41902| ppl: 204.85278| %_neg_is_pos: 0.02414| lr: 0.0| temp: 1.98548 | loss: 1.14858| constrast_loss: 4.52845| div_loss: 0.65854| %_mask_idx: 0.39583| ppl: 218.53476| %_neg_is_pos: 0.01176| lr: 0.0| temp: 1.98546 | loss: 1.13872| constrast_loss: 4.49049| div_loss: 0.644| %_mask_idx: 0.38064| ppl: 227.84077| %_neg_is_pos: 0.01859| lr: 0.0| temp: 1.98546 | loss: 1.12226| constrast_loss: 4.41716| div_loss: 0.71897| %_mask_idx: 0.43264| ppl: 179.86198| %_neg_is_pos: 0.02801| lr: 0.0| temp: 1.98545 | loss: 1.1224| constrast_loss: 4.42239| div_loss: 0.67187| %_mask_idx: 0.38299| ppl: 210.00497| %_neg_is_pos: 0.04743| lr: 0.0| temp: 1.98545 | loss: 1.10328| constrast_loss: 4.33955| div_loss: 0.73589| %_mask_idx: 0.39959| ppl: 169.03331| %_neg_is_pos: 0.05483| lr: 0.0| temp: 1.98544 | loss: 1.14427| constrast_loss: 4.50988| div_loss: 0.6721| %_mask_idx: 0.38549| ppl: 209.85538| %_neg_is_pos: 0.02379| lr: 0.0| temp: 1.98544 | loss: 1.12727| constrast_loss: 4.44091| div_loss: 0.68155| %_mask_idx: 0.37124| ppl: 203.80664| %_neg_is_pos: 0.05133| lr: 0.0| temp: 1.98543 | loss: 1.11677| constrast_loss: 4.39675| div_loss: 0.7033| %_mask_idx: 0.43186| ppl: 189.88525| 
%_neg_is_pos: 0.02954| lr: 0.0| temp: 1.98543 | loss: 1.10673| constrast_loss: 4.35663| div_loss: 0.7028| %_mask_idx: 0.39991| ppl: 190.20889| %_neg_is_pos: 0.05097| lr: 0.0| temp: 1.98541 | loss: 1.13044| constrast_loss: 4.45317| div_loss: 0.68584| %_mask_idx: 0.3468| ppl: 201.06094| %_neg_is_pos: 0.04526| lr: 0.0| temp: 1.98541 | loss: 1.12431| constrast_loss: 4.42638| div_loss: 0.70865| %_mask_idx: 0.37437| ppl: 186.46713| %_neg_is_pos: 0.03687| lr: 0.0| temp: 1.9854 | loss: 1.12874| constrast_loss: 4.44675| div_loss: 0.68206| %_mask_idx: 0.39035| ppl: 203.47954| %_neg_is_pos: 0.04457| lr: 0.0| temp: 1.9854 | loss: 1.10768| constrast_loss: 4.35945| div_loss: 0.71255| %_mask_idx: 0.42011| ppl: 183.96753| %_neg_is_pos: 0.04289| lr: 0.0| temp: 1.98539 | loss: 1.1301| constrast_loss: 4.45185| div_loss: 0.6855| %_mask_idx: 0.40367| ppl: 201.27982| %_neg_is_pos: 0.03879| lr: 0.0| temp: 1.98539 | loss: 1.13214| constrast_loss: 4.46018| div_loss: 0.6836| %_mask_idx: 0.42403| ppl: 202.49747| %_neg_is_pos: 0.02916| lr: 0.0| temp: 1.98538 | loss: 1.12854| constrast_loss: 4.44731| div_loss: 0.66833| %_mask_idx: 0.35056| ppl: 212.27197| %_neg_is_pos: 0.02871| lr: 0.0| temp: 1.98538 | loss: 1.11766| constrast_loss: 4.40252| div_loss: 0.68117| %_mask_idx: 0.37923| ppl: 204.05063| %_neg_is_pos: 0.04454| lr: 0.0| temp: 1.98536 | loss: 1.1364| constrast_loss: 4.47987| div_loss: 0.65715| %_mask_idx: 0.43562| ppl: 219.42229| %_neg_is_pos: 0.01365| lr: 0.0| temp: 1.98536 | loss: 1.1311| constrast_loss: 4.45604| div_loss: 0.68345| %_mask_idx: 0.36137| ppl: 202.59402| %_neg_is_pos: 0.03951| lr: 0.0| temp: 1.98535 | loss: 1.12624| constrast_loss: 4.43505| div_loss: 0.69904| %_mask_idx: 0.40053| ppl: 192.61304| %_neg_is_pos: 0.03807| lr: 0.0| temp: 1.98535 | loss: 1.13036| constrast_loss: 4.45312| div_loss: 0.68327| %_mask_idx: 0.39113| ppl: 202.70724| %_neg_is_pos: 0.01964| lr: 0.0| temp: 1.98534 | loss: 1.09201| constrast_loss: 4.2968| div_loss: 0.71232| %_mask_idx: 0.36826| ppl: 
184.11728| %_neg_is_pos: 0.06797| lr: 0.0| temp: 1.98534 | loss: 1.12756| constrast_loss: 4.43976| div_loss: 0.705| %_mask_idx: 0.41165| ppl: 188.79791| %_neg_is_pos: 0.03178| lr: 0.0| temp: 1.98533 | loss: 1.11289| constrast_loss: 4.37896| div_loss: 0.72587| %_mask_idx: 0.35871| ppl: 175.44263| %_neg_is_pos: 0.05302| lr: 0.0| temp: 1.98533 | loss: 1.10657| constrast_loss: 4.35412| div_loss: 0.72168| %_mask_idx: 0.37218| ppl: 178.12738| %_neg_is_pos: 0.04413| lr: 0.0| temp: 1.98531 | loss: 1.12643| constrast_loss: 4.43712| div_loss: 0.68594| %_mask_idx: 0.43139| ppl: 201.0011| %_neg_is_pos: 0.02606| lr: 0.0| temp: 1.98531 | loss: 1.13437| constrast_loss: 4.47164| div_loss: 0.65857| %_mask_idx: 0.40398| ppl: 218.51793| %_neg_is_pos: 0.01995| lr: 0.0| temp: 1.9853 | loss: 1.13954| constrast_loss: 4.49115| div_loss: 0.67009| %_mask_idx: 0.36607| ppl: 211.14282| %_neg_is_pos: 0.03229| lr: 0.0| temp: 1.9853 | loss: 1.12509| constrast_loss: 4.42998| div_loss: 0.70376| %_mask_idx: 0.3916| ppl: 189.59534| %_neg_is_pos: 0.02789| lr: 0.0| temp: 1.98528 | loss: 1.10955| constrast_loss: 4.36451| div_loss: 0.73675| %_mask_idx: 0.34586| ppl: 168.48175| %_neg_is_pos: 0.05907| lr: 0.0| temp: 1.98528 | loss: 1.12852| constrast_loss: 4.4478| div_loss: 0.66279| %_mask_idx: 0.40053| ppl: 215.81448| %_neg_is_pos: 0.03816| lr: 0.0| temp: 1.98527 | loss: 1.10805| constrast_loss: 4.36186| div_loss: 0.70358| %_mask_idx: 0.36717| ppl: 189.71066| %_neg_is_pos: 0.06699| lr: 0.0| temp: 1.98527 | loss: 1.13539| constrast_loss: 4.47601| div_loss: 0.65541| %_mask_idx: 0.41416| ppl: 220.54048| %_neg_is_pos: 0.02698| lr: 0.0| temp: 1.98526 | loss: 1.11907| constrast_loss: 4.40797| div_loss: 0.68316| %_mask_idx: 0.35197| ppl: 202.77592| %_neg_is_pos: 0.048| lr: 0.0| temp: 1.98526 | loss: 1.12011| constrast_loss: 4.41034| div_loss: 0.701| %_mask_idx: 0.38127| ppl: 191.36237| %_neg_is_pos: 0.03706| lr: 0.0| temp: 1.98525 | loss: 1.10235| constrast_loss: 4.33777| div_loss: 0.71642| %_mask_idx: 0.37719| 
ppl: 181.49115| %_neg_is_pos: 0.04371| lr: 0.0| temp: 1.98525 | loss: 1.12388| constrast_loss: 4.42748| div_loss: 0.68055| %_mask_idx: 0.37202| ppl: 204.44879| %_neg_is_pos: 0.02663| lr: 0.0| temp: 1.98523 | loss: 1.11374| constrast_loss: 4.38531| div_loss: 0.69647| %_mask_idx: 0.38142| ppl: 194.2562| %_neg_is_pos: 0.03963| lr: 0.0| temp: 1.98523 | loss: 1.11776| constrast_loss: 4.40142| div_loss: 0.69596| %_mask_idx: 0.39066| ppl: 194.58328| %_neg_is_pos: 0.03646| lr: 0.0| temp: 1.98522 | loss: 1.11754| constrast_loss: 4.40274| div_loss: 0.67406| %_mask_idx: 0.36826| ppl: 208.6017| %_neg_is_pos: 0.04621| lr: 0.0| temp: 1.98522 | loss: 1.1307| constrast_loss: 4.45305| div_loss: 0.69755| %_mask_idx: 0.39192| ppl: 193.56944| %_neg_is_pos: 0.03982| lr: 0.0| temp: 1.98521 | loss: 1.1162| constrast_loss: 4.39502| div_loss: 0.69797| %_mask_idx: 0.38863| ppl: 193.30229| %_neg_is_pos: 0.03169| lr: 0.0| temp: 1.98521 | loss: 1.09848| constrast_loss: 4.32322| div_loss: 0.70682| %_mask_idx: 0.39662| ppl: 187.63739| %_neg_is_pos: 0.0388| lr: 0.0| temp: 1.9852 | loss: 1.13671| constrast_loss: 4.47977| div_loss: 0.67082| %_mask_idx: 0.41917| ppl: 210.67773| %_neg_is_pos: 0.01855| lr: 0.0| temp: 1.9852 | loss: 1.10454| constrast_loss: 4.34998| div_loss: 0.68175| %_mask_idx: 0.31955| ppl: 203.6777| %_neg_is_pos: 0.07205| lr: 0.0| temp: 1.98519 | loss: 1.09881| constrast_loss: 4.3206| div_loss: 0.74646| %_mask_idx: 0.39646| ppl: 162.26314| %_neg_is_pos: 0.05779| lr: 0.0| temp: 1.98519 | loss: 1.14369| constrast_loss: 4.50882| div_loss: 0.65934| %_mask_idx: 0.3844| ppl: 218.02373| %_neg_is_pos: 0.03256| lr: 0.0| temp: 1.98518 | loss: 1.10234| constrast_loss: 4.33882| div_loss: 0.7053| %_mask_idx: 0.36419| ppl: 188.60623| %_neg_is_pos: 0.04954| lr: 0.0| temp: 1.98518 | loss: 1.11361| constrast_loss: 4.38226| div_loss: 0.7216| %_mask_idx: 0.39975| ppl: 178.17674| %_neg_is_pos: 0.04296| lr: 0.0| temp: 1.98517 | loss: 1.12601| constrast_loss: 4.43451| div_loss: 0.69522| %_mask_idx: 
0.43766| ppl: 195.05623| %_neg_is_pos: 0.01404| lr: 0.0| temp: 1.98517 | loss: 1.11584| constrast_loss: 4.39574| div_loss: 0.67623| %_mask_idx: 0.35949| ppl: 207.21265| %_neg_is_pos: 0.04268| lr: 0.0| temp: 1.98516 | loss: 1.13042| constrast_loss: 4.45433| div_loss: 0.67362| %_mask_idx: 0.38659| ppl: 208.88631| %_neg_is_pos: 0.05142| lr: 0.0| temp: 1.98516 | loss: 1.1251| constrast_loss: 4.42995| div_loss: 0.70454| %_mask_idx: 0.38221| ppl: 189.09721| %_neg_is_pos: 0.04629| lr: 0.0| temp: 1.98514 | loss: 1.12009| constrast_loss: 4.41058| div_loss: 0.69765| %_mask_idx: 0.37375| ppl: 193.50281| %_neg_is_pos: 0.04941| lr: 0.0| temp: 1.98514 | loss: 1.13344| constrast_loss: 4.46634| div_loss: 0.67438| %_mask_idx: 0.40836| ppl: 208.39426| %_neg_is_pos: 0.02587| lr: 0.0| temp: 1.98513 | loss: 1.13962| constrast_loss: 4.48994| div_loss: 0.68531| %_mask_idx: 0.43421| ppl: 201.40413| %_neg_is_pos: 0.03354| lr: 0.0| temp: 1.98513 [2021-09-02 00:29:46,077] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 8.0, reducing to 4.0 [2021-09-02 00:29:46,077] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 8.0, reducing to 4.0 | loss: 1.12334| constrast_loss: 4.42742| div_loss: 0.65932| %_mask_idx: 0.3761| ppl: 218.03409| %_neg_is_pos: 0.02341| lr: 0.0| temp: 1.98511 | loss: 1.1185| constrast_loss: 4.40541| div_loss: 0.68587| %_mask_idx: 0.38831| ppl: 201.04602| %_neg_is_pos: 0.02823| lr: 0.0| temp: 1.98511 | loss: 1.12604| constrast_loss: 4.4347| div_loss: 0.6945| %_mask_idx: 0.38487| ppl: 195.51721| %_neg_is_pos: 0.03382| lr: 0.0| temp: 1.9851 | loss: 1.12567| constrast_loss: 4.43395| div_loss: 0.68735| %_mask_idx: 0.44314| ppl: 200.09674| %_neg_is_pos: 0.03973| lr: 0.0| temp: 1.9851 | loss: 1.12342| constrast_loss: 4.4257| div_loss: 0.67978| %_mask_idx: 0.38315| ppl: 204.9425| %_neg_is_pos: 0.02491| lr: 0.0| temp: 1.98509 | loss: 1.12542| constrast_loss: 4.43594| div_loss: 0.6574| %_mask_idx: 0.38205| ppl: 219.26581| %_neg_is_pos: 0.03721| lr: 0.0| temp: 1.98509 | loss: 1.11898| constrast_loss: 4.40653| div_loss: 0.69386| %_mask_idx: 0.3432| ppl: 195.92776| %_neg_is_pos: 0.03084| lr: 0.0| temp: 1.98508 | loss: 1.10844| constrast_loss: 4.3654| div_loss: 0.68381| %_mask_idx: 0.35307| ppl: 202.36206| %_neg_is_pos: 0.0362| lr: 0.0| temp: 1.98508 | loss: 1.14137| constrast_loss: 4.49614| div_loss: 0.69319| %_mask_idx: 0.35605| ppl: 196.35628| %_neg_is_pos: 0.03557| lr: 0.0| temp: 1.98506 | loss: 1.1465| constrast_loss: 4.51949| div_loss: 0.66527| %_mask_idx: 0.3562| ppl: 214.23003| %_neg_is_pos: 0.03991| lr: 0.0| temp: 1.98506 | loss: 1.12866| constrast_loss: 4.4469| div_loss: 0.6774| %_mask_idx: 0.42497| ppl: 206.46475| %_neg_is_pos: 0.02532| lr: 0.0| temp: 1.98505 | loss: 1.12024| constrast_loss: 4.41244| div_loss: 0.68505| %_mask_idx: 0.40069| ppl: 201.56644| %_neg_is_pos: 0.03554| lr: 0.0| temp: 1.98505 | loss: 1.13869| constrast_loss: 4.49096| div_loss: 0.63816| %_mask_idx: 0.42622| ppl: 231.57767| %_neg_is_pos: 0.02066| lr: 0.0| temp: 1.98504 | loss: 1.11567| constrast_loss: 4.39283| div_loss: 0.69838| %_mask_idx: 0.36967| ppl: 193.03827| 
%_neg_is_pos: 0.06566| lr: 0.0| temp: 1.98504 | loss: 1.09678| constrast_loss: 4.31476| div_loss: 0.7238| %_mask_idx: 0.38001| ppl: 176.76836| %_neg_is_pos: 0.06874| lr: 0.0| temp: 1.98503 | loss: 1.12543| constrast_loss: 4.43313| div_loss: 0.68581| %_mask_idx: 0.35213| ppl: 201.0799| %_neg_is_pos: 0.03354| lr: 0.0| temp: 1.98503 | loss: 1.11435| constrast_loss: 4.38764| div_loss: 0.69747| %_mask_idx: 0.42607| ppl: 193.61633| %_neg_is_pos: 0.03104| lr: 0.0| temp: 1.98501 | loss: 1.12625| constrast_loss: 4.43441| div_loss: 0.70582| %_mask_idx: 0.34806| ppl: 188.27341| %_neg_is_pos: 0.06085| lr: 0.0| temp: 1.98501 | loss: 1.12655| constrast_loss: 4.43705| div_loss: 0.69157| %_mask_idx: 0.37547| ppl: 197.39697| %_neg_is_pos: 0.06111| lr: 0.0| temp: 1.985 | loss: 1.12134| constrast_loss: 4.42031| div_loss: 0.65043| %_mask_idx: 0.38064| ppl: 223.72189| %_neg_is_pos: 0.04147| lr: 0.0| temp: 1.985 | loss: 1.12992| constrast_loss: 4.45511| div_loss: 0.64575| %_mask_idx: 0.4115| ppl: 226.72304| %_neg_is_pos: 0.02873| lr: 0.0| temp: 1.98499 | loss: 1.11763| constrast_loss: 4.40092| div_loss: 0.69616| %_mask_idx: 0.43484| ppl: 194.46011| %_neg_is_pos: 0.03671| lr: 0.0| temp: 1.98499 | loss: 1.1058| constrast_loss: 4.35239| div_loss: 0.7081| %_mask_idx: 0.36497| ppl: 186.81557| %_neg_is_pos: 0.04497| lr: 0.0| temp: 1.98498 | loss: 1.14068| constrast_loss: 4.49403| div_loss: 0.68681| %_mask_idx: 0.3537| ppl: 200.4442| %_neg_is_pos: 0.04151| lr: 0.0| temp: 1.98498 | loss: 1.08181| constrast_loss: 4.25231| div_loss: 0.74924| %_mask_idx: 0.38393| ppl: 160.48442| %_neg_is_pos: 0.06065| lr: 0.0| temp: 1.98496| loss: 1.11921| constrast_loss: 4.40712| div_loss: 0.69702| %_mask_idx: 0.41087| ppl: 193.90979| %_neg_is_pos: 0.03093| lr: 0.0| temp: 1.98496 | loss: 1.11368| constrast_loss: 4.38538| div_loss: 0.69345| %_mask_idx: 0.41056| ppl: 196.19299| %_neg_is_pos: 0.04234| lr: 0.0| temp: 1.98495 | loss: 1.11414| constrast_loss: 4.38317| div_loss: 0.73381| %_mask_idx: 0.39646| ppl: 
170.36478| %_neg_is_pos: 0.05504| lr: 0.0| temp: 1.98495 | loss: 1.11762| constrast_loss: 4.4016| div_loss: 0.68893| %_mask_idx: 0.45489| ppl: 199.08249| %_neg_is_pos: 0.04095| lr: 0.0| temp: 1.98493 | loss: 1.12159| constrast_loss: 4.41705| div_loss: 0.69322| %_mask_idx: 0.37892| ppl: 196.34108| %_neg_is_pos: 0.05512| lr: 0.0| temp: 1.98493 | loss: 1.09757| constrast_loss: 4.31718| div_loss: 0.73106| %_mask_idx: 0.36779| ppl: 172.12106| %_neg_is_pos: 0.06796| lr: 0.0| temp: 1.98492 | loss: 1.1149| constrast_loss: 4.39066| div_loss: 0.68935| %_mask_idx: 0.39427| ppl: 198.81465| %_neg_is_pos: 0.06667| lr: 0.0| temp: 1.98492 | loss: 1.08992| constrast_loss: 4.28714| div_loss: 0.72532| %_mask_idx: 0.35761| ppl: 175.79544| %_neg_is_pos: 0.06522| lr: 0.0| temp: 1.98491 | loss: 1.10859| constrast_loss: 4.36493| div_loss: 0.69446| %_mask_idx: 0.39662| ppl: 195.54649| %_neg_is_pos: 0.03945| lr: 0.0| temp: 1.98491 | loss: 1.11245| constrast_loss: 4.38146| div_loss: 0.68341| %_mask_idx: 0.36278| ppl: 202.61688| %_neg_is_pos: 0.0379| lr: 0.0| temp: 1.9849 | loss: 1.1125| constrast_loss: 4.3811| div_loss: 0.68893| %_mask_idx: 0.38549| ppl: 199.08765| %_neg_is_pos: 0.04268| lr: 0.0| temp: 1.9849 | loss: 1.11406| constrast_loss: 4.38572| div_loss: 0.70535| %_mask_idx: 0.3656| ppl: 188.57639| %_neg_is_pos: 0.04013| lr: 0.0| temp: 1.98488 | loss: 1.09515| constrast_loss: 4.30661| div_loss: 0.73974| %_mask_idx: 0.40053| ppl: 166.56445| %_neg_is_pos: 0.04882| lr: 0.0| temp: 1.98488 | loss: 1.1066| constrast_loss: 4.35834| div_loss: 0.68079| %_mask_idx: 0.40977| ppl: 204.29419| %_neg_is_pos: 0.04941| lr: 0.0| temp: 1.98487 | loss: 1.11521| constrast_loss: 4.39167| div_loss: 0.69169| %_mask_idx: 0.42152| ppl: 197.31854| %_neg_is_pos: 0.03443| lr: 0.0| temp: 1.98487 | loss: 1.09291| constrast_loss: 4.29905| div_loss: 0.72584| %_mask_idx: 0.32863| ppl: 175.45984| %_neg_is_pos: 0.08365| lr: 0.0| temp: 1.98486 | loss: 1.13155| constrast_loss: 4.45733| div_loss: 0.68876| %_mask_idx: 
0.38863| ppl: 199.19089| %_neg_is_pos: 0.04538| lr: 0.0| temp: 1.98486 | loss: 1.11834| constrast_loss: 4.40759| div_loss: 0.65782| %_mask_idx: 0.43405| ppl: 218.99725| %_neg_is_pos: 0.02802| lr: 0.0| temp: 1.98485 | loss: 1.10725| constrast_loss: 4.35732| div_loss: 0.71691| %_mask_idx: 0.3667| ppl: 181.17761| %_neg_is_pos: 0.05029| lr: 0.0| temp: 1.98485 | loss: 1.09796| constrast_loss: 4.32126| div_loss: 0.70591| %_mask_idx: 0.3526| ppl: 188.22058| %_neg_is_pos: 0.06479| lr: 0.0| temp: 1.98483 | loss: 1.10058| constrast_loss: 4.33197| div_loss: 0.70344| %_mask_idx: 0.42137| ppl: 189.8009| %_neg_is_pos: 0.04051| lr: 0.0| temp: 1.98483 | loss: 1.11628| constrast_loss: 4.39677| div_loss: 0.68364| %_mask_idx: 0.37704| ppl: 202.46725| %_neg_is_pos: 0.03982| lr: 0.0| temp: 1.98482 | loss: 1.1319| constrast_loss: 4.46046| div_loss: 0.67137| %_mask_idx: 0.44032| ppl: 210.32458| %_neg_is_pos: 0.01967| lr: 0.0| temp: 1.98482 | loss: 1.12289| constrast_loss: 4.4234| div_loss: 0.68153| %_mask_idx: 0.39364| ppl: 203.82146| %_neg_is_pos: 0.03625| lr: 0.0| temp: 1.98481 | loss: 1.1203| constrast_loss: 4.4111| div_loss: 0.70115| %_mask_idx: 0.3974| ppl: 191.26398| %_neg_is_pos: 0.04029| lr: 0.0| temp: 1.98481 | loss: 1.10923| constrast_loss: 4.36583| div_loss: 0.71075| %_mask_idx: 0.43578| ppl: 185.11823| %_neg_is_pos: 0.0324| lr: 0.0| temp: 1.9848 | loss: 1.11972| constrast_loss: 4.41234| div_loss: 0.66548| %_mask_idx: 0.42591| ppl: 214.09441| %_neg_is_pos: 0.03151| lr: 0.0| temp: 1.9848 | loss: 1.12461| constrast_loss: 4.42913| div_loss: 0.69323| %_mask_idx: 0.39834| ppl: 196.33508| %_neg_is_pos: 0.05133| lr: 0.0| temp: 1.98478 | loss: 1.11198| constrast_loss: 4.377| div_loss: 0.70908| %_mask_idx: 0.40742| ppl: 186.19135| %_neg_is_pos: 0.04378| lr: 0.0| temp: 1.98478 | loss: 1.12504| constrast_loss: 4.43098| div_loss: 0.69189| %_mask_idx: 0.37265| ppl: 197.18887| %_neg_is_pos: 0.04945| lr: 0.0| temp: 1.98477 | loss: 1.11153| constrast_loss: 4.37793| div_loss: 0.682| 
%_mask_idx: 0.41541| ppl: 203.52103| %_neg_is_pos: 0.03279| lr: 0.0| temp: 1.98477 | loss: 1.09653| constrast_loss: 4.3145| div_loss: 0.71602| %_mask_idx: 0.37014| ppl: 181.7442| %_neg_is_pos: 0.06485| lr: 0.0| temp: 1.98475 | loss: 1.0882| constrast_loss: 4.28047| div_loss: 0.72339| %_mask_idx: 0.41526| ppl: 177.03101| %_neg_is_pos: 0.05449| lr: 0.0| temp: 1.98475 | loss: 1.11168| constrast_loss: 4.37849| div_loss: 0.6822| %_mask_idx: 0.3963| ppl: 203.3949| %_neg_is_pos: 0.03987| lr: 0.0| temp: 1.98474 | loss: 1.11254| constrast_loss: 4.38145| div_loss: 0.68704| %_mask_idx: 0.40555| ppl: 200.29221| %_neg_is_pos: 0.04464| lr: 0.0| temp: 1.98474 | loss: 1.09612| constrast_loss: 4.31549| div_loss: 0.68975| %_mask_idx: 0.38299| ppl: 198.55853| %_neg_is_pos: 0.05441| lr: 0.0| temp: 1.98473 | loss: 1.13525| constrast_loss: 4.4743| div_loss: 0.66686| %_mask_idx: 0.38409| ppl: 213.20734| %_neg_is_pos: 0.04262| lr: 0.0| temp: 1.98473 | loss: 1.1233| constrast_loss: 4.42704| div_loss: 0.66149| %_mask_idx: 0.43954| ppl: 216.64545| %_neg_is_pos: 0.03813| lr: 0.0| temp: 1.98472 | loss: 1.11171| constrast_loss: 4.37606| div_loss: 0.70777| %_mask_idx: 0.41338| ppl: 187.02725| %_neg_is_pos: 0.0543| lr: 0.0| temp: 1.98472 | loss: 1.10587| constrast_loss: 4.35491| div_loss: 0.68561| %_mask_idx: 0.42011| ppl: 201.21091| %_neg_is_pos: 0.04406| lr: 0.0| temp: 1.9847 | loss: 1.11764| constrast_loss: 4.40185| div_loss: 0.68709| %_mask_idx: 0.36748| ppl: 200.26166| %_neg_is_pos: 0.05827| lr: 0.0| temp: 1.9847 | loss: 1.09292| constrast_loss: 4.29978| div_loss: 0.71887| %_mask_idx: 0.35793| ppl: 179.92151| %_neg_is_pos: 0.0644| lr: 0.0| temp: 1.98469 | loss: 1.13087| constrast_loss: 4.45778| div_loss: 0.65681| %_mask_idx: 0.41322| ppl: 219.64081| %_neg_is_pos: 0.0233| lr: 0.0| temp: 1.98469 | loss: 1.12272| constrast_loss: 4.42365| div_loss: 0.67232| %_mask_idx: 0.40445| ppl: 209.71228| %_neg_is_pos: 0.04547| lr: 0.0| temp: 1.98468 | loss: 1.10235| constrast_loss: 4.34068| div_loss: 
0.6871| %_mask_idx: 0.36983| ppl: 200.25919| %_neg_is_pos: 0.05682| lr: 0.0| temp: 1.98468 | loss: 1.10194| constrast_loss: 4.33673| div_loss: 0.71049| %_mask_idx: 0.39427| ppl: 185.28691| %_neg_is_pos: 0.05443| lr: 0.0| temp: 1.98467 | loss: 1.12723| constrast_loss: 4.44148| div_loss: 0.67426| %_mask_idx: 0.37281| ppl: 208.47305| %_neg_is_pos: 0.06819| lr: 0.0| temp: 1.98467 | loss: 1.11308| constrast_loss: 4.38479| div_loss: 0.67523| %_mask_idx: 0.43076| ppl: 207.85371| %_neg_is_pos: 0.03458| lr: 0.0| temp: 1.98465 | loss: 1.12414| constrast_loss: 4.42421| div_loss: 0.72333| %_mask_idx: 0.37813| ppl: 177.06786| %_neg_is_pos: 0.06985| lr: 0.0| temp: 1.98465 | loss: 1.128| constrast_loss: 4.44672| div_loss: 0.65288| %_mask_idx: 0.43014| ppl: 222.15985| %_neg_is_pos: 0.02778| lr: 0.0| temp: 1.98464 | loss: 1.09867| constrast_loss: 4.32481| div_loss: 0.69888| %_mask_idx: 0.42732| ppl: 192.71458| %_neg_is_pos: 0.03618| lr: 0.0| temp: 1.98464 | loss: 1.12829| constrast_loss: 4.44572| div_loss: 0.67438| %_mask_idx: 0.33443| ppl: 208.39917| %_neg_is_pos: 0.05472| lr: 0.0| temp: 1.98463 | loss: 1.13554| constrast_loss: 4.4764| div_loss: 0.6577| %_mask_idx: 0.33913| ppl: 219.07076| %_neg_is_pos: 0.04573| lr: 0.0| temp: 1.98463 | loss: 1.11183| constrast_loss: 4.37929| div_loss: 0.68019| %_mask_idx: 0.40163| ppl: 204.67807| %_neg_is_pos: 0.04324| lr: 0.0| temp: 1.98462 | loss: 1.11196| constrast_loss: 4.37731| div_loss: 0.7051| %_mask_idx: 0.38487| ppl: 188.73315| %_neg_is_pos: 0.05035| lr: 0.0| temp: 1.98462 | loss: 1.12539| constrast_loss: 4.42951| div_loss: 0.7205| %_mask_idx: 0.47509| ppl: 178.88054| %_neg_is_pos: 0.041| lr: 0.0| temp: 1.9846 | loss: 1.12891| constrast_loss: 4.44957| div_loss: 0.66066| %_mask_idx: 0.375| ppl: 217.18015| %_neg_is_pos: 0.03991| lr: 0.0| temp: 1.9846 | loss: 1.12839| constrast_loss: 4.44516| div_loss: 0.68411| %_mask_idx: 0.43061| ppl: 202.17157| %_neg_is_pos: 0.03393| lr: 0.0| temp: 1.98459 | loss: 1.10462| constrast_loss: 4.34752| 
div_loss: 0.70946| %_mask_idx: 0.41103| ppl: 185.94839| %_neg_is_pos: 0.04826| lr: 0.0| temp: 1.98459 | loss: 1.09912| constrast_loss: 4.32528| div_loss: 0.71182| %_mask_idx: 0.36623| ppl: 184.4328| %_neg_is_pos: 0.06076| lr: 0.0| temp: 1.98457 | loss: 1.12779| constrast_loss: 4.4435| div_loss: 0.67653| %_mask_idx: 0.41667| ppl: 207.02116| %_neg_is_pos: 0.02997| lr: 0.0| temp: 1.98457 | loss: 1.11367| constrast_loss: 4.38412| div_loss: 0.70572| %_mask_idx: 0.39129| ppl: 188.34149| %_neg_is_pos: 0.0469| lr: 0.0| temp: 1.98456 | loss: 1.10848| constrast_loss: 4.3653| div_loss: 0.68605| %_mask_idx: 0.37218| ppl: 200.92868| %_neg_is_pos: 0.04177| lr: 0.0| temp: 1.98456 | loss: 1.13058| constrast_loss: 4.45536| div_loss: 0.66943| %_mask_idx: 0.4256| ppl: 211.56165| %_neg_is_pos: 0.02977| lr: 0.0| temp: 1.98455 | loss: 1.10545| constrast_loss: 4.35163| div_loss: 0.70165| %_mask_idx: 0.35072| ppl: 190.94527| %_neg_is_pos: 0.044| lr: 0.0| temp: 1.98455 | loss: 1.12677| constrast_loss: 4.43769| div_loss: 0.69401| %_mask_idx: 0.38174| ppl: 195.83507| %_neg_is_pos: 0.04713| lr: 0.0| temp: 1.98454 | loss: 1.12031| constrast_loss: 4.41445| div_loss: 0.66806| %_mask_idx: 0.34743| ppl: 212.4416| %_neg_is_pos: 0.05716| lr: 0.0| temp: 1.98454 | loss: 1.10802| constrast_loss: 4.36012| div_loss: 0.71956| %_mask_idx: 0.40022| ppl: 179.48392| %_neg_is_pos: 0.05385| lr: 0.0| temp: 1.98452 | loss: 1.11139| constrast_loss: 4.3774| div_loss: 0.68166| %_mask_idx: 0.36435| ppl: 203.73877| %_neg_is_pos: 0.05061| lr: 0.0| temp: 1.98452 | loss: 1.10971| constrast_loss: 4.37051| div_loss: 0.68338| %_mask_idx: 0.42215| ppl: 202.63989| %_neg_is_pos: 0.03623| lr: 0.0| temp: 1.98451 | loss: 1.11152| constrast_loss: 4.3751| div_loss: 0.70986| %_mask_idx: 0.40868| ppl: 185.68901| %_neg_is_pos: 0.04455| lr: 0.0| temp: 1.98451 | loss: 1.11115| constrast_loss: 4.37234| div_loss: 0.72244| %_mask_idx: 0.41714| ppl: 177.63918| %_neg_is_pos: 0.04941| lr: 0.0| temp: 1.9845 | loss: 1.10853| constrast_loss: 
4.36527| div_loss: 0.68857| %_mask_idx: 0.41996| ppl: 199.31828| %_neg_is_pos: 0.04267| lr: 0.0| temp: 1.9845 | loss: 1.12162| constrast_loss: 4.41951| div_loss: 0.66989| %_mask_idx: 0.39192| ppl: 211.2724| %_neg_is_pos: 0.03776| lr: 0.0| temp: 1.98449 | loss: 1.12| constrast_loss: 4.41137| div_loss: 0.68641| %_mask_idx: 0.36153| ppl: 200.6987| %_neg_is_pos: 0.05151| lr: 0.0| temp: 1.98449 | loss: 1.09494| constrast_loss: 4.30593| div_loss: 0.73812| %_mask_idx: 0.39082| ppl: 167.60028| %_neg_is_pos: 0.06921| lr: 0.0| temp: 1.98447 | loss: 1.1128| constrast_loss: 4.37867| div_loss: 0.72547| %_mask_idx: 0.40727| ppl: 175.69734| %_neg_is_pos: 0.06087| lr: 0.0| temp: 1.98447 | loss: 1.14305| constrast_loss: 4.5073| div_loss: 0.64892| %_mask_idx: 0.41949| ppl: 224.6904| %_neg_is_pos: 0.02203| lr: 0.0| temp: 1.98446 | loss: 1.07496| constrast_loss: 4.22671| div_loss: 0.73134| %_mask_idx: 0.32127| ppl: 171.94028| %_neg_is_pos: 0.09289| lr: 0.0| temp: 1.98446 | loss: 1.13558| constrast_loss: 4.47715| div_loss: 0.6516| %_mask_idx: 0.44549| ppl: 222.97513| %_neg_is_pos: 0.0325| lr: 0.0| temp: 1.98445 | loss: 1.0835| constrast_loss: 4.25926| div_loss: 0.74716| %_mask_idx: 0.35777| ppl: 161.8154| %_neg_is_pos: 0.05419| lr: 0.0| temp: 1.98445 | loss: 1.10646| constrast_loss: 4.35712| div_loss: 0.68724| %_mask_idx: 0.43029| ppl: 200.16431| %_neg_is_pos: 0.03523| lr: 0.0| temp: 1.98444 | loss: 1.12579| constrast_loss: 4.43413| div_loss: 0.69043| %_mask_idx: 0.38095| ppl: 198.12747| %_neg_is_pos: 0.03959| lr: 0.0| temp: 1.98444 | loss: 1.08137| constrast_loss: 4.25297| div_loss: 0.72501| %_mask_idx: 0.35072| ppl: 175.99515| %_neg_is_pos: 0.07527| lr: 0.0| temp: 1.98442 | loss: 1.11614| constrast_loss: 4.39318| div_loss: 0.71367| %_mask_idx: 0.38628| ppl: 183.25304| %_neg_is_pos: 0.07263| lr: 0.0| temp: 1.98442 | loss: 1.13152| constrast_loss: 4.46118| div_loss: 0.64887| %_mask_idx: 0.43139| ppl: 224.72517| %_neg_is_pos: 0.04598| lr: 0.0| temp: 1.98441 | loss: 1.11006| 
constrast_loss: 4.37035| div_loss: 0.69871| %_mask_idx: 0.39207| ppl: 192.82303| %_neg_is_pos: 0.04044| lr: 0.0| temp: 1.98441 [2021-09-02 00:39:01,600] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 4.0, reducing to 2.0 [2021-09-02 00:39:01,601] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 4.0, reducing to 2.0 | loss: 1.10074| constrast_loss: 4.32977| div_loss: 0.73203| %_mask_idx: 0.39599| ppl: 171.5031| %_neg_is_pos: 0.04286| lr: 0.0| temp: 1.98439 | loss: 1.08305| constrast_loss: 4.25718| div_loss: 0.75012| %_mask_idx: 0.35495| ppl: 159.92496| %_neg_is_pos: 0.08885| lr: 0.0| temp: 1.98439 | loss: 1.10705| constrast_loss: 4.35858| div_loss: 0.69603| %_mask_idx: 0.37892| ppl: 194.5383| %_neg_is_pos: 0.04371| lr: 0.0| temp: 1.98438 | loss: 1.10566| constrast_loss: 4.35427| div_loss: 0.68366| %_mask_idx: 0.40335| ppl: 202.45932| %_neg_is_pos: 0.03636| lr: 0.0| temp: 1.98438 | loss: 1.13565| constrast_loss: 4.47741| div_loss: 0.65173| %_mask_idx: 0.39536| ppl: 222.89371| %_neg_is_pos: 0.03669| lr: 0.0| temp: 1.98437 | loss: 1.10307| constrast_loss: 4.34551| div_loss: 0.66755| %_mask_idx: 0.38534| ppl: 212.7674| %_neg_is_pos: 0.03841| lr: 0.0| temp: 1.98437 | loss: 1.11176| constrast_loss: 4.37745| div_loss: 0.69588| %_mask_idx: 0.36638| ppl: 194.63429| %_neg_is_pos: 0.04098| lr: 0.0| temp: 1.98436 | loss: 1.10842| constrast_loss: 4.36545| div_loss: 0.68211| %_mask_idx: 0.37563| ppl: 203.44647| %_neg_is_pos: 0.04503| lr: 0.0| temp: 1.98436 | loss: 1.11747| constrast_loss: 4.39942| div_loss: 0.70456| %_mask_idx: 0.36795| ppl: 189.08081| %_neg_is_pos: 0.03215| lr: 0.0| temp: 1.98434 | loss: 1.13907| constrast_loss: 4.49156| div_loss: 0.64743| %_mask_idx: 0.43844| ppl: 225.64606| %_neg_is_pos: 0.02048| lr: 0.0| temp: 1.98434 | loss: 1.13856| constrast_loss: 4.48913| div_loss: 0.6513| %_mask_idx: 0.42951| ppl: 223.17041| 
%_neg_is_pos: 0.02291| lr: 0.0| temp: 1.98433 | loss: 1.14352| constrast_loss: 4.51021| div_loss: 0.63855| %_mask_idx: 0.43562| ppl: 231.32823| %_neg_is_pos: 0.02025| lr: 0.0| temp: 1.98433 | loss: 1.12877| constrast_loss: 4.447| div_loss: 0.68089| %_mask_idx: 0.42105| ppl: 204.22723| %_neg_is_pos: 0.02992| lr: 0.0| temp: 1.98432 | loss: 1.11597| constrast_loss: 4.39503| div_loss: 0.6884| %_mask_idx: 0.39082| ppl: 199.42296| %_neg_is_pos: 0.04591| lr: 0.0| temp: 1.98432 | loss: 1.1218| constrast_loss: 4.41957| div_loss: 0.67642| %_mask_idx: 0.37406| ppl: 207.08893| %_neg_is_pos: 0.03188| lr: 0.0| temp: 1.98431 | loss: 1.124| constrast_loss: 4.42858| div_loss: 0.67443| %_mask_idx: 0.40962| ppl: 208.36212| %_neg_is_pos: 0.02576| lr: 0.0| temp: 1.98431 | loss: 1.12683| constrast_loss: 4.43795| div_loss: 0.69375| %_mask_idx: 0.42137| ppl: 196.00073| %_neg_is_pos: 0.04546| lr: 0.0| temp: 1.98429 | loss: 1.13827| constrast_loss: 4.48597| div_loss: 0.67113| %_mask_idx: 0.4281| ppl: 210.47729| %_neg_is_pos: 0.03822| lr: 0.0| temp: 1.98429 | loss: 1.11974| constrast_loss: 4.41198| div_loss: 0.66964| %_mask_idx: 0.36137| ppl: 211.43283| %_neg_is_pos: 0.04075| lr: 0.0| temp: 1.98428 | loss: 1.1348| constrast_loss: 4.4722| div_loss: 0.66977| %_mask_idx: 0.42967| ppl: 211.34825| %_neg_is_pos: 0.02986| lr: 0.0| temp: 1.98428 | loss: 1.13613| constrast_loss: 4.47612| div_loss: 0.68414| %_mask_idx: 0.42199| ppl: 202.15013| %_neg_is_pos: 0.04639| lr: 0.0| temp: 1.98427 | loss: 1.13307| constrast_loss: 4.46458| div_loss: 0.67685| %_mask_idx: 0.33929| ppl: 206.81361| %_neg_is_pos: 0.03258| lr: 0.0| temp: 1.98427 | loss: 1.10782| constrast_loss: 4.36229| div_loss: 0.68996| %_mask_idx: 0.39207| ppl: 198.42731| %_neg_is_pos: 0.04931| lr: 0.0| temp: 1.98426 | loss: 1.12621| constrast_loss: 4.43827| div_loss: 0.66561| %_mask_idx: 0.40962| ppl: 214.01242| %_neg_is_pos: 0.0183| lr: 0.0| temp: 1.98426 | loss: 1.12164| constrast_loss: 4.41648| div_loss: 0.70071| %_mask_idx: 0.39442| ppl: 
191.54773| %_neg_is_pos: 0.04161| lr: 0.0| temp: 1.98424 | loss: 1.11986| constrast_loss: 4.41197| div_loss: 0.67466| %_mask_idx: 0.42231| ppl: 208.21979| %_neg_is_pos: 0.03212| lr: 0.0| temp: 1.98424 | loss: 1.12133| constrast_loss: 4.42068| div_loss: 0.64653| %_mask_idx: 0.40602| ppl: 226.21957| %_neg_is_pos: 0.02725| lr: 0.0| temp: 1.98423 | loss: 1.12473| constrast_loss: 4.43124| div_loss: 0.67676| %_mask_idx: 0.34445| ppl: 206.87378| %_neg_is_pos: 0.05205| lr: 0.0| temp: 1.98423 | loss: 1.10329| constrast_loss: 4.34513| div_loss: 0.68035| %_mask_idx: 0.35526| ppl: 204.57764| %_neg_is_pos: 0.0463| lr: 0.0| temp: 1.98421 | loss: 1.12217| constrast_loss: 4.42089| div_loss: 0.67809| %_mask_idx: 0.37939| ppl: 206.02052| %_neg_is_pos: 0.03648| lr: 0.0| temp: 1.98421 | loss: 1.11554| constrast_loss: 4.39374| div_loss: 0.68404| %_mask_idx: 0.3562| ppl: 202.21278| %_neg_is_pos: 0.03854| lr: 0.0| temp: 1.9842 | loss: 1.10691| constrast_loss: 4.35852| div_loss: 0.69126| %_mask_idx: 0.38753| ppl: 197.59274| %_neg_is_pos: 0.07115| lr: 0.0| temp: 1.9842 | loss: 1.11577| constrast_loss: 4.39258| div_loss: 0.70514| %_mask_idx: 0.37296| ppl: 188.70787| %_neg_is_pos: 0.07544| lr: 0.0| temp: 1.98419 | loss: 1.12089| constrast_loss: 4.41502| div_loss: 0.68554| %_mask_idx: 0.40147| ppl: 201.25153| %_neg_is_pos: 0.02056| lr: 0.0| temp: 1.98419 | loss: 1.12248| constrast_loss: 4.42311| div_loss: 0.66815| %_mask_idx: 0.41886| ppl: 212.38193| %_neg_is_pos: 0.04456| lr: 0.0| temp: 1.98418 | loss: 1.12346| constrast_loss: 4.42822| div_loss: 0.65642| %_mask_idx: 0.40179| ppl: 219.89224| %_neg_is_pos: 0.019| lr: 0.0| temp: 1.98418 | loss: 1.13257| constrast_loss: 4.46131| div_loss: 0.68964| %_mask_idx: 0.38628| ppl: 198.62871| %_neg_is_pos: 0.02722| lr: 0.0| temp: 1.98416 | loss: 1.09857| constrast_loss: 4.32223| div_loss: 0.72068| %_mask_idx: 0.36059| ppl: 178.76562| %_neg_is_pos: 0.04352| lr: 0.0| temp: 1.98416 | loss: 1.1007| constrast_loss: 4.331| div_loss: 0.71778| %_mask_idx: 
0.4068| ppl: 180.62015| %_neg_is_pos: 0.04763| lr: 0.0| temp: 1.98415 | loss: 1.13373| constrast_loss: 4.46858| div_loss: 0.66327| %_mask_idx: 0.35558| ppl: 215.50708| %_neg_is_pos: 0.02785| lr: 0.0| temp: 1.98415 | loss: 1.10698| constrast_loss: 4.35822| div_loss: 0.69685| %_mask_idx: 0.38878| ppl: 194.01468| %_neg_is_pos: 0.03143| lr: 0.0| temp: 1.98414 | loss: 1.11168| constrast_loss: 4.38013| div_loss: 0.66576| %_mask_idx: 0.38722| ppl: 213.91199| %_neg_is_pos: 0.03881| lr: 0.0| temp: 1.98414 | loss: 1.13843| constrast_loss: 4.48842| div_loss: 0.65317| %_mask_idx: 0.34696| ppl: 221.9713| %_neg_is_pos: 0.0334| lr: 0.0| temp: 1.98413 | loss: 1.11177| constrast_loss: 4.37573| div_loss: 0.71349| %_mask_idx: 0.37688| ppl: 183.36923| %_neg_is_pos: 0.04827| lr: 0.0| temp: 1.98413 | loss: 1.12738| constrast_loss: 4.44475| div_loss: 0.64785| %_mask_idx: 0.40273| ppl: 225.37643| %_neg_is_pos: 0.03783| lr: 0.0| temp: 1.98411 | loss: 1.123| constrast_loss: 4.42212| div_loss: 0.6989| %_mask_idx: 0.39411| ppl: 192.70309| %_neg_is_pos: 0.03948| lr: 0.0| temp: 1.98411 | loss: 1.11715| constrast_loss: 4.40278| div_loss: 0.65833| %_mask_idx: 0.38221| ppl: 218.66852| %_neg_is_pos: 0.04475| lr: 0.0| temp: 1.9841 | loss: 1.13577| constrast_loss: 4.47932| div_loss: 0.63755| %_mask_idx: 0.4093| ppl: 231.96622| %_neg_is_pos: 0.02304| lr: 0.0| temp: 1.9841 | loss: 1.11442| constrast_loss: 4.38744| div_loss: 0.70264| %_mask_idx: 0.38816| ppl: 190.31171| %_neg_is_pos: 0.0473| lr: 0.0| temp: 1.98409 | loss: 1.12531| constrast_loss: 4.43595| div_loss: 0.65289| %_mask_idx: 0.37782| ppl: 222.14871| %_neg_is_pos: 0.02579| lr: 0.0| temp: 1.98409 | loss: 1.12834| constrast_loss: 4.44549| div_loss: 0.67882| %_mask_idx: 0.38753| ppl: 205.55832| %_neg_is_pos: 0.02988| lr: 0.0| temp: 1.98408 | loss: 1.11869| constrast_loss: 4.40667| div_loss: 0.68102| %_mask_idx: 0.35902| ppl: 204.14764| %_neg_is_pos: 0.03322| lr: 0.0| temp: 1.98408 | loss: 1.1195| constrast_loss: 4.40542| div_loss: 0.72583| 
%_mask_idx: 0.41776| ppl: 175.46918| %_neg_is_pos: 0.03247| lr: 0.0| temp: 1.98406 | loss: 1.11626| constrast_loss: 4.39613| div_loss: 0.6892| %_mask_idx: 0.39004| ppl: 198.9138| %_neg_is_pos: 0.03667| lr: 0.0| temp: 1.98406 | loss: 1.11006| constrast_loss: 4.37068| div_loss: 0.69582| %_mask_idx: 0.35464| ppl: 194.67418| %_neg_is_pos: 0.04697| lr: 0.0| temp: 1.98405 | loss: 1.1247| constrast_loss: 4.43202| div_loss: 0.66763| %_mask_idx: 0.37265| ppl: 212.71843| %_neg_is_pos: 0.0342| lr: 0.0| temp: 1.98405 | loss: 1.12908| constrast_loss: 4.44817| div_loss: 0.68152| %_mask_idx: 0.41964| ppl: 203.8271| %_neg_is_pos: 0.02689| lr: 0.0| temp: 1.98403 | loss: 1.12514| constrast_loss: 4.43048| div_loss: 0.70097| %_mask_idx: 0.41996| ppl: 191.37762| %_neg_is_pos: 0.02211| lr: 0.0| temp: 1.98403 | loss: 1.12296| constrast_loss: 4.42263| div_loss: 0.6921| %_mask_idx: 0.37954| ppl: 197.05589| %_neg_is_pos: 0.05016| lr: 0.0| temp: 1.98402 | loss: 1.10116| constrast_loss: 4.33443| div_loss: 0.70188| %_mask_idx: 0.39145| ppl: 190.79912| %_neg_is_pos: 0.06058| lr: 0.0| temp: 1.98402 | loss: 1.1306| constrast_loss: 4.45412| div_loss: 0.68264| %_mask_idx: 0.33991| ppl: 203.11266| %_neg_is_pos: 0.04679| lr: 0.0| temp: 1.98401 | loss: 1.1244| constrast_loss: 4.43254| div_loss: 0.65047| %_mask_idx: 0.40367| ppl: 223.69618| %_neg_is_pos: 0.01826| lr: 0.0| temp: 1.98401 | loss: 1.12384| constrast_loss: 4.42728| div_loss: 0.68096| %_mask_idx: 0.3844| ppl: 204.18253| %_neg_is_pos: 0.03582| lr: 0.0| temp: 1.984 | loss: 1.1389| constrast_loss: 4.4891| div_loss: 0.66514| %_mask_idx: 0.40179| ppl: 214.30774| %_neg_is_pos: 0.01715| lr: 0.0| temp: 1.984 | loss: 1.12351| constrast_loss: 4.42653| div_loss: 0.67522| %_mask_idx: 0.42043| ppl: 207.85614| %_neg_is_pos: 0.02103| lr: 0.0| temp: 1.98398 | loss: 1.11814| constrast_loss: 4.40402| div_loss: 0.68532| %_mask_idx: 0.41228| ppl: 201.39725| %_neg_is_pos: 0.03967| lr: 0.0| temp: 1.98398 | loss: 1.12662| constrast_loss: 4.43701| div_loss: 
0.69488| %_mask_idx: 0.32644| ppl: 195.27487| %_neg_is_pos: 0.05389| lr: 0.0| temp: 1.98397 | loss: 1.13103| constrast_loss: 4.4584| div_loss: 0.65723| %_mask_idx: 0.41588| ppl: 219.37143| %_neg_is_pos: 0.02565| lr: 0.0| temp: 1.98397 | loss: 1.10672| constrast_loss: 4.35701| div_loss: 0.69866| %_mask_idx: 0.42278| ppl: 192.85718| %_neg_is_pos: 0.04632| lr: 0.0| temp: 1.98396 | loss: 1.12536| constrast_loss: 4.43503| div_loss: 0.66401| %_mask_idx: 0.38487| ppl: 215.03619| %_neg_is_pos: 0.02483| lr: 0.0| temp: 1.98396 | loss: 1.11619| constrast_loss: 4.39889| div_loss: 0.65873| %_mask_idx: 0.39677| ppl: 218.4133| %_neg_is_pos: 0.05217| lr: 0.0| temp: 1.98395 | loss: 1.13507| constrast_loss: 4.47342| div_loss: 0.66852| %_mask_idx: 0.35949| ppl: 212.14929| %_neg_is_pos: 0.03296| lr: 0.0| temp: 1.98395 | loss: 1.10325| constrast_loss: 4.34428| div_loss: 0.68696| %_mask_idx: 0.37375| ppl: 200.34314| %_neg_is_pos: 0.06802| lr: 0.0| temp: 1.98393 | loss: 1.13249| constrast_loss: 4.46254| div_loss: 0.67415| %_mask_idx: 0.42372| ppl: 208.54414| %_neg_is_pos: 0.02463| lr: 0.0| temp: 1.98393 | loss: 1.1361| constrast_loss: 4.47682| div_loss: 0.6756| %_mask_idx: 0.39035| ppl: 207.6179| %_neg_is_pos: 0.0317| lr: 0.0| temp: 1.98392 | loss: 1.11287| constrast_loss: 4.38297| div_loss: 0.68516| %_mask_idx: 0.40633| ppl: 201.49933| %_neg_is_pos: 0.03259| lr: 0.0| temp: 1.98392 | loss: 1.12124| constrast_loss: 4.41996| div_loss: 0.65002| %_mask_idx: 0.36028| ppl: 223.98698| %_neg_is_pos: 0.02721| lr: 0.0| temp: 1.98391 | loss: 1.11883| constrast_loss: 4.40778| div_loss: 0.67555| %_mask_idx: 0.37751| ppl: 207.64886| %_neg_is_pos: 0.02436| lr: 0.0| temp: 1.98391 | loss: 1.12489| constrast_loss: 4.43028| div_loss: 0.69273| %_mask_idx: 0.35213| ppl: 196.65137| %_neg_is_pos: 0.03869| lr: 0.0| temp: 1.98391 | loss: 1.10774| constrast_loss: 4.35998| div_loss: 0.70962| %_mask_idx: 0.38205| ppl: 185.84633| %_neg_is_pos: 0.03739| lr: 0.0| temp: 1.98391 | loss: 1.14429| constrast_loss: 4.51398| 
div_loss: 0.63194| %_mask_idx: 0.41432| ppl: 235.55582| %_neg_is_pos: 0.02388| lr: 0.0| temp: 1.98389 | loss: 1.09331| constrast_loss: 4.30284| div_loss: 0.70391| %_mask_idx: 0.3797| ppl: 189.50064| %_neg_is_pos: 0.05556| lr: 0.0| temp: 1.98389 | loss: 1.14927| constrast_loss: 4.53722| div_loss: 0.59861| %_mask_idx: 0.37453| ppl: 256.88791| %_neg_is_pos: 0.01745| lr: 0.0| temp: 1.98388 | loss: 1.11058| constrast_loss: 4.37371| div_loss: 0.6861| %_mask_idx: 0.37484| ppl: 200.89841| %_neg_is_pos: 0.03736| lr: 0.0| temp: 1.98388 | loss: 1.10731| constrast_loss: 4.35975| div_loss: 0.69494| %_mask_idx: 0.39505| ppl: 195.23557| %_neg_is_pos: 0.03745| lr: 0.0| temp: 1.98386 | loss: 1.13153| constrast_loss: 4.4594| div_loss: 0.6673| %_mask_idx: 0.34132| ppl: 212.92921| %_neg_is_pos: 0.03556| lr: 0.0| temp: 1.98386 | loss: 1.12708| constrast_loss: 4.44146| div_loss: 0.66854| %_mask_idx: 0.37061| ppl: 212.13516| %_neg_is_pos: 0.0325| lr: 0.0| temp: 1.98385 | loss: 1.12937| constrast_loss: 4.44926| div_loss: 0.68209| %_mask_idx: 0.35354| ppl: 203.4606| %_neg_is_pos: 0.04536| lr: 0.0| temp: 1.98385 | loss: 1.11835| constrast_loss: 4.40343| div_loss: 0.69989| %_mask_idx: 0.37202| ppl: 192.07016| %_neg_is_pos: 0.04057| lr: 0.0| temp: 1.98384 | loss: 1.12354| constrast_loss: 4.42843| div_loss: 0.65745| %_mask_idx: 0.40132| ppl: 219.23099| %_neg_is_pos: 0.02992| lr: 0.0| temp: 1.98384 | loss: 1.11405| constrast_loss: 4.3869| div_loss: 0.69316| %_mask_idx: 0.44001| ppl: 196.38023| %_neg_is_pos: 0.02734| lr: 0.0| temp: 1.98383 | loss: 1.1074| constrast_loss: 4.36019| div_loss: 0.69421| %_mask_idx: 0.39317| ppl: 195.7043| %_neg_is_pos: 0.04987| lr: 0.0| temp: 1.98383 | loss: 1.12211| constrast_loss: 4.41913| div_loss: 0.69326| %_mask_idx: 0.33709| ppl: 196.31526| %_neg_is_pos: 0.05065| lr: 0.0| temp: 1.98381 | loss: 1.13663| constrast_loss: 4.47765| div_loss: 0.68877| %_mask_idx: 0.31861| ppl: 199.18817| %_neg_is_pos: 0.04282| lr: 0.0| temp: 1.98381 | loss: 1.11792| constrast_loss: 
4.40013| div_loss: 0.7157| %_mask_idx: 0.37766| ppl: 181.95251| %_neg_is_pos: 0.06013| lr: 0.0| temp: 1.9838 | loss: 1.12803| constrast_loss: 4.44404| div_loss: 0.6808| %_mask_idx: 0.33192| ppl: 204.29019| %_neg_is_pos: 0.05284| lr: 0.0| temp: 1.9838 | loss: 1.11961| constrast_loss: 4.4118| div_loss: 0.66655| %_mask_idx: 0.38393| ppl: 213.40652| %_neg_is_pos: 0.03239| lr: 0.0| temp: 1.98379 | loss: 1.14431| constrast_loss: 4.51503| div_loss: 0.62192| %_mask_idx: 0.41385| ppl: 241.97021| %_neg_is_pos: 0.02159| lr: 0.0| temp: 1.98379 | loss: 1.12042| constrast_loss: 4.41078| div_loss: 0.70911| %_mask_idx: 0.36419| ppl: 186.16882| %_neg_is_pos: 0.04466| lr: 0.0| temp: 1.98378 | loss: 1.1154| constrast_loss: 4.3941| div_loss: 0.67521| %_mask_idx: 0.41118| ppl: 207.86288| %_neg_is_pos: 0.04661| lr: 0.0| temp: 1.98378 | loss: 1.1232| constrast_loss: 4.42482| div_loss: 0.67988| %_mask_idx: 0.38064| ppl: 204.87848| %_neg_is_pos: 0.03146| lr: 0.0| temp: 1.98376 | loss: 1.13412| constrast_loss: 4.4708| div_loss: 0.65673| %_mask_idx: 0.39474| ppl: 219.6929| %_neg_is_pos: 0.03652| lr: 0.0| temp: 1.98376 | loss: 1.11639| constrast_loss: 4.39459| div_loss: 0.70989| %_mask_idx: 0.35965| ppl: 185.66751| %_neg_is_pos: 0.04304| lr: 0.0| temp: 1.98375 | loss: 1.12459| constrast_loss: 4.4331| div_loss: 0.65273| %_mask_idx: 0.40962| ppl: 222.25372| %_neg_is_pos: 0.01758| lr: 0.0| temp: 1.98375 | loss: 1.13421| constrast_loss: 4.47296| div_loss: 0.63859| %_mask_idx: 0.40899| ppl: 231.30466| %_neg_is_pos: 0.02169| lr: 0.0| temp: 1.98374 | loss: 1.11815| constrast_loss: 4.40375| div_loss: 0.68837| %_mask_idx: 0.41212| ppl: 199.44339| %_neg_is_pos: 0.03753| lr: 0.0| temp: 1.98374 | loss: 1.10413| constrast_loss: 4.34702| div_loss: 0.69512| %_mask_idx: 0.40147| ppl: 195.12349| %_neg_is_pos: 0.03206| lr: 0.0| temp: 1.98373 | loss: 1.12895| constrast_loss: 4.44791| div_loss: 0.67874| %_mask_idx: 0.37093| ppl: 205.60782| %_neg_is_pos: 0.03791| lr: 0.0| temp: 1.98373 | loss: 1.12175| 
constrast_loss: 4.41689| div_loss: 0.70102| %_mask_idx: 0.38737| ppl: 191.34671| %_neg_is_pos: 0.04508| lr: 0.0| temp: 1.98371 | loss: 1.11044| constrast_loss: 4.37256| div_loss: 0.69178| %_mask_idx: 0.39959| ppl: 197.26022| %_neg_is_pos: 0.04765| lr: 0.0| temp: 1.98371 | loss: 1.1315| constrast_loss: 4.45753| div_loss: 0.68458| %_mask_idx: 0.3786| ppl: 201.87155| %_neg_is_pos: 0.0293| lr: 0.0| temp: 1.9837 | loss: 1.10706| constrast_loss: 4.35789| div_loss: 0.70359| %_mask_idx: 0.35934| ppl: 189.70468| %_neg_is_pos: 0.04843| lr: 0.0| temp: 1.9837 [2021-09-02 00:48:13,696] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 2.0, reducing to 1.0 [2021-09-02 00:48:13,696] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 2.0, reducing to 1.0 | loss: 1.10187| constrast_loss: 4.33871| div_loss: 0.68753| %_mask_idx: 0.42669| ppl: 199.98157| %_neg_is_pos: 0.03185| lr: 0.0| temp: 1.98368 | loss: 1.12883| constrast_loss: 4.44648| div_loss: 0.68847| %_mask_idx: 0.40226| ppl: 199.37891| %_neg_is_pos: 0.03247| lr: 0.0| temp: 1.98368 | loss: 1.12267| constrast_loss: 4.42302| div_loss: 0.67657| %_mask_idx: 0.4198| ppl: 206.9967| %_neg_is_pos: 0.01872| lr: 0.0| temp: 1.98367 | loss: 1.12211| constrast_loss: 4.41794| div_loss: 0.70502| %_mask_idx: 0.36075| ppl: 188.78735| %_neg_is_pos: 0.03399| lr: 0.0| temp: 1.98367 | loss: 1.12282| constrast_loss: 4.42419| div_loss: 0.6708| %_mask_idx: 0.36466| ppl: 210.6898| %_neg_is_pos: 0.02973| lr: 0.0| temp: 1.98366 | loss: 1.08821| constrast_loss: 4.27951| div_loss: 0.73348| %_mask_idx: 0.36435| ppl: 170.57376| %_neg_is_pos: 0.05834| lr: 0.0| temp: 1.98366 | loss: 1.1323| constrast_loss: 4.46298| div_loss: 0.66237| %_mask_idx: 0.38753| ppl: 216.08336| %_neg_is_pos: 0.0231| lr: 0.0| temp: 1.98365 | loss: 1.13642| constrast_loss: 4.48023| div_loss: 0.65447| %_mask_idx: 0.37422| ppl: 221.13666| %_neg_is_pos: 
0.02842| lr: 0.0| temp: 1.98365 | loss: 1.13019| constrast_loss: 4.45389| div_loss: 0.66858| %_mask_idx: 0.39019| ppl: 212.11151| %_neg_is_pos: 0.03232| lr: 0.0| temp: 1.98363 | loss: 1.13242| constrast_loss: 4.46497| div_loss: 0.64691| %_mask_idx: 0.40523| ppl: 225.97482| %_neg_is_pos: 0.01891| lr: 0.0| temp: 1.98363 | loss: 1.12476| constrast_loss: 4.4302| div_loss: 0.68817| %_mask_idx: 0.43014| ppl: 199.57037| %_neg_is_pos: 0.02601| lr: 0.0| temp: 1.98362 | loss: 1.10953| constrast_loss: 4.36878| div_loss: 0.69358| %_mask_idx: 0.35934| ppl: 196.10638| %_neg_is_pos: 0.05077| lr: 0.0| temp: 1.98362 | loss: 1.10677| constrast_loss: 4.35767| div_loss: 0.69391| %_mask_idx: 0.3692| ppl: 195.90042| %_neg_is_pos: 0.05203| lr: 0.0| temp: 1.98361 | loss: 1.10707| constrast_loss: 4.35874| div_loss: 0.69538| %_mask_idx: 0.3739| ppl: 194.95886| %_neg_is_pos: 0.04103| lr: 0.0| temp: 1.98361 | loss: 1.12335| constrast_loss: 4.42433| div_loss: 0.69054| %_mask_idx: 0.38565| ppl: 198.05392| %_neg_is_pos: 0.02182| lr: 0.0| temp: 1.9836 | loss: 1.13105| constrast_loss: 4.45661| div_loss: 0.67592| %_mask_idx: 0.35558| ppl: 207.41243| %_neg_is_pos: 0.03753| lr: 0.0| temp: 1.9836 | loss: 1.11745| constrast_loss: 4.40179| div_loss: 0.67994| %_mask_idx: 0.37453| ppl: 204.83914| %_neg_is_pos: 0.05024| lr: 0.0| temp: 1.98358 | loss: 1.11898| constrast_loss: 4.40706| div_loss: 0.68836| %_mask_idx: 0.39552| ppl: 199.44698| %_neg_is_pos: 0.02513| lr: 0.0| temp: 1.98358 | loss: 1.1452| constrast_loss: 4.51685| div_loss: 0.6395| %_mask_idx: 0.37892| ppl: 230.71948| %_neg_is_pos: 0.00515| lr: 0.0| temp: 1.98357 | loss: 1.13156| constrast_loss: 4.46147| div_loss: 0.64776| %_mask_idx: 0.42638| ppl: 225.43317| %_neg_is_pos: 0.01142| lr: 0.0| temp: 1.98357 | loss: 1.12662| constrast_loss: 4.43802| div_loss: 0.68458| %_mask_idx: 0.41369| ppl: 201.86899| %_neg_is_pos: 0.02911| lr: 0.0| temp: 1.98356 | loss: 1.12667| constrast_loss: 4.43929| div_loss: 0.6741| %_mask_idx: 0.40398| ppl: 208.57664| 
%_neg_is_pos: 0.0252| lr: 0.0| temp: 1.98356 | loss: 1.12909| constrast_loss: 4.44866| div_loss: 0.67685| %_mask_idx: 0.41635| ppl: 206.8167| %_neg_is_pos: 0.03469| lr: 0.0| temp: 1.98355 | loss: 1.12309| constrast_loss: 4.42671| div_loss: 0.65638| %_mask_idx: 0.37484| ppl: 219.91991| %_neg_is_pos: 0.02353| lr: 0.0| temp: 1.98355 | loss: 1.13987| constrast_loss: 4.49334| div_loss: 0.66154| %_mask_idx: 0.43155| ppl: 216.61235| %_neg_is_pos: 0.03532| lr: 0.0| temp: 1.98353| loss: 1.13509| constrast_loss: 4.47512| div_loss: 0.65228| %_mask_idx: 0.44173| ppl: 222.54253| %_neg_is_pos: 0.01207| lr: 0.0| temp: 1.98353 | loss: 1.14322| constrast_loss: 4.50823| div_loss: 0.64671| %_mask_idx: 0.40711| ppl: 226.10631| %_neg_is_pos: 0.01519| lr: 0.0| temp: 1.98352 | loss: 1.13277| constrast_loss: 4.46754| div_loss: 0.63554| %_mask_idx: 0.41275| ppl: 233.25137| %_neg_is_pos: 0.02593| lr: 0.0| temp: 1.98352 | loss: 1.1327| constrast_loss: 4.46562| div_loss: 0.65201| %_mask_idx: 0.36263| ppl: 222.71297| %_neg_is_pos: 0.02855| lr: 0.0| temp: 1.9835 | loss: 1.1099| constrast_loss: 4.37202| div_loss: 0.67589| %_mask_idx: 0.38299| ppl: 207.43051| %_neg_is_pos: 0.04246| lr: 0.0| temp: 1.9835 | loss: 1.11356| constrast_loss: 4.38703| div_loss: 0.67227| %_mask_idx: 0.38831| ppl: 209.74988| %_neg_is_pos: 0.05232| lr: 0.0| temp: 1.98349 | loss: 1.14263| constrast_loss: 4.50597| div_loss: 0.64554| %_mask_idx: 0.36732| ppl: 226.85193| %_neg_is_pos: 0.0134| lr: 0.0| temp: 1.98349 | loss: 1.14365| constrast_loss: 4.51183| div_loss: 0.62752| %_mask_idx: 0.40758| ppl: 238.38692| %_neg_is_pos: 0.01432| lr: 0.0| temp: 1.98348 | loss: 1.11455| constrast_loss: 4.389| div_loss: 0.69185| %_mask_idx: 0.39176| ppl: 197.21329| %_neg_is_pos: 0.02881| lr: 0.0| temp: 1.98348 | loss: 1.11884| constrast_loss: 4.40697| div_loss: 0.68388| %_mask_idx: 0.3761| ppl: 202.31563| %_neg_is_pos: 0.03884| lr: 0.0| temp: 1.98347 | loss: 1.12421| constrast_loss: 4.42877| div_loss: 0.68074| %_mask_idx: 0.35432| ppl: 
204.32413| %_neg_is_pos: 0.03881| lr: 0.0| temp: 1.98347 | loss: 1.10936| constrast_loss: 4.369| div_loss: 0.68453| %_mask_idx: 0.42074| ppl: 201.89912| %_neg_is_pos: 0.02592| lr: 0.0| temp: 1.98345 | loss: 1.13834| constrast_loss: 4.48802| div_loss: 0.65354| %_mask_idx: 0.43672| ppl: 221.73724| %_neg_is_pos: 0.01137| lr: 0.0| temp: 1.98345 | loss: 1.14772| constrast_loss: 4.52494| div_loss: 0.65952| %_mask_idx: 0.40883| ppl: 217.90752| %_neg_is_pos: 0.02052| lr: 0.0| temp: 1.98344 | loss: 1.12261| constrast_loss: 4.42374| div_loss: 0.66684| %_mask_idx: 0.32174| ppl: 213.22189| %_neg_is_pos: 0.04472| lr: 0.0| temp: 1.98344 | loss: 1.1143| constrast_loss: 4.38614| div_loss: 0.71051| %_mask_idx: 0.37234| ppl: 185.27084| %_neg_is_pos: 0.03885| lr: 0.0| temp: 1.98343 | loss: 1.11706| constrast_loss: 4.40105| div_loss: 0.672| %_mask_idx: 0.31093| ppl: 209.92035| %_neg_is_pos: 0.05808| lr: 0.0| temp: 1.98343 | loss: 1.10795| constrast_loss: 4.36229| div_loss: 0.69501| %_mask_idx: 0.37077| ppl: 195.19229| %_neg_is_pos: 0.03316| lr: 0.0| temp: 1.98342 | loss: 1.12128| constrast_loss: 4.41698| div_loss: 0.68133| %_mask_idx: 0.31814| ppl: 203.95081| %_neg_is_pos: 0.02255| lr: 0.0| temp: 1.98342 | loss: 1.10727| constrast_loss: 4.3604| div_loss: 0.68687| %_mask_idx: 0.37171| ppl: 200.40234| %_neg_is_pos: 0.04202| lr: 0.0| temp: 1.9834 | loss: 1.12066| constrast_loss: 4.41285| div_loss: 0.69801| %_mask_idx: 0.3916| ppl: 193.27165| %_neg_is_pos: 0.02471| lr: 0.0| temp: 1.9834 | loss: 1.12635| constrast_loss: 4.43884| div_loss: 0.66571| %_mask_idx: 0.43985| ppl: 213.94243| %_neg_is_pos: 0.01686| lr: 0.0| temp: 1.98339 | loss: 1.12313| constrast_loss: 4.42529| div_loss: 0.67247| %_mask_idx: 0.37719| ppl: 209.6218| %_neg_is_pos: 0.01767| lr: 0.0| temp: 1.98339 | loss: 1.13969| constrast_loss: 4.49312| div_loss: 0.65661| %_mask_idx: 0.43766| ppl: 219.76849| %_neg_is_pos: 0.01543| lr: 0.0| temp: 1.98338 | loss: 1.13452| constrast_loss: 4.47235| div_loss: 0.65734| %_mask_idx: 
0.42591| ppl: 219.30188| %_neg_is_pos: 0.01091| lr: 0.0| temp: 1.98338 | loss: 1.11716| constrast_loss: 4.40068| div_loss: 0.67958| %_mask_idx: 0.40006| ppl: 205.06952| %_neg_is_pos: 0.03526| lr: 0.0| temp: 1.98337 | loss: 1.12022| constrast_loss: 4.41377| div_loss: 0.67108| %_mask_idx: 0.41338| ppl: 210.50688| %_neg_is_pos: 0.01916| lr: 0.0| temp: 1.98337 | loss: 1.12731| constrast_loss: 4.44181| div_loss: 0.67434| %_mask_idx: 0.43029| ppl: 208.42149| %_neg_is_pos: 0.02322| lr: 0.0| temp: 1.98335 | loss: 1.128| constrast_loss: 4.44467| div_loss: 0.67335| %_mask_idx: 0.42497| ppl: 209.05606| %_neg_is_pos: 0.02168| lr: 0.0| temp: 1.98335 | loss: 1.13694| constrast_loss: 4.48452| div_loss: 0.63255| %_mask_idx: 0.36936| ppl: 235.16658| %_neg_is_pos: 0.02742| lr: 0.0| temp: 1.98334 | loss: 1.13286| constrast_loss: 4.46535| div_loss: 0.66076| %_mask_idx: 0.40179| ppl: 217.11652| %_neg_is_pos: 0.04228| lr: 0.0| temp: 1.98334 | loss: 1.12166| constrast_loss: 4.42146| div_loss: 0.65194| %_mask_idx: 0.40445| ppl: 222.75983| %_neg_is_pos: 0.02907| lr: 0.0| temp: 1.98332 | loss: 1.11806| constrast_loss: 4.40188| div_loss: 0.70342| %_mask_idx: 0.42685| ppl: 189.81393| %_neg_is_pos: 0.02284| lr: 0.0| temp: 1.98332 | loss: 1.12213| constrast_loss: 4.42051| div_loss: 0.67991| %_mask_idx: 0.35417| ppl: 204.85797| %_neg_is_pos: 0.02015| lr: 0.0| temp: 1.98331 | loss: 1.12954| constrast_loss: 4.45229| div_loss: 0.65888| %_mask_idx: 0.4093| ppl: 218.31396| %_neg_is_pos: 0.03085| lr: 0.0| temp: 1.98331 | loss: 1.12174| constrast_loss: 4.41965| div_loss: 0.67311| %_mask_idx: 0.36513| ppl: 209.21265| %_neg_is_pos: 0.04481| lr: 0.0| temp: 1.9833 | loss: 1.1351| constrast_loss: 4.47406| div_loss: 0.66341| %_mask_idx: 0.43405| ppl: 215.41978| %_neg_is_pos: 0.01369| lr: 0.0| temp: 1.9833 | loss: 1.13848| constrast_loss: 4.48673| div_loss: 0.67203| %_mask_idx: 0.41385| ppl: 209.89948| %_neg_is_pos: 0.01632| lr: 0.0| temp: 1.98329 | loss: 1.12537| constrast_loss: 4.43426| div_loss: 0.67223| 
%_mask_idx: 0.41526| ppl: 209.77136| %_neg_is_pos: 0.02707| lr: 0.0| temp: 1.98329 | loss: 1.09808| constrast_loss: 4.32094| div_loss: 0.71398| %_mask_idx: 0.39489| ppl: 183.05286| %_neg_is_pos: 0.03911| lr: 0.0| temp: 1.98327 | loss: 1.1285| constrast_loss: 4.45055| div_loss: 0.63467| %_mask_idx: 0.37657| ppl: 233.81049| %_neg_is_pos: 0.02886| lr: 0.0| temp: 1.98327 | loss: 1.10454| constrast_loss: 4.34815| div_loss: 0.69996| %_mask_idx: 0.41698| ppl: 192.02782| %_neg_is_pos: 0.0225| lr: 0.0| temp: 1.98326 | loss: 1.11982| constrast_loss: 4.41142| div_loss: 0.67858| %_mask_idx: 0.37516| ppl: 205.71039| %_neg_is_pos: 0.0382| lr: 0.0| temp: 1.98326 | loss: 1.10505| constrast_loss: 4.35283| div_loss: 0.67381| %_mask_idx: 0.37876| ppl: 208.76299| %_neg_is_pos: 0.03666| lr: 0.0| temp: 1.98325 | loss: 1.12884| constrast_loss: 4.45028| div_loss: 0.6508| %_mask_idx: 0.4162| ppl: 223.48721| %_neg_is_pos: 0.01884| lr: 0.0| temp: 1.98325 | loss: 1.14273| constrast_loss: 4.50624| div_loss: 0.64695| %_mask_idx: 0.45692| ppl: 225.95407| %_neg_is_pos: 0.01468| lr: 0.0| temp: 1.98324 | loss: 1.12255| constrast_loss: 4.42347| div_loss: 0.6674| %_mask_idx: 0.44079| ppl: 212.86345| %_neg_is_pos: 0.01859| lr: 0.0| temp: 1.98324 | loss: 1.12679| constrast_loss: 4.442| div_loss: 0.65174| %_mask_idx: 0.37954| ppl: 222.88962| %_neg_is_pos: 0.02998| lr: 0.0| temp: 1.98322 | loss: 1.11161| constrast_loss: 4.37631| div_loss: 0.70139| %_mask_idx: 0.39239| ppl: 191.11136| %_neg_is_pos: 0.03623| lr: 0.0| temp: 1.98322 | loss: 1.12161| constrast_loss: 4.41875| div_loss: 0.67702| %_mask_idx: 0.35025| ppl: 206.70584| %_neg_is_pos: 0.03374| lr: 0.0| temp: 1.98321 | loss: 1.132| constrast_loss: 4.46257| div_loss: 0.65441| %_mask_idx: 0.43452| ppl: 221.17975| %_neg_is_pos: 0.01154| lr: 0.0| temp: 1.98321 | loss: 1.12689| constrast_loss: 4.44199| div_loss: 0.65584| %_mask_idx: 0.43296| ppl: 220.26044| %_neg_is_pos: 0.03492| lr: 0.0| temp: 1.9832 | loss: 1.12182| constrast_loss: 4.41793| div_loss: 
0.69357| %_mask_idx: 0.40852| ppl: 196.11404| %_neg_is_pos: 0.02359| lr: 0.0| temp: 1.9832 | loss: 1.12803| constrast_loss: 4.44628| div_loss: 0.65831| %_mask_idx: 0.42011| ppl: 218.68066| %_neg_is_pos: 0.01791| lr: 0.0| temp: 1.98319 | loss: 1.11808| constrast_loss: 4.40516| div_loss: 0.67159| %_mask_idx: 0.39176| ppl: 210.18045| %_neg_is_pos: 0.02447| lr: 0.0| temp: 1.98319 | loss: 1.13161| constrast_loss: 4.45957| div_loss: 0.66889| %_mask_idx: 0.4339| ppl: 211.91132| %_neg_is_pos: 0.0071| lr: 0.0| temp: 1.98317 | loss: 1.13508| constrast_loss: 4.47588| div_loss: 0.64433| %_mask_idx: 0.39693| ppl: 227.62891| %_neg_is_pos: 0.02367| lr: 0.0| temp: 1.98317 | loss: 1.12869| constrast_loss: 4.44903| div_loss: 0.65719| %_mask_idx: 0.41306| ppl: 219.39932| %_neg_is_pos: 0.01996| lr: 0.0| temp: 1.98316 | loss: 1.11086| constrast_loss: 4.37377| div_loss: 0.69664| %_mask_idx: 0.37453| ppl: 194.1526| %_neg_is_pos: 0.05844| lr: 0.0| temp: 1.98316 | loss: 1.13068| constrast_loss: 4.45711| div_loss: 0.65604| %_mask_idx: 0.3349| ppl: 220.13272| %_neg_is_pos: 0.02487| lr: 0.0| temp: 1.98314 | loss: 1.1222| constrast_loss: 4.42173| div_loss: 0.67089| %_mask_idx: 0.37265| ppl: 210.62802| %_neg_is_pos: 0.02685| lr: 0.0| temp: 1.98314 | loss: 1.11107| constrast_loss: 4.37599| div_loss: 0.68285| %_mask_idx: 0.37516| ppl: 202.97356| %_neg_is_pos: 0.03909| lr: 0.0| temp: 1.98313 | loss: 1.12972| constrast_loss: 4.45307| div_loss: 0.65796| %_mask_idx: 0.36529| ppl: 218.90552| %_neg_is_pos: 0.02604| lr: 0.0| temp: 1.98313 | loss: 1.13196| constrast_loss: 4.46317| div_loss: 0.64689| %_mask_idx: 0.44846| ppl: 225.98724| %_neg_is_pos: 0.02319| lr: 0.0| temp: 1.98312 | loss: 1.14333| constrast_loss: 4.50861| div_loss: 0.64688| %_mask_idx: 0.37892| ppl: 225.99643| %_neg_is_pos: 0.01815| lr: 0.0| temp: 1.98312 | loss: 1.12497| constrast_loss: 4.43546| div_loss: 0.64418| %_mask_idx: 0.41024| ppl: 227.72357| %_neg_is_pos: 0.02191| lr: 0.0| temp: 1.98311 | loss: 1.12416| constrast_loss: 4.42824| 
div_loss: 0.68395| %_mask_idx: 0.40304| ppl: 202.27077| %_neg_is_pos: 0.0197| lr: 0.0| temp: 1.98311 | loss: 1.12786| constrast_loss: 4.44521| div_loss: 0.66237| %_mask_idx: 0.3869| ppl: 216.0842| %_neg_is_pos: 0.01889| lr: 0.0| temp: 1.98309 | loss: 1.13048| constrast_loss: 4.45614| div_loss: 0.65782| %_mask_idx: 0.36717| ppl: 218.99683| %_neg_is_pos: 0.02529| lr: 0.0| temp: 1.98309 | loss: 1.12012| constrast_loss: 4.41482| div_loss: 0.6565| %_mask_idx: 0.39035| ppl: 219.84189| %_neg_is_pos: 0.02583| lr: 0.0| temp: 1.98308 | loss: 1.10821| constrast_loss: 4.36312| div_loss: 0.69711| %_mask_idx: 0.35041| ppl: 193.84927| %_neg_is_pos: 0.04691| lr: 0.0| temp: 1.98308 | loss: 1.12335| constrast_loss: 4.4271| div_loss: 0.66298| %_mask_idx: 0.43797| ppl: 215.69345| %_neg_is_pos: 0.01998| lr: 0.0| temp: 1.98307 | loss: 1.13999| constrast_loss: 4.49691| div_loss: 0.6304| %_mask_idx: 0.39709| ppl: 236.5444| %_neg_is_pos: 0.01851| lr: 0.0| temp: 1.98307 | loss: 1.10689| constrast_loss: 4.35948| div_loss: 0.68067| %_mask_idx: 0.37328| ppl: 204.37222| %_neg_is_pos: 0.0462| lr: 0.0| temp: 1.98306 | loss: 1.13709| constrast_loss: 4.48363| div_loss: 0.64755| %_mask_idx: 0.34743| ppl: 225.57086| %_neg_is_pos: 0.02097| lr: 0.0| temp: 1.98306 | loss: 1.11969| constrast_loss: 4.41159| div_loss: 0.67177| %_mask_idx: 0.41103| ppl: 210.06598| %_neg_is_pos: 0.01786| lr: 0.0| temp: 1.98304 | loss: 1.11793| constrast_loss: 4.40572| div_loss: 0.65981| %_mask_idx: 0.4364| ppl: 217.71973| %_neg_is_pos: 0.01643| lr: 0.0| temp: 1.98304 | loss: 1.11158| constrast_loss: 4.37696| div_loss: 0.69381| %_mask_idx: 0.41134| ppl: 195.95905| %_neg_is_pos: 0.03734| lr: 0.0| temp: 1.98303 | loss: 1.12615| constrast_loss: 4.4382| div_loss: 0.66396| %_mask_idx: 0.41306| ppl: 215.06732| %_neg_is_pos: 0.0211| lr: 0.0| temp: 1.98303 | loss: 1.12472| constrast_loss: 4.43212| div_loss: 0.66742| %_mask_idx: 0.39113| ppl: 212.85425| %_neg_is_pos: 0.01729| lr: 0.0| temp: 1.98302 | loss: 1.11362| constrast_loss: 
4.38698| div_loss: 0.67483| %_mask_idx: 0.43515| ppl: 208.10638| %_neg_is_pos: 0.02488| lr: 0.0| temp: 1.98302 | loss: 1.11748| constrast_loss: 4.39911| div_loss: 0.70799| %_mask_idx: 0.33709| ppl: 186.8851| %_neg_is_pos: 0.06088| lr: 0.0| temp: 1.98301 | loss: 1.12546| constrast_loss: 4.4349| div_loss: 0.66925| %_mask_idx: 0.45708| ppl: 211.67941| %_neg_is_pos: 0.01696| lr: 0.0| temp: 1.98301 | loss: 1.13477| constrast_loss: 4.47599| div_loss: 0.6309| %_mask_idx: 0.38315| ppl: 236.22433| %_neg_is_pos: 0.03132| lr: 0.0| temp: 1.98299 | loss: 1.14519| constrast_loss: 4.51618| div_loss: 0.64584| %_mask_idx: 0.40711| ppl: 226.66055| %_neg_is_pos: 0.01471| lr: 0.0| temp: 1.98299 | loss: 1.12821| constrast_loss: 4.44579| div_loss: 0.67063| %_mask_idx: 0.39082| ppl: 210.79922| %_neg_is_pos: 0.04057| lr: 0.0| temp: 1.98298 | loss: 1.12253| constrast_loss: 4.42549| div_loss: 0.64631| %_mask_idx: 0.36419| ppl: 226.3587| %_neg_is_pos: 0.02259| lr: 0.0| temp: 1.98298 [2021-09-02 00:57:26,967] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1.0, reducing to 1 [2021-09-02 00:57:26,967] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1.0, reducing to 1 | loss: 1.12423| constrast_loss: 4.42796| div_loss: 0.68941| %_mask_idx: 0.38612| ppl: 198.77908| %_neg_is_pos: 0.03001| lr: 0.0| temp: 1.98296 | loss: 1.10182| constrast_loss: 4.3359| div_loss: 0.71363| %_mask_idx: 0.40147| ppl: 183.27484| %_neg_is_pos: 0.02972| lr: 0.0| temp: 1.98296 | loss: 1.1203| constrast_loss: 4.41346| div_loss: 0.67749| %_mask_idx: 0.34273| ppl: 206.40538| %_neg_is_pos: 0.0304| lr: 0.0| temp: 1.98295 | loss: 1.11819| constrast_loss: 4.40814| div_loss: 0.64609| %_mask_idx: 0.39646| ppl: 226.50426| %_neg_is_pos: 0.03784| lr: 0.0| temp: 1.98295 | loss: 1.11432| constrast_loss: 4.38518| div_loss: 0.72089| %_mask_idx: 0.38784| ppl: 178.63318| %_neg_is_pos: 0.04375| lr: 0.0| temp: 1.98294 | loss: 1.11462| constrast_loss: 4.38943| div_loss: 0.69037| %_mask_idx: 0.39364| ppl: 198.166| %_neg_is_pos: 0.02757| lr: 0.0| temp: 1.98294 | loss: 1.11286| constrast_loss: 4.38489| div_loss: 0.66535| %_mask_idx: 0.40962| ppl: 214.17612| %_neg_is_pos: 0.03313| lr: 0.0| temp: 1.98293 | loss: 1.12007| constrast_loss: 4.41381| div_loss: 0.66484| %_mask_idx: 0.37735| ppl: 214.50067| %_neg_is_pos: 0.0454| lr: 0.0| temp: 1.98293 | loss: 1.12394| constrast_loss: 4.4289| div_loss: 0.66874| %_mask_idx: 0.39301| ppl: 212.00954| %_neg_is_pos: 0.03545| lr: 0.0| temp: 1.98291 | loss: 1.11679| constrast_loss: 4.39994| div_loss: 0.67206| %_mask_idx: 0.4021| ppl: 209.88049| %_neg_is_pos: 0.03884| lr: 0.0| temp: 1.98291 | loss: 1.12921| constrast_loss: 4.45031| div_loss: 0.66512| %_mask_idx: 0.41228| ppl: 214.32416| %_neg_is_pos: 0.03092| lr: 0.0| temp: 1.9829 | loss: 1.1201| constrast_loss: 4.41391| div_loss: 0.66511| %_mask_idx: 0.38393| ppl: 214.32962| %_neg_is_pos: 0.02932| lr: 0.0| temp: 1.9829 | loss: 1.11444| constrast_loss: 4.38864| div_loss: 0.69139| %_mask_idx: 0.41557| ppl: 197.51221| %_neg_is_pos: 0.04815| lr: 0.0| temp: 1.98289 | loss: 1.14769| constrast_loss: 4.52672| div_loss: 0.6405| %_mask_idx: 0.41526| ppl: 230.08058| 
%_neg_is_pos: 0.03075| lr: 0.0| temp: 1.98289 | loss: 1.1078| constrast_loss: 4.36211| div_loss: 0.69086| %_mask_idx: 0.4187| ppl: 197.84952| %_neg_is_pos: 0.03532| lr: 0.0| temp: 1.98288 | loss: 1.12481| constrast_loss: 4.4308| div_loss: 0.68446| %_mask_idx: 0.38518| ppl: 201.94534| %_neg_is_pos: 0.03816| lr: 0.0| temp: 1.98288 | loss: 1.10203| constrast_loss: 4.34042| div_loss: 0.67708| %_mask_idx: 0.40899| ppl: 206.67076| %_neg_is_pos: 0.04425| lr: 0.0| temp: 1.98286 | loss: 1.1241| constrast_loss: 4.42947| div_loss: 0.66928| %_mask_idx: 0.36184| ppl: 211.66159| %_neg_is_pos: 0.04486| lr: 0.0| temp: 1.98286 | loss: 1.10486| constrast_loss: 4.35099| div_loss: 0.68445| %_mask_idx: 0.37672| ppl: 201.95334| %_neg_is_pos: 0.04841| lr: 0.0| temp: 1.98285 | loss: 1.11099| constrast_loss: 4.37359| div_loss: 0.70377| %_mask_idx: 0.36059| ppl: 189.58517| %_neg_is_pos: 0.04446| lr: 0.0| temp: 1.98285 | loss: 1.13883| constrast_loss: 4.48678| div_loss: 0.68538| %_mask_idx: 0.33302| ppl: 201.35385| %_neg_is_pos: 0.04512| lr: 0.0| temp: 1.98284 | loss: 1.10486| constrast_loss: 4.34992| div_loss: 0.69522| %_mask_idx: 0.39693| ppl: 195.05737| %_neg_is_pos: 0.04678| lr: 0.0| temp: 1.98284 | loss: 1.11733| constrast_loss: 4.40372| div_loss: 0.65589| %_mask_idx: 0.38283| ppl: 220.23145| %_neg_is_pos: 0.0323| lr: 0.0| temp: 1.98283 | loss: 1.11312| constrast_loss: 4.3834| div_loss: 0.6906| %_mask_idx: 0.40915| ppl: 198.0184| %_neg_is_pos: 0.03322| lr: 0.0| temp: 1.98283 | loss: 1.11536| constrast_loss: 4.39559| div_loss: 0.65846| %_mask_idx: 0.41855| ppl: 218.58409| %_neg_is_pos: 0.02481| lr: 0.0| temp: 1.98281 | loss: 1.12593| constrast_loss: 4.43862| div_loss: 0.65121| %_mask_idx: 0.39724| ppl: 223.22484| %_neg_is_pos: 0.03943| lr: 0.0| temp: 1.98281 | loss: 1.09411| constrast_loss: 4.30711| div_loss: 0.69336| %_mask_idx: 0.37296| ppl: 196.25253| %_neg_is_pos: 0.04718| lr: 0.0| temp: 1.9828 | loss: 1.1024| constrast_loss: 4.33968| div_loss: 0.6991| %_mask_idx: 0.39004| ppl: 
192.57629| %_neg_is_pos: 0.03429| lr: 0.0| temp: 1.9828 | loss: 1.12875| constrast_loss: 4.44946| div_loss: 0.6556| %_mask_idx: 0.38111| ppl: 220.41789| %_neg_is_pos: 0.02839| lr: 0.0| temp: 1.98278 | loss: 1.14531| constrast_loss: 4.51504| div_loss: 0.66202| %_mask_idx: 0.38346| ppl: 216.31021| %_neg_is_pos: 0.02137| lr: 0.0| temp: 1.98278 | loss: 1.11751| constrast_loss: 4.40302| div_loss: 0.6701| %_mask_idx: 0.4256| ppl: 211.13504| %_neg_is_pos: 0.02641| lr: 0.0| temp: 1.98277 | loss: 1.11025| constrast_loss: 4.37435| div_loss: 0.66655| %_mask_idx: 0.40899| ppl: 213.40662| %_neg_is_pos: 0.04049| lr: 0.0| temp: 1.98277 | loss: 1.13483| constrast_loss: 4.47599| div_loss: 0.63325| %_mask_idx: 0.42356| ppl: 234.71727| %_neg_is_pos: 0.01601| lr: 0.0| temp: 1.98276 | loss: 1.11664| constrast_loss: 4.40053| div_loss: 0.66039| %_mask_idx: 0.38393| ppl: 217.35114| %_neg_is_pos: 0.02924| lr: 0.0| temp: 1.98276 | loss: 1.09998| constrast_loss: 4.32989| div_loss: 0.70025| %_mask_idx: 0.36059| ppl: 191.84302| %_neg_is_pos: 0.05566| lr: 0.0| temp: 1.98275 | loss: 1.09925| constrast_loss: 4.3317| div_loss: 0.6531| %_mask_idx: 0.40492| ppl: 222.01649| %_neg_is_pos: 0.02565| lr: 0.0| temp: 1.98275 | loss: 1.11885| constrast_loss: 4.40758| div_loss: 0.67831| %_mask_idx: 0.36544| ppl: 205.87961| %_neg_is_pos: 0.04333| lr: 0.0| temp: 1.98273 | loss: 1.1266| constrast_loss: 4.43839| div_loss: 0.67994| %_mask_idx: 0.39709| ppl: 204.83826| %_neg_is_pos: 0.03801| lr: 0.0| temp: 1.98273 | loss: 1.12621| constrast_loss: 4.43946| div_loss: 0.65388| %_mask_idx: 0.38362| ppl: 221.51389| %_neg_is_pos: 0.03506| lr: 0.0| temp: 1.98273 | loss: 1.09985| constrast_loss: 4.33207| div_loss: 0.67327| %_mask_idx: 0.36607| ppl: 209.10529| %_neg_is_pos: 0.03626| lr: 0.0| temp: 1.98273 | loss: 1.12475| constrast_loss: 4.4348| div_loss: 0.64189| %_mask_idx: 0.36983| ppl: 229.19284| %_neg_is_pos: 0.02811| lr: 0.0| temp: 1.98272 | loss: 1.09771| constrast_loss: 4.31949| div_loss: 0.71361| %_mask_idx: 
0.38409| ppl: 183.28976| %_neg_is_pos: 0.02688| lr: 0.0| temp: 1.98272 | loss: 1.13598| constrast_loss: 4.47824| div_loss: 0.65663| %_mask_idx: 0.36936| ppl: 219.75525| %_neg_is_pos: 0.01586| lr: 0.0| temp: 1.98271 | loss: 1.11385| constrast_loss: 4.38993| div_loss: 0.65459| %_mask_idx: 0.38111| ppl: 221.06149| %_neg_is_pos: 0.02705| lr: 0.0| temp: 1.98271 | loss: 1.11606| constrast_loss: 4.39683| div_loss: 0.67397| %_mask_idx: 0.36497| ppl: 208.65634| %_neg_is_pos: 0.03153| lr: 0.0| temp: 1.98269 | loss: 1.13749| constrast_loss: 4.48651| div_loss: 0.63459| %_mask_idx: 0.45865| ppl: 233.8633| %_neg_is_pos: 0.0211| lr: 0.0| temp: 1.98269 | loss: 1.12731| constrast_loss: 4.44469| div_loss: 0.64563| %_mask_idx: 0.42841| ppl: 226.79993| %_neg_is_pos: 0.02701| lr: 0.0| temp: 1.98268 | loss: 1.09112| constrast_loss: 4.29274| div_loss: 0.71758| %_mask_idx: 0.35385| ppl: 180.74731| %_neg_is_pos: 0.06987| lr: 0.0| temp: 1.98268 | loss: 1.10829| constrast_loss: 4.36461| div_loss: 0.68548| %_mask_idx: 0.36137| ppl: 201.29056| %_neg_is_pos: 0.05341| lr: 0.0| temp: 1.98267 | loss: 1.11354| constrast_loss: 4.38573| div_loss: 0.68409| %_mask_idx: 0.3844| ppl: 202.1851| %_neg_is_pos: 0.04045| lr: 0.0| temp: 1.98267 | loss: 1.12247| constrast_loss: 4.42096| div_loss: 0.68913| %_mask_idx: 0.34774| ppl: 198.95779| %_neg_is_pos: 0.02383| lr: 0.0| temp: 1.98266 | loss: 1.12254| constrast_loss: 4.42389| div_loss: 0.66274| %_mask_idx: 0.39928| ppl: 215.84865| %_neg_is_pos: 0.02702| lr: 0.0| temp: 1.98266 | loss: 1.10936| constrast_loss: 4.36759| div_loss: 0.69861| %_mask_idx: 0.37547| ppl: 192.89227| %_neg_is_pos: 0.03472| lr: 0.0| temp: 1.98264 | loss: 1.12912| constrast_loss: 4.45124| div_loss: 0.65224| %_mask_idx: 0.42027| ppl: 222.56754| %_neg_is_pos: 0.01481| lr: 0.0| temp: 1.98264 | loss: 1.11713| constrast_loss: 4.40275| div_loss: 0.65776| %_mask_idx: 0.37719| ppl: 219.03609| %_neg_is_pos: 0.02517| lr: 0.0| temp: 1.98263 | loss: 1.13921| constrast_loss: 4.49288| div_loss: 0.63961| 
%_mask_idx: 0.41588| ppl: 230.6489| %_neg_is_pos: 0.03112| lr: 0.0| temp: 1.98263 | loss: 1.11539| constrast_loss: 4.39377| div_loss: 0.67812| %_mask_idx: 0.36936| ppl: 206.00452| %_neg_is_pos: 0.04226| lr: 0.0| temp: 1.98261 | loss: 1.13509| constrast_loss: 4.47581| div_loss: 0.6454| %_mask_idx: 0.38737| ppl: 226.94229| %_neg_is_pos: 0.02884| lr: 0.0| temp: 1.98261 | loss: 1.12745| constrast_loss: 4.44463| div_loss: 0.65184| %_mask_idx: 0.39787| ppl: 222.82217| %_neg_is_pos: 0.0201| lr: 0.0| temp: 1.9826 | loss: 1.10713| constrast_loss: 4.35899| div_loss: 0.69533| %_mask_idx: 0.36341| ppl: 194.99187| %_neg_is_pos: 0.03454| lr: 0.0| temp: 1.9826 | loss: 1.11356| constrast_loss: 4.38622| div_loss: 0.68025| %_mask_idx: 0.41964| ppl: 204.6424| %_neg_is_pos: 0.02264| lr: 0.0| temp: 1.98259 | loss: 1.11875| constrast_loss: 4.40677| div_loss: 0.6825| %_mask_idx: 0.40132| ppl: 203.19785| %_neg_is_pos: 0.03252| lr: 0.0| temp: 1.98259 | loss: 1.11612| constrast_loss: 4.39885| div_loss: 0.65643| %_mask_idx: 0.43734| ppl: 219.88728| %_neg_is_pos: 0.02467| lr: 0.0| temp: 1.98258 | loss: 1.12715| constrast_loss: 4.44215| div_loss: 0.66466| %_mask_idx: 0.38972| ppl: 214.61829| %_neg_is_pos: 0.03294| lr: 0.0| temp: 1.98258 | loss: 1.11963| constrast_loss: 4.41093| div_loss: 0.6759| %_mask_idx: 0.35182| ppl: 207.42331| %_neg_is_pos: 0.04378| lr: 0.0| temp: 1.98256 | loss: 1.10885| constrast_loss: 4.36554| div_loss: 0.69856| %_mask_idx: 0.39709| ppl: 192.92163| %_neg_is_pos: 0.03191| lr: 0.0| temp: 1.98256 | loss: 1.11706| constrast_loss: 4.40204| div_loss: 0.66179| %_mask_idx: 0.42622| ppl: 216.45358| %_neg_is_pos: 0.01488| lr: 0.0| temp: 1.98255 | loss: 1.12098| constrast_loss: 4.41632| div_loss: 0.67582| %_mask_idx: 0.4021| ppl: 207.47218| %_neg_is_pos: 0.0365| lr: 0.0| temp: 1.98255 | loss: 1.12862| constrast_loss: 4.44788| div_loss: 0.66612| %_mask_idx: 0.42888| ppl: 213.68633| %_neg_is_pos: 0.02209| lr: 0.0| temp: 1.98254 | loss: 1.12312| constrast_loss: 4.42566| div_loss: 
0.66811| %_mask_idx: 0.41369| ppl: 212.41142| %_neg_is_pos: 0.02288| lr: 0.0| temp: 1.98254 | loss: 1.12339| constrast_loss: 4.42996| div_loss: 0.63579| %_mask_idx: 0.36169| ppl: 233.09256| %_neg_is_pos: 0.05043| lr: 0.0| temp: 1.98253 | loss: 1.12254| constrast_loss: 4.42467| div_loss: 0.65487| %_mask_idx: 0.38503| ppl: 220.88235| %_neg_is_pos: 0.03284| lr: 0.0| temp: 1.98253 | loss: 1.10834| constrast_loss: 4.36398| div_loss: 0.6937| %_mask_idx: 0.36591| ppl: 196.03467| %_neg_is_pos: 0.04954| lr: 0.0| temp: 1.98251 | loss: 1.12379| constrast_loss: 4.42453| div_loss: 0.70644| %_mask_idx: 0.40774| ppl: 187.87857| %_neg_is_pos: 0.03526| lr: 0.0| temp: 1.98251 | loss: 1.10836| constrast_loss: 4.36274| div_loss: 0.70723| %_mask_idx: 0.34305| ppl: 187.37518| %_neg_is_pos: 0.04656| lr: 0.0| temp: 1.9825 | loss: 1.10818| constrast_loss: 4.36276| div_loss: 0.69957| %_mask_idx: 0.37829| ppl: 192.27521| %_neg_is_pos: 0.03696| lr: 0.0| temp: 1.9825 | loss: 1.12583| constrast_loss: 4.43631| div_loss: 0.67017| %_mask_idx: 0.40711| ppl: 211.09396| %_neg_is_pos: 0.02168| lr: 0.0| temp: 1.98249 | loss: 1.1137| constrast_loss: 4.3854| div_loss: 0.69388| %_mask_idx: 0.39301| ppl: 195.91597| %_neg_is_pos: 0.03233| lr: 0.0| temp: 1.98249 | loss: 1.13316| constrast_loss: 4.46582| div_loss: 0.66808| %_mask_idx: 0.36544| ppl: 212.42885| %_neg_is_pos: 0.00935| lr: 0.0| temp: 1.98248 | loss: 1.09659| constrast_loss: 4.318| div_loss: 0.68363| %_mask_idx: 0.40648| ppl: 202.47684| %_neg_is_pos: 0.04767| lr: 0.0| temp: 1.98248 | loss: 1.13185| constrast_loss: 4.46263| div_loss: 0.64752| %_mask_idx: 0.39897| ppl: 225.58534| %_neg_is_pos: 0.0255| lr: 0.0| temp: 1.98246 | loss: 1.1071| constrast_loss: 4.35826| div_loss: 0.70141| %_mask_idx: 0.40523| ppl: 191.09503| %_neg_is_pos: 0.0455| lr: 0.0| temp: 1.98246 | loss: 1.12974| constrast_loss: 4.45348| div_loss: 0.65482| %_mask_idx: 0.35401| ppl: 220.91666| %_neg_is_pos: 0.03036| lr: 0.0| temp: 1.98245 | loss: 1.11425| constrast_loss: 4.38777| 
div_loss: 0.6924| %_mask_idx: 0.35902| ppl: 196.86288| %_neg_is_pos: 0.0313| lr: 0.0| temp: 1.98245 | loss: 1.13395| constrast_loss: 4.4693| div_loss: 0.66521| %_mask_idx: 0.39756| ppl: 214.26459| %_neg_is_pos: 0.03105| lr: 0.0| temp: 1.98243 | loss: 1.10252| constrast_loss: 4.33727| div_loss: 0.7281| %_mask_idx: 0.34289| ppl: 174.0173| %_neg_is_pos: 0.05743| lr: 0.0| temp: 1.98243 | loss: 1.13543| constrast_loss: 4.47636| div_loss: 0.65343| %_mask_idx: 0.34211| ppl: 221.80464| %_neg_is_pos: 0.05005| lr: 0.0| temp: 1.98242 | loss: 1.10082| constrast_loss: 4.33172| div_loss: 0.71548| %_mask_idx: 0.41134| ppl: 182.09073| %_neg_is_pos: 0.04448| lr: 0.0| temp: 1.98242 | loss: 1.12702| constrast_loss: 4.44198| div_loss: 0.66083| %_mask_idx: 0.38221| ppl: 217.06989| %_neg_is_pos: 0.02229| lr: 0.0| temp: 1.98241 | loss: 1.11737| constrast_loss: 4.40279| div_loss: 0.66681| %_mask_idx: 0.37845| ppl: 213.23976| %_neg_is_pos: 0.0589| lr: 0.0| temp: 1.98241 | loss: 1.09837| constrast_loss: 4.32603| div_loss: 0.67468| %_mask_idx: 0.32926| ppl: 208.20258| %_neg_is_pos: 0.04869| lr: 0.0| temp: 1.9824 | loss: 1.13117| constrast_loss: 4.45853| div_loss: 0.66141| %_mask_idx: 0.46272| ppl: 216.6998| %_neg_is_pos: 0.01381| lr: 0.0| temp: 1.9824 | loss: 1.13056| constrast_loss: 4.45629| div_loss: 0.65932| %_mask_idx: 0.3869| ppl: 218.03746| %_neg_is_pos: 0.03354| lr: 0.0| temp: 1.98238 | loss: 1.12553| constrast_loss: 4.43361| div_loss: 0.68511| %_mask_idx: 0.40711| ppl: 201.52711| %_neg_is_pos: 0.02242| lr: 0.0| temp: 1.98238 | loss: 1.13118| constrast_loss: 4.45901| div_loss: 0.65705| %_mask_idx: 0.43311| ppl: 219.48856| %_neg_is_pos: 0.02509| lr: 0.0| temp: 1.98237 | loss: 1.11854| constrast_loss: 4.40694| div_loss: 0.67223| %_mask_idx: 0.36169| ppl: 209.77225| %_neg_is_pos: 0.03561| lr: 0.0| temp: 1.98237 | loss: 1.12844| constrast_loss: 4.44736| div_loss: 0.6638| %_mask_idx: 0.38142| ppl: 215.1673| %_neg_is_pos: 0.0294| lr: 0.0| temp: 1.98236 | loss: 1.13188| constrast_loss: 
4.46393| div_loss: 0.63602| %_mask_idx: 0.37547| ppl: 232.94719| %_neg_is_pos: 0.02566| lr: 0.0| temp: 1.98236 | loss: 1.11565| constrast_loss: 4.39402| div_loss: 0.68595| %_mask_idx: 0.34539| ppl: 200.99467| %_neg_is_pos: 0.03407| lr: 0.0| temp: 1.98235 | loss: 1.11091| constrast_loss: 4.37391| div_loss: 0.69721| %_mask_idx: 0.40351| ppl: 193.78687| %_neg_is_pos: 0.03743| lr: 0.0| temp: 1.98235 | loss: 1.11301| constrast_loss: 4.38242| div_loss: 0.69631| %_mask_idx: 0.44001| ppl: 194.36436| %_neg_is_pos: 0.0334| lr: 0.0| temp: 1.98233 | loss: 1.12343| constrast_loss: 4.42792| div_loss: 0.65811| %_mask_idx: 0.34211| ppl: 218.80899| %_neg_is_pos: 0.04087| lr: 0.0| temp: 1.98233 | loss: 1.11228| constrast_loss: 4.38062| div_loss: 0.68503| %_mask_idx: 0.38518| ppl: 201.58023| %_neg_is_pos: 0.02438| lr: 0.0| temp: 1.98232 | loss: 1.14285| constrast_loss: 4.50531| div_loss: 0.66081| %_mask_idx: 0.39756| ppl: 217.08276| %_neg_is_pos: 0.02951| lr: 0.0| temp: 1.98232 | loss: 1.12268| constrast_loss: 4.42549| div_loss: 0.65227| %_mask_idx: 0.40836| ppl: 222.54478| %_neg_is_pos: 0.01988| lr: 0.0| temp: 1.98231 | loss: 1.10868| constrast_loss: 4.36539| div_loss: 0.69343| %_mask_idx: 0.36513| ppl: 196.20383| %_neg_is_pos: 0.06038| lr: 0.0| temp: 1.98231 | loss: 1.10876| constrast_loss: 4.36875| div_loss: 0.66291| %_mask_idx: 0.42403| ppl: 215.73715| %_neg_is_pos: 0.03135| lr: 0.0| temp: 1.9823 | loss: 1.1065| constrast_loss: 4.35453| div_loss: 0.71453| %_mask_idx: 0.37014| ppl: 182.69885| %_neg_is_pos: 0.04616| lr: 0.0| temp: 1.9823 | loss: 1.11893| constrast_loss: 4.40916| div_loss: 0.66556| %_mask_idx: 0.36263| ppl: 214.04256| %_neg_is_pos: 0.04477| lr: 0.0| temp: 1.98228 | loss: 1.12546| constrast_loss: 4.43625| div_loss: 0.65588| %_mask_idx: 0.37155| ppl: 220.23956| %_neg_is_pos: 0.02329| lr: 0.0| temp: 1.98228 | loss: 1.11566| constrast_loss: 4.39705| div_loss: 0.65578| %_mask_idx: 0.37218| ppl: 220.30304| %_neg_is_pos: 0.05357| lr: 0.0| temp: 1.98227 | loss: 1.13642| 
constrast_loss: 4.48254| div_loss: 0.63119| %_mask_idx: 0.3985| ppl: 236.03708| %_neg_is_pos: 0.02478| lr: 0.0| temp: 1.98227 [2021-09-02 01:06:39,530] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:06:39,530] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14242| constrast_loss: 4.50045| div_loss: 0.69233| %_mask_idx: 0.3833| ppl: 196.90973| %_neg_is_pos: 0.02307| lr: 0.0| temp: 1.98225 | loss: 1.11337| constrast_loss: 4.38344| div_loss: 0.70052| %_mask_idx: 0.40883| ppl: 191.66483| %_neg_is_pos: 0.02664| lr: 0.0| temp: 1.98225 | loss: 1.12569| constrast_loss: 4.43396| div_loss: 0.688| %_mask_idx: 0.40398| ppl: 199.67853| %_neg_is_pos: 0.02773| lr: 0.0| temp: 1.98224 | loss: 1.10909| constrast_loss: 4.36598| div_loss: 0.70399| %_mask_idx: 0.42857| ppl: 189.44618| %_neg_is_pos: 0.04366| lr: 0.0| temp: 1.98224 | loss: 1.12968| constrast_loss: 4.45176| div_loss: 0.66951| %_mask_idx: 0.32425| ppl: 211.51089| %_neg_is_pos: 0.05319| lr: 0.0| temp: 1.98223 | loss: 1.12301| constrast_loss: 4.42557| div_loss: 0.66463| %_mask_idx: 0.39192| ppl: 214.63406| %_neg_is_pos: 0.02823| lr: 0.0| temp: 1.98223 | loss: 1.12359| constrast_loss: 4.42798| div_loss: 0.66392| %_mask_idx: 0.40727| ppl: 215.09048| %_neg_is_pos: 0.02385| lr: 0.0| temp: 1.98222 | loss: 1.10816| constrast_loss: 4.3633| div_loss: 0.69331| %_mask_idx: 0.40586| ppl: 196.27991| %_neg_is_pos: 0.031| lr: 0.0| temp: 1.98222 | loss: 1.11989| constrast_loss: 4.41478| div_loss: 0.64801| %_mask_idx: 0.40633| ppl: 225.27127| %_neg_is_pos: 0.0354| lr: 0.0| temp: 1.9822 | loss: 1.13058| constrast_loss: 4.45577| div_loss: 0.66566| %_mask_idx: 0.39395| ppl: 213.97858| %_neg_is_pos: 0.02299| lr: 0.0| temp: 1.9822 | loss: 1.12567| constrast_loss: 4.43421| div_loss: 0.68483| %_mask_idx: 0.39035| ppl: 201.70631| %_neg_is_pos: 
0.02743| lr: 0.0| temp: 1.98219 | loss: 1.09465| constrast_loss: 4.30971| div_loss: 0.68905| %_mask_idx: 0.40868| ppl: 199.00743| %_neg_is_pos: 0.03153| lr: 0.0| temp: 1.98219 | loss: 1.12589| constrast_loss: 4.43727| div_loss: 0.66277| %_mask_idx: 0.41557| ppl: 215.82443| %_neg_is_pos: 0.04713| lr: 0.0| temp: 1.98218 | loss: 1.12568| constrast_loss: 4.43432| div_loss: 0.68417| %_mask_idx: 0.42747| ppl: 202.12941| %_neg_is_pos: 0.01832| lr: 0.0| temp: 1.98218 | loss: 1.12053| constrast_loss: 4.41338| div_loss: 0.68754| %_mask_idx: 0.32299| ppl: 199.97124| %_neg_is_pos: 0.05383| lr: 0.0| temp: 1.98217 | loss: 1.10414| constrast_loss: 4.34856| div_loss: 0.68007| %_mask_idx: 0.38722| ppl: 204.75726| %_neg_is_pos: 0.04133| lr: 0.0| temp: 1.98217 | loss: 1.11172| constrast_loss: 4.37904| div_loss: 0.67848| %_mask_idx: 0.39239| ppl: 205.77063| %_neg_is_pos: 0.0438| lr: 0.0| temp: 1.98215 | loss: 1.12613| constrast_loss: 4.43838| div_loss: 0.66154| %_mask_idx: 0.44862| ppl: 216.61438| %_neg_is_pos: 0.03141| lr: 0.0| temp: 1.98215 | loss: 1.13096| constrast_loss: 4.46096| div_loss: 0.62866| %_mask_idx: 0.43045| ppl: 237.65579| %_neg_is_pos: 0.01785| lr: 0.0| temp: 1.98214 | loss: 1.13405| constrast_loss: 4.47345| div_loss: 0.62752| %_mask_idx: 0.41259| ppl: 238.38403| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.98214 | loss: 1.10822| constrast_loss: 4.3618| div_loss: 0.71091| %_mask_idx: 0.34445| ppl: 185.02057| %_neg_is_pos: 0.04699| lr: 0.0| temp: 1.98213 | loss: 1.13786| constrast_loss: 4.48907| div_loss: 0.62392| %_mask_idx: 0.40053| ppl: 240.69214| %_neg_is_pos: 0.0169| lr: 0.0| temp: 1.98213 | loss: 1.13562| constrast_loss: 4.4793| div_loss: 0.63166| %_mask_idx: 0.39145| ppl: 235.73816| %_neg_is_pos: 0.02254| lr: 0.0| temp: 1.98212 | loss: 1.13076| constrast_loss: 4.45935| div_loss: 0.63693| %_mask_idx: 0.39646| ppl: 232.36523| %_neg_is_pos: 0.02516| lr: 0.0| temp: 1.98212 | loss: 1.11017| constrast_loss: 4.37405| div_loss: 0.66614| %_mask_idx: 0.36043| ppl: 213.67313| 
%_neg_is_pos: 0.02474| lr: 0.0| temp: 1.9821 | loss: 1.10523| constrast_loss: 4.35405| div_loss: 0.66858| %_mask_idx: 0.36341| ppl: 212.10616| %_neg_is_pos: 0.0582| lr: 0.0| temp: 1.9821 | loss: 1.10163| constrast_loss: 4.33932| div_loss: 0.67213| %_mask_idx: 0.39489| ppl: 209.83701| %_neg_is_pos: 0.04123| lr: 0.0| temp: 1.98209 | loss: 1.10733| constrast_loss: 4.3596| div_loss: 0.69725| %_mask_idx: 0.39301| ppl: 193.76253| %_neg_is_pos: 0.049| lr: 0.0| temp: 1.98209 | loss: 1.1251| constrast_loss: 4.43272| div_loss: 0.67676| %_mask_idx: 0.42575| ppl: 206.87604| %_neg_is_pos: 0.03448| lr: 0.0| temp: 1.98207 | loss: 1.11542| constrast_loss: 4.39867| div_loss: 0.63016| %_mask_idx: 0.35464| ppl: 236.69977| %_neg_is_pos: 0.02927| lr: 0.0| temp: 1.98207 | loss: 1.10766| constrast_loss: 4.36298| div_loss: 0.67644| %_mask_idx: 0.39489| ppl: 207.0788| %_neg_is_pos: 0.04549| lr: 0.0| temp: 1.98206 | loss: 1.10051| constrast_loss: 4.33355| div_loss: 0.68509| %_mask_idx: 0.38487| ppl: 201.54401| %_neg_is_pos: 0.04124| lr: 0.0| temp: 1.98206 | loss: 1.11708| constrast_loss: 4.40175| div_loss: 0.66562| %_mask_idx: 0.39599| ppl: 214.00549| %_neg_is_pos: 0.03957| lr: 0.0| temp: 1.98205 | loss: 1.09568| constrast_loss: 4.31395| div_loss: 0.68754| %_mask_idx: 0.40648| ppl: 199.9745| %_neg_is_pos: 0.05135| lr: 0.0| temp: 1.98205 | loss: 1.14737| constrast_loss: 4.52692| div_loss: 0.62558| %_mask_idx: 0.39881| ppl: 239.62991| %_neg_is_pos: 0.02969| lr: 0.0| temp: 1.98204 | loss: 1.07851| constrast_loss: 4.24278| div_loss: 0.71268| %_mask_idx: 0.3833| ppl: 183.88745| %_neg_is_pos: 0.04972| lr: 0.0| temp: 1.98204 | loss: 1.11817| constrast_loss: 4.40753| div_loss: 0.65169| %_mask_idx: 0.4364| ppl: 222.91861| %_neg_is_pos: 0.01989| lr: 0.0| temp: 1.98202 | loss: 1.13121| constrast_loss: 4.46205| div_loss: 0.62782| %_mask_idx: 0.33615| ppl: 238.19754| %_neg_is_pos: 0.02745| lr: 0.0| temp: 1.98202 | loss: 1.10568| constrast_loss: 4.35348| div_loss: 0.69259| %_mask_idx: 0.40633| ppl: 
196.74503| %_neg_is_pos: 0.04385| lr: 0.0| temp: 1.98201 | loss: 1.12263| constrast_loss: 4.42586| div_loss: 0.64654| %_mask_idx: 0.35182| ppl: 226.21123| %_neg_is_pos: 0.04385| lr: 0.0| temp: 1.98201 | loss: 1.11008| constrast_loss: 4.37027| div_loss: 0.70039| %_mask_idx: 0.41118| ppl: 191.7518| %_neg_is_pos: 0.04583| lr: 0.0| temp: 1.982 | loss: 1.11207| constrast_loss: 4.38127| div_loss: 0.6699| %_mask_idx: 0.36012| ppl: 211.2654| %_neg_is_pos: 0.04053| lr: 0.0| temp: 1.982 | loss: 1.11831| constrast_loss: 4.40849| div_loss: 0.64752| %_mask_idx: 0.36482| ppl: 225.58728| %_neg_is_pos: 0.03632| lr: 0.0| temp: 1.98199 | loss: 1.10345| constrast_loss: 4.34647| div_loss: 0.67335| %_mask_idx: 0.38878| ppl: 209.05753| %_neg_is_pos: 0.03283| lr: 0.0| temp: 1.98199 | loss: 1.11879| constrast_loss: 4.40911| div_loss: 0.66058| %_mask_idx: 0.41244| ppl: 217.22917| %_neg_is_pos: 0.03089| lr: 0.0| temp: 1.98197 | loss: 1.14103| constrast_loss: 4.49902| div_loss: 0.65102| %_mask_idx: 0.42419| ppl: 223.34708| %_neg_is_pos: 0.02908| lr: 0.0| temp: 1.98197 | loss: 1.10409| constrast_loss: 4.35016| div_loss: 0.66201| %_mask_idx: 0.36826| ppl: 216.31154| %_neg_is_pos: 0.05148| lr: 0.0| temp: 1.98196 | loss: 1.12642| constrast_loss: 4.44107| div_loss: 0.64595| %_mask_idx: 0.42779| ppl: 226.5929| %_neg_is_pos: 0.02294| lr: 0.0| temp: 1.98196 | loss: 1.12271| constrast_loss: 4.42269| div_loss: 0.68155| %_mask_idx: 0.48105| ppl: 203.81116| %_neg_is_pos: 0.01394| lr: 0.0| temp: 1.98195 | loss: 1.11101| constrast_loss: 4.37822| div_loss: 0.65828| %_mask_idx: 0.42027| ppl: 218.7023| %_neg_is_pos: 0.0218| lr: 0.0| temp: 1.98195 | loss: 1.12855| constrast_loss: 4.45149| div_loss: 0.62694| %_mask_idx: 0.43014| ppl: 238.75964| %_neg_is_pos: 0.02524| lr: 0.0| temp: 1.98194 | loss: 1.10728| constrast_loss: 4.36034| div_loss: 0.6877| %_mask_idx: 0.39066| ppl: 199.87253| %_neg_is_pos: 0.03089| lr: 0.0| temp: 1.98194 | loss: 1.13253| constrast_loss: 4.46704| div_loss: 0.63094| %_mask_idx: 0.42747| 
ppl: 236.19919| %_neg_is_pos: 0.02194| lr: 0.0| temp: 1.98192 | loss: 1.12192| constrast_loss: 4.42118| div_loss: 0.66512| %_mask_idx: 0.40116| ppl: 214.32114| %_neg_is_pos: 0.04346| lr: 0.0| temp: 1.98192 | loss: 1.09881| constrast_loss: 4.32829| div_loss: 0.6696| %_mask_idx: 0.40805| ppl: 211.4577| %_neg_is_pos: 0.04421| lr: 0.0| temp: 1.98191 | loss: 1.11348| constrast_loss: 4.38877| div_loss: 0.65168| %_mask_idx: 0.40132| ppl: 222.9278| %_neg_is_pos: 0.02252| lr: 0.0| temp: 1.98191 | loss: 1.09615| constrast_loss: 4.31397| div_loss: 0.70628| %_mask_idx: 0.34727| ppl: 187.97784| %_neg_is_pos: 0.04294| lr: 0.0| temp: 1.98189 | loss: 1.11632| constrast_loss: 4.39843| div_loss: 0.66852| %_mask_idx: 0.42982| ppl: 212.14809| %_neg_is_pos: 0.02835| lr: 0.0| temp: 1.98189 | loss: 1.11828| constrast_loss: 4.4067| div_loss: 0.66404| %_mask_idx: 0.39709| ppl: 215.01663| %_neg_is_pos: 0.02702| lr: 0.0| temp: 1.98188 | loss: 1.10378| constrast_loss: 4.34982| div_loss: 0.65321| %_mask_idx: 0.3761| ppl: 221.94443| %_neg_is_pos: 0.03894| lr: 0.0| temp: 1.98188 | loss: 1.1048| constrast_loss: 4.34736| div_loss: 0.71864| %_mask_idx: 0.375| ppl: 180.07111| %_neg_is_pos: 0.06583| lr: 0.0| temp: 1.98187 | loss: 1.12555| constrast_loss: 4.43571| div_loss: 0.66479| %_mask_idx: 0.36043| ppl: 214.53696| %_neg_is_pos: 0.05111| lr: 0.0| temp: 1.98187 | loss: 1.12394| constrast_loss: 4.4317| div_loss: 0.64062| %_mask_idx: 0.40758| ppl: 230.0033| %_neg_is_pos: 0.03088| lr: 0.0| temp: 1.98186 | loss: 1.12631| constrast_loss: 4.43828| div_loss: 0.66955| %_mask_idx: 0.37563| ppl: 211.48953| %_neg_is_pos: 0.02409| lr: 0.0| temp: 1.98186 | loss: 1.12727| constrast_loss: 4.44313| div_loss: 0.65954| %_mask_idx: 0.34211| ppl: 217.89282| %_neg_is_pos: 0.04211| lr: 0.0| temp: 1.98184 | loss: 1.11904| constrast_loss: 4.40856| div_loss: 0.67587| %_mask_idx: 0.33474| ppl: 207.44284| %_neg_is_pos: 0.0393| lr: 0.0| temp: 1.98184 | loss: 1.12128| constrast_loss: 4.42076| div_loss: 0.64365| %_mask_idx: 
0.36685| ppl: 228.06665| %_neg_is_pos: 0.03642| lr: 0.0| temp: 1.98183 | loss: 1.12294| constrast_loss: 4.4226| div_loss: 0.69176| %_mask_idx: 0.39975| ppl: 197.27136| %_neg_is_pos: 0.02753| lr: 0.0| temp: 1.98183 | loss: 1.11862| constrast_loss: 4.40806| div_loss: 0.66413| %_mask_idx: 0.35996| ppl: 214.95633| %_neg_is_pos: 0.03376| lr: 0.0| temp: 1.98182 | loss: 1.07882| constrast_loss: 4.24375| div_loss: 0.71511| %_mask_idx: 0.36999| ppl: 182.32678| %_neg_is_pos: 0.04582| lr: 0.0| temp: 1.98182 | loss: 1.09825| constrast_loss: 4.32445| div_loss: 0.68554| %_mask_idx: 0.42701| ppl: 201.25262| %_neg_is_pos: 0.02773| lr: 0.0| temp: 1.98181 | loss: 1.12376| constrast_loss: 4.42892| div_loss: 0.66115| %_mask_idx: 0.37093| ppl: 216.86716| %_neg_is_pos: 0.02803| lr: 0.0| temp: 1.98181 | loss: 1.10284| constrast_loss: 4.3408| div_loss: 0.70575| %_mask_idx: 0.37798| ppl: 188.32219| %_neg_is_pos: 0.04092| lr: 0.0| temp: 1.98179 | loss: 1.09502| constrast_loss: 4.31246| div_loss: 0.6762| %_mask_idx: 0.32127| ppl: 207.23016| %_neg_is_pos: 0.04614| lr: 0.0| temp: 1.98179 | loss: 1.11812| constrast_loss: 4.40523| div_loss: 0.67258| %_mask_idx: 0.40038| ppl: 209.54623| %_neg_is_pos: 0.01949| lr: 0.0| temp: 1.98178 | loss: 1.10691| constrast_loss: 4.36077| div_loss: 0.66887| %_mask_idx: 0.37014| ppl: 211.92325| %_neg_is_pos: 0.04218| lr: 0.0| temp: 1.98178 | loss: 1.12029| constrast_loss: 4.41362| div_loss: 0.67525| %_mask_idx: 0.3338| ppl: 207.83856| %_neg_is_pos: 0.05122| lr: 0.0| temp: 1.98177 | loss: 1.12757| constrast_loss: 4.44651| div_loss: 0.63775| %_mask_idx: 0.45332| ppl: 231.83829| %_neg_is_pos: 0.01601| lr: 0.0| temp: 1.98177 | loss: 1.09902| constrast_loss: 4.32769| div_loss: 0.68386| %_mask_idx: 0.37171| ppl: 202.33139| %_neg_is_pos: 0.04461| lr: 0.0| temp: 1.98176 | loss: 1.09256| constrast_loss: 4.29966| div_loss: 0.70569| %_mask_idx: 0.3786| ppl: 188.35822| %_neg_is_pos: 0.04536| lr: 0.0| temp: 1.98176 | loss: 1.11293| constrast_loss: 4.38474| div_loss: 0.66981| 
%_mask_idx: 0.38784| ppl: 211.32047| %_neg_is_pos: 0.04874| lr: 0.0| temp: 1.98174 | loss: 1.10162| constrast_loss: 4.33872| div_loss: 0.67748| %_mask_idx: 0.35542| ppl: 206.40985| %_neg_is_pos: 0.03391| lr: 0.0| temp: 1.98174 | loss: 1.12573| constrast_loss: 4.43637| div_loss: 0.66544| %_mask_idx: 0.38612| ppl: 214.11578| %_neg_is_pos: 0.0309| lr: 0.0| temp: 1.98173 | loss: 1.12915| constrast_loss: 4.45219| div_loss: 0.64411| %_mask_idx: 0.43311| ppl: 227.77063| %_neg_is_pos: 0.02193| lr: 0.0| temp: 1.98173 | loss: 1.10472| constrast_loss: 4.35001| div_loss: 0.68859| %_mask_idx: 0.39834| ppl: 199.30063| %_neg_is_pos: 0.01958| lr: 0.0| temp: 1.98171 | loss: 1.12866| constrast_loss: 4.44845| div_loss: 0.66205| %_mask_idx: 0.39818| ppl: 216.2912| %_neg_is_pos: 0.02909| lr: 0.0| temp: 1.98171 | loss: 1.10231| constrast_loss: 4.34193| div_loss: 0.67314| %_mask_idx: 0.37234| ppl: 209.19266| %_neg_is_pos: 0.06049| lr: 0.0| temp: 1.9817 | loss: 1.12099| constrast_loss: 4.41999| div_loss: 0.63964| %_mask_idx: 0.39959| ppl: 230.63287| %_neg_is_pos: 0.02906| lr: 0.0| temp: 1.9817 | loss: 1.12337| constrast_loss: 4.43148| div_loss: 0.61992| %_mask_idx: 0.38205| ppl: 243.24846| %_neg_is_pos: 0.02631| lr: 0.0| temp: 1.98169 | loss: 1.1086| constrast_loss: 4.3677| div_loss: 0.6671| %_mask_idx: 0.44157| ppl: 213.05563| %_neg_is_pos: 0.02946| lr: 0.0| temp: 1.98169 | loss: 1.12351| constrast_loss: 4.42204| div_loss: 0.72014| %_mask_idx: 0.39693| ppl: 179.1113| %_neg_is_pos: 0.04593| lr: 0.0| temp: 1.98168 | loss: 1.10176| constrast_loss: 4.33675| div_loss: 0.70303| %_mask_idx: 0.38174| ppl: 190.06122| %_neg_is_pos: 0.04666| lr: 0.0| temp: 1.98168 | loss: 1.12945| constrast_loss: 4.45416| div_loss: 0.63643| %_mask_idx: 0.39881| ppl: 232.6864| %_neg_is_pos: 0.02219| lr: 0.0| temp: 1.98166 | loss: 1.13294| constrast_loss: 4.46843| div_loss: 0.63314| %_mask_idx: 0.41087| ppl: 234.79196| %_neg_is_pos: 0.03019| lr: 0.0| temp: 1.98166 | loss: 1.12176| constrast_loss: 4.42| div_loss: 
0.67036| %_mask_idx: 0.39865| ppl: 210.97241| %_neg_is_pos: 0.02071| lr: 0.0| temp: 1.98165 | loss: 1.11659| constrast_loss: 4.39666| div_loss: 0.69677| %_mask_idx: 0.36873| ppl: 194.06924| %_neg_is_pos: 0.0663| lr: 0.0| temp: 1.98165 | loss: 1.10194| constrast_loss: 4.34011| div_loss: 0.67652| %_mask_idx: 0.36419| ppl: 207.02505| %_neg_is_pos: 0.04924| lr: 0.0| temp: 1.98164 | loss: 1.12968| constrast_loss: 4.45521| div_loss: 0.63495| %_mask_idx: 0.41369| ppl: 233.63049| %_neg_is_pos: 0.01713| lr: 0.0| temp: 1.98164 | loss: 1.11151| constrast_loss: 4.37868| div_loss: 0.67346| %_mask_idx: 0.34821| ppl: 208.98508| %_neg_is_pos: 0.03953| lr: 0.0| temp: 1.98163 | loss: 1.10033| constrast_loss: 4.33281| div_loss: 0.6852| %_mask_idx: 0.32722| ppl: 201.47021| %_neg_is_pos: 0.04589| lr: 0.0| temp: 1.98163 | loss: 1.07976| constrast_loss: 4.2512| div_loss: 0.67852| %_mask_idx: 0.36153| ppl: 205.7475| %_neg_is_pos: 0.07139| lr: 0.0| temp: 1.98161 | loss: 1.1335| constrast_loss: 4.46919| div_loss: 0.64805| %_mask_idx: 0.40257| ppl: 225.24709| %_neg_is_pos: 0.04521| lr: 0.0| temp: 1.98161 | loss: 1.11052| constrast_loss: 4.37288| div_loss: 0.69215| %_mask_idx: 0.35307| ppl: 197.02582| %_neg_is_pos: 0.05145| lr: 0.0| temp: 1.98161 | loss: 1.11479| constrast_loss: 4.38943| div_loss: 0.69725| %_mask_idx: 0.38299| ppl: 193.76065| %_neg_is_pos: 0.0417| lr: 0.0| temp: 1.98161 | loss: 1.12678| constrast_loss: 4.4424| div_loss: 0.64732| %_mask_idx: 0.41087| ppl: 225.71738| %_neg_is_pos: 0.02847| lr: 0.0| temp: 1.9816 | loss: 1.10616| constrast_loss: 4.35824| div_loss: 0.66419| %_mask_idx: 0.42701| ppl: 214.91937| %_neg_is_pos: 0.03151| lr: 0.0| temp: 1.9816 | loss: 1.10945| constrast_loss: 4.37051| div_loss: 0.67279| %_mask_idx: 0.37704| ppl: 209.41696| %_neg_is_pos: 0.03698| lr: 0.0| temp: 1.98159 | loss: 1.1237| constrast_loss: 4.4282| div_loss: 0.66601| %_mask_idx: 0.37234| ppl: 213.75214| %_neg_is_pos: 0.04602| lr: 0.0| temp: 1.98159 | loss: 1.12847| constrast_loss: 4.44821| 
div_loss: 0.65661| %_mask_idx: 0.37516| ppl: 219.76785| %_neg_is_pos: 0.05682| lr: 0.0| temp: 1.98157 | loss: 1.11877| constrast_loss: 4.40828| div_loss: 0.66784| %_mask_idx: 0.40038| ppl: 212.58517| %_neg_is_pos: 0.04254| lr: 0.0| temp: 1.98157 | loss: 1.12292| constrast_loss: 4.42451| div_loss: 0.67153| %_mask_idx: 0.36936| ppl: 210.21909| %_neg_is_pos: 0.0312| lr: 0.0| temp: 1.98156 | loss: 1.12853| constrast_loss: 4.44698| div_loss: 0.67133| %_mask_idx: 0.3808| ppl: 210.34628| %_neg_is_pos: 0.05348| lr: 0.0| temp: 1.98156 [2021-09-02 01:15:53,021] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:15:53,021] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.10386| constrast_loss: 4.34758| div_loss: 0.67866| %_mask_idx: 0.33506| ppl: 205.65891| %_neg_is_pos: 0.05388| lr: 0.0| temp: 1.98154 | loss: 1.10687| constrast_loss: 4.35816| div_loss: 0.6934| %_mask_idx: 0.41651| ppl: 196.2215| %_neg_is_pos: 0.04703| lr: 0.0| temp: 1.98154 | loss: 1.12371| constrast_loss: 4.42944| div_loss: 0.654| %_mask_idx: 0.38221| ppl: 221.43878| %_neg_is_pos: 0.04132| lr: 0.0| temp: 1.98153 | loss: 1.12917| constrast_loss: 4.44772| div_loss: 0.6896| %_mask_idx: 0.41228| ppl: 198.65514| %_neg_is_pos: 0.03993| lr: 0.0| temp: 1.98153 | loss: 1.13663| constrast_loss: 4.48288| div_loss: 0.63653| %_mask_idx: 0.42998| ppl: 232.62299| %_neg_is_pos: 0.01411| lr: 0.0| temp: 1.98152 | loss: 1.12268| constrast_loss: 4.42436| div_loss: 0.66366| %_mask_idx: 0.43264| ppl: 215.25623| %_neg_is_pos: 0.03127| lr: 0.0| temp: 1.98152 | loss: 1.12926| constrast_loss: 4.45284| div_loss: 0.64188| %_mask_idx: 0.37328| ppl: 229.19923| %_neg_is_pos: 0.01588| lr: 0.0| temp: 1.98151 | loss: 1.12667| constrast_loss: 4.44323| div_loss: 0.6344| %_mask_idx: 0.38095| ppl: 233.98238| %_neg_is_pos: 0.02861| lr: 0.0| temp: 
1.98151 | loss: 1.12124| constrast_loss: 4.41825| div_loss: 0.66709| %_mask_idx: 0.36858| ppl: 213.06207| %_neg_is_pos: 0.02858| lr: 0.0| temp: 1.98149| loss: 1.12136| constrast_loss: 4.42| div_loss: 0.65422| %_mask_idx: 0.38847| ppl: 221.30188| %_neg_is_pos: 0.02766| lr: 0.0| temp: 1.98149 | loss: 1.15258| constrast_loss: 4.54687| div_loss: 0.63451| %_mask_idx: 0.37077| ppl: 233.91655| %_neg_is_pos: 0.01569| lr: 0.0| temp: 1.98148 | loss: 1.13147| constrast_loss: 4.46073| div_loss: 0.65164| %_mask_idx: 0.42262| ppl: 222.94919| %_neg_is_pos: 0.01226| lr: 0.0| temp: 1.98148 | loss: 1.11637| constrast_loss: 4.39856| div_loss: 0.66916| %_mask_idx: 0.3526| ppl: 211.73889| %_neg_is_pos: 0.01726| lr: 0.0| temp: 1.98147 | loss: 1.12977| constrast_loss: 4.45281| div_loss: 0.66276| %_mask_idx: 0.35135| ppl: 215.83469| %_neg_is_pos: 0.02602| lr: 0.0| temp: 1.98147 | loss: 1.11854| constrast_loss: 4.4064| div_loss: 0.67745| %_mask_idx: 0.40899| ppl: 206.43314| %_neg_is_pos: 0.01038| lr: 0.0| temp: 1.98146 | loss: 1.11864| constrast_loss: 4.4084| div_loss: 0.6617| %_mask_idx: 0.33302| ppl: 216.51056| %_neg_is_pos: 0.02274| lr: 0.0| temp: 1.98146 | loss: 1.11963| constrast_loss: 4.41271| div_loss: 0.65791| %_mask_idx: 0.37531| ppl: 218.93613| %_neg_is_pos: 0.01372| lr: 0.0| temp: 1.98144 | loss: 1.14162| constrast_loss: 4.50379| div_loss: 0.62702| %_mask_idx: 0.33506| ppl: 238.70891| %_neg_is_pos: 0.01727| lr: 0.0| temp: 1.98144 | loss: 1.12584| constrast_loss: 4.43666| div_loss: 0.66705| %_mask_idx: 0.41024| ppl: 213.08485| %_neg_is_pos: 0.00982| lr: 0.0| temp: 1.98143 | loss: 1.13645| constrast_loss: 4.48078| div_loss: 0.65024| %_mask_idx: 0.4292| ppl: 223.84669| %_neg_is_pos: 0.00815| lr: 0.0| temp: 1.98143 | loss: 1.14108| constrast_loss: 4.49963| div_loss: 0.6469| %_mask_idx: 0.39192| ppl: 225.98706| %_neg_is_pos: 0.01015| lr: 0.0| temp: 1.98142 | loss: 1.12902| constrast_loss: 4.45078| div_loss: 0.65276| %_mask_idx: 0.41933| ppl: 222.2326| %_neg_is_pos: 0.00789| lr: 0.0| 
temp: 1.98142 | loss: 1.13731| constrast_loss: 4.48171| div_loss: 0.67523| %_mask_idx: 0.40304| ppl: 207.85085| %_neg_is_pos: 0.01825| lr: 0.0| temp: 1.98141 | loss: 1.13325| constrast_loss: 4.46767| div_loss: 0.65339| %_mask_idx: 0.43202| ppl: 221.82846| %_neg_is_pos: 0.00742| lr: 0.0| temp: 1.98141 | loss: 1.1465| constrast_loss: 4.52269| div_loss: 0.63286| %_mask_idx: 0.38706| ppl: 234.96826| %_neg_is_pos: 0.00845| lr: 0.0| temp: 1.98139 | loss: 1.12805| constrast_loss: 4.44362| div_loss: 0.68572| %_mask_idx: 0.36764| ppl: 201.14166| %_neg_is_pos: 0.00722| lr: 0.0| temp: 1.98139 | loss: 1.13695| constrast_loss: 4.48223| div_loss: 0.65558| %_mask_idx: 0.34743| ppl: 220.42915| %_neg_is_pos: 0.01264| lr: 0.0| temp: 1.98138 | loss: 1.13851| constrast_loss: 4.49308| div_loss: 0.60952| %_mask_idx: 0.37813| ppl: 249.90973| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.98138 | loss: 1.12678| constrast_loss: 4.44015| div_loss: 0.66955| %_mask_idx: 0.34085| ppl: 211.48909| %_neg_is_pos: 0.01245| lr: 0.0| temp: 1.98136 | loss: 1.12498| constrast_loss: 4.43537| div_loss: 0.64558| %_mask_idx: 0.41369| ppl: 226.82581| %_neg_is_pos: 0.00676| lr: 0.0| temp: 1.98136 | loss: 1.12984| constrast_loss: 4.45381| div_loss: 0.6553| %_mask_idx: 0.36028| ppl: 220.60747| %_neg_is_pos: 0.00945| lr: 0.0| temp: 1.98135 | loss: 1.13243| constrast_loss: 4.46421| div_loss: 0.65505| %_mask_idx: 0.37892| ppl: 220.76874| %_neg_is_pos: 0.00906| lr: 0.0| temp: 1.98135 | loss: 1.14371| constrast_loss: 4.51178| div_loss: 0.63073| %_mask_idx: 0.37187| ppl: 236.33046| %_neg_is_pos: 0.00664| lr: 0.0| temp: 1.98134 | loss: 1.12733| constrast_loss: 4.44424| div_loss: 0.65079| %_mask_idx: 0.35573| ppl: 223.49387| %_neg_is_pos: 0.01362| lr: 0.0| temp: 1.98134 | loss: 1.12553| constrast_loss: 4.43628| div_loss: 0.65856| %_mask_idx: 0.37798| ppl: 218.52034| %_neg_is_pos: 0.00966| lr: 0.0| temp: 1.98133 | loss: 1.13729| constrast_loss: 4.48349| div_loss: 0.6568| %_mask_idx: 0.40241| ppl: 219.64767| %_neg_is_pos: 
0.01| lr: 0.0| temp: 1.98133 | loss: 1.13377| constrast_loss: 4.46963| div_loss: 0.65435| %_mask_idx: 0.38831| ppl: 221.21472| %_neg_is_pos: 0.01148| lr: 0.0| temp: 1.98131 | loss: 1.13717| constrast_loss: 4.48438| div_loss: 0.64317| %_mask_idx: 0.36278| ppl: 228.37163| %_neg_is_pos: 0.01029| lr: 0.0| temp: 1.98131 | loss: 1.12825| constrast_loss: 4.44651| div_loss: 0.66501| %_mask_idx: 0.36513| ppl: 214.39264| %_neg_is_pos: 0.01334| lr: 0.0| temp: 1.9813 | loss: 1.12746| constrast_loss: 4.44499| div_loss: 0.64867| %_mask_idx: 0.40977| ppl: 224.84911| %_neg_is_pos: 0.01169| lr: 0.0| temp: 1.9813 | loss: 1.13913| constrast_loss: 4.49424| div_loss: 0.62264| %_mask_idx: 0.4068| ppl: 241.51129| %_neg_is_pos: 0.01371| lr: 0.0| temp: 1.98129 | loss: 1.13187| constrast_loss: 4.46313| div_loss: 0.64336| %_mask_idx: 0.40226| ppl: 228.25143| %_neg_is_pos: 0.00912| lr: 0.0| temp: 1.98129 | loss: 1.13784| constrast_loss: 4.48747| div_loss: 0.6387| %_mask_idx: 0.38816| ppl: 231.23471| %_neg_is_pos: 0.00869| lr: 0.0| temp: 1.98128 | loss: 1.14296| constrast_loss: 4.51099| div_loss: 0.6086| %_mask_idx: 0.43813| ppl: 250.49788| %_neg_is_pos: 0.00719| lr: 0.0| temp: 1.98128 | loss: 1.13941| constrast_loss: 4.49397| div_loss: 0.63678| %_mask_idx: 0.38628| ppl: 232.46094| %_neg_is_pos: 0.0087| lr: 0.0| temp: 1.98126 | loss: 1.12857| constrast_loss: 4.44754| div_loss: 0.6676| %_mask_idx: 0.39709| ppl: 212.73338| %_neg_is_pos: 0.00926| lr: 0.0| temp: 1.98126 | loss: 1.13253| constrast_loss: 4.46628| div_loss: 0.6382| %_mask_idx: 0.33177| ppl: 231.5498| %_neg_is_pos: 0.01476| lr: 0.0| temp: 1.98125 | loss: 1.11986| constrast_loss: 4.41219| div_loss: 0.67269| %_mask_idx: 0.35511| ppl: 209.48021| %_neg_is_pos: 0.0158| lr: 0.0| temp: 1.98125 | loss: 1.12979| constrast_loss: 4.45352| div_loss: 0.65658| %_mask_idx: 0.38863| ppl: 219.78912| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.98124 | loss: 1.13881| constrast_loss: 4.49157| div_loss: 0.63667| %_mask_idx: 0.40836| ppl: 232.52951| 
%_neg_is_pos: 0.00459| lr: 0.0| temp: 1.98124 | loss: 1.13146| constrast_loss: 4.4591| div_loss: 0.66758| %_mask_idx: 0.35213| ppl: 212.75095| %_neg_is_pos: 0.01015| lr: 0.0| temp: 1.98123 | loss: 1.13962| constrast_loss: 4.49286| div_loss: 0.656| %_mask_idx: 0.40492| ppl: 220.16054| %_neg_is_pos: 0.00733| lr: 0.0| temp: 1.98123 | loss: 1.13164| constrast_loss: 4.46117| div_loss: 0.65402| %_mask_idx: 0.37672| ppl: 221.42575| %_neg_is_pos: 0.01741| lr: 0.0| temp: 1.98121 | loss: 1.13941| constrast_loss: 4.49336| div_loss: 0.64274| %_mask_idx: 0.39944| ppl: 228.64523| %_neg_is_pos: 0.01043| lr: 0.0| temp: 1.98121 | loss: 1.12185| constrast_loss: 4.42045| div_loss: 0.66935| %_mask_idx: 0.38471| ppl: 211.6189| %_neg_is_pos: 0.01146| lr: 0.0| temp: 1.9812 | loss: 1.13206| constrast_loss: 4.46704| div_loss: 0.61194| %_mask_idx: 0.40711| ppl: 248.3604| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.9812 | loss: 1.12963| constrast_loss: 4.45351| div_loss: 0.6501| %_mask_idx: 0.37657| ppl: 223.93578| %_neg_is_pos: 0.01393| lr: 0.0| temp: 1.98118 | loss: 1.13847| constrast_loss: 4.48928| div_loss: 0.64594| %_mask_idx: 0.37594| ppl: 226.59698| %_neg_is_pos: 0.01388| lr: 0.0| temp: 1.98118 | loss: 1.13741| constrast_loss: 4.48747| div_loss: 0.62169| %_mask_idx: 0.40382| ppl: 242.11804| %_neg_is_pos: 0.01573| lr: 0.0| temp: 1.98117 | loss: 1.1368| constrast_loss: 4.48507| div_loss: 0.62147| %_mask_idx: 0.39756| ppl: 242.25931| %_neg_is_pos: 0.00738| lr: 0.0| temp: 1.98117 | loss: 1.12837| constrast_loss: 4.45007| div_loss: 0.63416| %_mask_idx: 0.39991| ppl: 234.13936| %_neg_is_pos: 0.00579| lr: 0.0| temp: 1.98116 | loss: 1.13842| constrast_loss: 4.49128| div_loss: 0.62402| %_mask_idx: 0.39176| ppl: 240.62532| %_neg_is_pos: 0.00891| lr: 0.0| temp: 1.98116 | loss: 1.12122| constrast_loss: 4.41554| div_loss: 0.69325| %_mask_idx: 0.30482| ppl: 196.31934| %_neg_is_pos: 0.01984| lr: 0.0| temp: 1.98115 | loss: 1.12507| constrast_loss: 4.4337| div_loss: 0.66599| %_mask_idx: 0.3927| ppl: 
213.76773| %_neg_is_pos: 0.00962| lr: 0.0| temp: 1.98115 | loss: 1.13435| constrast_loss: 4.47167| div_loss: 0.65737| %_mask_idx: 0.37672| ppl: 219.28494| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.98113 | loss: 1.12844| constrast_loss: 4.44536| div_loss: 0.68392| %_mask_idx: 0.36028| ppl: 202.29053| %_neg_is_pos: 0.01935| lr: 0.0| temp: 1.98113 | loss: 1.12099| constrast_loss: 4.41737| div_loss: 0.66577| %_mask_idx: 0.40445| ppl: 213.90453| %_neg_is_pos: 0.02134| lr: 0.0| temp: 1.98112 | loss: 1.1374| constrast_loss: 4.48855| div_loss: 0.61031| %_mask_idx: 0.38142| ppl: 249.4024| %_neg_is_pos: 0.00511| lr: 0.0| temp: 1.98112 | loss: 1.12991| constrast_loss: 4.45369| div_loss: 0.6596| %_mask_idx: 0.44549| ppl: 217.85693| %_neg_is_pos: 0.01075| lr: 0.0| temp: 1.98111 | loss: 1.13496| constrast_loss: 4.47435| div_loss: 0.65473| %_mask_idx: 0.38456| ppl: 220.97269| %_neg_is_pos: 0.00933| lr: 0.0| temp: 1.98111 | loss: 1.14535| constrast_loss: 4.52038| div_loss: 0.61| %_mask_idx: 0.41933| ppl: 249.59906| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.9811 | loss: 1.13012| constrast_loss: 4.45543| div_loss: 0.6506| %_mask_idx: 0.36873| ppl: 223.61682| %_neg_is_pos: 0.02118| lr: 0.0| temp: 1.9811 | loss: 1.13748| constrast_loss: 4.48476| div_loss: 0.65177| %_mask_idx: 0.40758| ppl: 222.86896| %_neg_is_pos: 0.00787| lr: 0.0| temp: 1.98108 | loss: 1.12439| constrast_loss: 4.43177| div_loss: 0.65775| %_mask_idx: 0.3432| ppl: 219.03796| %_neg_is_pos: 0.01215| lr: 0.0| temp: 1.98108 | loss: 1.14353| constrast_loss: 4.50975| div_loss: 0.64372| %_mask_idx: 0.42716| ppl: 228.01611| %_neg_is_pos: 0.00535| lr: 0.0| temp: 1.98107 | loss: 1.12506| constrast_loss: 4.4356| div_loss: 0.6466| %_mask_idx: 0.33866| ppl: 226.17709| %_neg_is_pos: 0.01271| lr: 0.0| temp: 1.98107 | loss: 1.13309| constrast_loss: 4.46736| div_loss: 0.64995| %_mask_idx: 0.36873| ppl: 224.03235| %_neg_is_pos: 0.01585| lr: 0.0| temp: 1.98106 | loss: 1.13142| constrast_loss: 4.46057| div_loss: 0.65126| %_mask_idx: 0.38017| 
ppl: 223.1924| %_neg_is_pos: 0.00717| lr: 0.0| temp: 1.98106 | loss: 1.13761| constrast_loss: 4.48658| div_loss: 0.63841| %_mask_idx: 0.42716| ppl: 231.41669| %_neg_is_pos: 0.00722| lr: 0.0| temp: 1.98105 | loss: 1.13425| constrast_loss: 4.47037| div_loss: 0.66619| %_mask_idx: 0.40602| ppl: 213.63531| %_neg_is_pos: 0.01666| lr: 0.0| temp: 1.98105 | loss: 1.12522| constrast_loss: 4.43401| div_loss: 0.66871| %_mask_idx: 0.41761| ppl: 212.02719| %_neg_is_pos: 0.01397| lr: 0.0| temp: 1.98103 | loss: 1.12419| constrast_loss: 4.43055| div_loss: 0.66219| %_mask_idx: 0.35009| ppl: 216.20161| %_neg_is_pos: 0.01787| lr: 0.0| temp: 1.98103 | loss: 1.13626| constrast_loss: 4.4821| div_loss: 0.62958| %_mask_idx: 0.41761| ppl: 237.06567| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.98102 | loss: 1.13078| constrast_loss: 4.45826| div_loss: 0.64877| %_mask_idx: 0.41244| ppl: 224.78848| %_neg_is_pos: 0.00731| lr: 0.0| temp: 1.98102 | loss: 1.12973| constrast_loss: 4.45299| div_loss: 0.6594| %_mask_idx: 0.40335| ppl: 217.9863| %_neg_is_pos: 0.00746| lr: 0.0| temp: 1.981 | loss: 1.12231| constrast_loss: 4.42361| div_loss: 0.65621| %_mask_idx: 0.39348| ppl: 220.02492| %_neg_is_pos: 0.01229| lr: 0.0| temp: 1.981 | loss: 1.13452| constrast_loss: 4.47243| div_loss: 0.65641| %_mask_idx: 0.35761| ppl: 219.89447| %_neg_is_pos: 0.01451| lr: 0.0| temp: 1.98099 | loss: 1.13872| constrast_loss: 4.49015| div_loss: 0.64726| %_mask_idx: 0.40069| ppl: 225.75403| %_neg_is_pos: 0.00862| lr: 0.0| temp: 1.98099 | loss: 1.12989| constrast_loss: 4.4537| div_loss: 0.65877| %_mask_idx: 0.40633| ppl: 218.38487| %_neg_is_pos: 0.00961| lr: 0.0| temp: 1.98098 | loss: 1.13389| constrast_loss: 4.47185| div_loss: 0.63724| %_mask_idx: 0.39975| ppl: 232.16376| %_neg_is_pos: 0.00856| lr: 0.0| temp: 1.98098 | loss: 1.13046| constrast_loss: 4.45502| div_loss: 0.66802| %_mask_idx: 0.34868| ppl: 212.46902| %_neg_is_pos: 0.01283| lr: 0.0| temp: 1.98097 | loss: 1.11997| constrast_loss: 4.41243| div_loss: 0.67442| %_mask_idx: 
0.37202| ppl: 208.37325| %_neg_is_pos: 0.0163| lr: 0.0| temp: 1.98097 | loss: 1.12247| constrast_loss: 4.4207| div_loss: 0.69187| %_mask_idx: 0.37406| ppl: 197.20578| %_neg_is_pos: 0.02016| lr: 0.0| temp: 1.98095 | loss: 1.15429| constrast_loss: 4.55631| div_loss: 0.60851| %_mask_idx: 0.42779| ppl: 250.55119| %_neg_is_pos: 0.00481| lr: 0.0| temp: 1.98095 | loss: 1.14442| constrast_loss: 4.51645| div_loss: 0.61209| %_mask_idx: 0.37187| ppl: 248.26076| %_neg_is_pos: 0.00727| lr: 0.0| temp: 1.98094 | loss: 1.14827| constrast_loss: 4.53071| div_loss: 0.6238| %_mask_idx: 0.41322| ppl: 240.76544| %_neg_is_pos: 0.00921| lr: 0.0| temp: 1.98094 | loss: 1.1337| constrast_loss: 4.46854| div_loss: 0.66253| %_mask_idx: 0.41197| ppl: 215.9826| %_neg_is_pos: 0.01522| lr: 0.0| temp: 1.98093 | loss: 1.12943| constrast_loss: 4.45273| div_loss: 0.64987| %_mask_idx: 0.37093| ppl: 224.08109| %_neg_is_pos: 0.00477| lr: 0.0| temp: 1.98093 | loss: 1.13873| constrast_loss: 4.49308| div_loss: 0.61823| %_mask_idx: 0.39411| ppl: 244.33304| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.98092 | loss: 1.14836| constrast_loss: 4.5303| div_loss: 0.63144| %_mask_idx: 0.45348| ppl: 235.87898| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.98092 | loss: 1.13178| constrast_loss: 4.4618| div_loss: 0.65325| %_mask_idx: 0.39411| ppl: 221.91763| %_neg_is_pos: 0.0101| lr: 0.0| temp: 1.9809 | loss: 1.14431| constrast_loss: 4.51506| div_loss: 0.6218| %_mask_idx: 0.4162| ppl: 242.04831| %_neg_is_pos: 0.00682| lr: 0.0| temp: 1.9809 | loss: 1.13763| constrast_loss: 4.48598| div_loss: 0.64545| %_mask_idx: 0.37547| ppl: 226.91423| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.98089 | loss: 1.13512| constrast_loss: 4.47435| div_loss: 0.66142| %_mask_idx: 0.40602| ppl: 216.6902| %_neg_is_pos: 0.01324| lr: 0.0| temp: 1.98089 | loss: 1.1292| constrast_loss: 4.45112| div_loss: 0.65692| %_mask_idx: 0.41823| ppl: 219.57178| %_neg_is_pos: 0.00841| lr: 0.0| temp: 1.98088 | loss: 1.13694| constrast_loss: 4.48337| div_loss: 0.64385| %_mask_idx: 
0.38737| ppl: 227.93314| %_neg_is_pos: 0.00929| lr: 0.0| temp: 1.98088 | loss: 1.13386| constrast_loss: 4.46946| div_loss: 0.65995| %_mask_idx: 0.44189| ppl: 217.63388| %_neg_is_pos: 0.00701| lr: 0.0| temp: 1.98087 | loss: 1.13215| constrast_loss: 4.46262| div_loss: 0.65965| %_mask_idx: 0.41165| ppl: 217.8231| %_neg_is_pos: 0.006| lr: 0.0| temp: 1.98087 | loss: 1.13409| constrast_loss: 4.47084| div_loss: 0.65502| %_mask_idx: 0.33678| ppl: 220.78458| %_neg_is_pos: 0.0181| lr: 0.0| temp: 1.98085 | loss: 1.13516| constrast_loss: 4.47588| div_loss: 0.64745| %_mask_idx: 0.41494| ppl: 225.63356| %_neg_is_pos: 0.00808| lr: 0.0| temp: 1.98085 | loss: 1.13963| constrast_loss: 4.49426| div_loss: 0.64272| %_mask_idx: 0.42904| ppl: 228.65739| %_neg_is_pos: 0.00496| lr: 0.0| temp: 1.98084 | loss: 1.1216| constrast_loss: 4.42057| div_loss: 0.65819| %_mask_idx: 0.42058| ppl: 218.75528| %_neg_is_pos: 0.00691| lr: 0.0| temp: 1.98084 [2021-09-02 01:25:07,237] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:25:07,237] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.1384| constrast_loss: 4.48822| div_loss: 0.65379| %_mask_idx: 0.41463| ppl: 221.57219| %_neg_is_pos: 0.01595| lr: 0.0| temp: 1.98082 | loss: 1.13945| constrast_loss: 4.49566| div_loss: 0.62155| %_mask_idx: 0.41714| ppl: 242.20599| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.98082 | loss: 1.12889| constrast_loss: 4.44989| div_loss: 0.6565| %_mask_idx: 0.35197| ppl: 219.84152| %_neg_is_pos: 0.01702| lr: 0.0| temp: 1.98081 | loss: 1.13176| constrast_loss: 4.46161| div_loss: 0.65417| %_mask_idx: 0.39301| ppl: 221.33189| %_neg_is_pos: 0.01575| lr: 0.0| temp: 1.98081 | loss: 1.13102| constrast_loss: 4.45619| div_loss: 0.67883| %_mask_idx: 0.39756| ppl: 205.55| %_neg_is_pos: 0.01148| lr: 0.0| temp: 1.9808 | loss: 1.13088| constrast_loss: 4.45665| div_loss: 0.66851| %_mask_idx: 0.35558| ppl: 212.15268| %_neg_is_pos: 0.01714| lr: 0.0| temp: 1.9808 | loss: 1.13137| constrast_loss: 4.46062| div_loss: 0.64866| %_mask_idx: 0.37359| ppl: 224.85925| %_neg_is_pos: 0.01989| lr: 0.0| temp: 1.98079 | loss: 1.1381| constrast_loss: 4.48719| div_loss: 0.65208| %_mask_idx: 0.38643| ppl: 222.67065| %_neg_is_pos: 0.01069| lr: 0.0| temp: 1.98079 | loss: 1.12565| constrast_loss: 4.436| div_loss: 0.66605| %_mask_idx: 0.38346| ppl: 213.72499| %_neg_is_pos: 0.01561| lr: 0.0| temp: 1.98077 | loss: 1.12286| constrast_loss: 4.42717| div_loss: 0.64273| %_mask_idx: 0.42904| ppl: 228.65118| %_neg_is_pos: 0.01052| lr: 0.0| temp: 1.98077 | loss: 1.11135| constrast_loss: 4.37596| div_loss: 0.6944| %_mask_idx: 0.32691| ppl: 195.58643| %_neg_is_pos: 0.01959| lr: 0.0| temp: 1.98076 | loss: 1.12131| constrast_loss: 4.41956| div_loss: 0.65658| %_mask_idx: 0.37343| ppl: 219.78958| %_neg_is_pos: 0.0208| lr: 0.0| temp: 1.98076 | loss: 1.13685| constrast_loss: 4.48392| div_loss: 0.63492| %_mask_idx: 0.3963| ppl: 233.6481| %_neg_is_pos: 0.02436| lr: 0.0| temp: 1.98075 | loss: 1.13776| constrast_loss: 4.48871| div_loss: 0.6232| %_mask_idx: 0.41949| ppl: 241.15324| 
%_neg_is_pos: 0.01089| lr: 0.0| temp: 1.98075 | loss: 1.1256| constrast_loss: 4.43846| div_loss: 0.63957| %_mask_idx: 0.4187| ppl: 230.67657| %_neg_is_pos: 0.00907| lr: 0.0| temp: 1.98074 | loss: 1.13971| constrast_loss: 4.49495| div_loss: 0.63896| %_mask_idx: 0.40445| ppl: 231.06433| %_neg_is_pos: 0.01819| lr: 0.0| temp: 1.98074 | loss: 1.13277| constrast_loss: 4.46865| div_loss: 0.62419| %_mask_idx: 0.33427| ppl: 240.52011| %_neg_is_pos: 0.01147| lr: 0.0| temp: 1.98072| loss: 1.13102| constrast_loss: 4.45829| div_loss: 0.65806| %_mask_idx: 0.34633| ppl: 218.84274| %_neg_is_pos: 0.01576| lr: 0.0| temp: 1.98072 | loss: 1.12648| constrast_loss: 4.44038| div_loss: 0.65557| %_mask_idx: 0.349| ppl: 220.43428| %_neg_is_pos: 0.01846| lr: 0.0| temp: 1.98071 | loss: 1.1319| constrast_loss: 4.46273| div_loss: 0.64871| %_mask_idx: 0.39192| ppl: 224.82263| %_neg_is_pos: 0.01034| lr: 0.0| temp: 1.98071 | loss: 1.1168| constrast_loss: 4.40071| div_loss: 0.66475| %_mask_idx: 0.42466| ppl: 214.55763| %_neg_is_pos: 0.02331| lr: 0.0| temp: 1.9807 | loss: 1.12036| constrast_loss: 4.4159| div_loss: 0.65546| %_mask_idx: 0.36544| ppl: 220.50488| %_neg_is_pos: 0.02872| lr: 0.0| temp: 1.9807 | loss: 1.12196| constrast_loss: 4.41982| div_loss: 0.68026| %_mask_idx: 0.40304| ppl: 204.63675| %_neg_is_pos: 0.02322| lr: 0.0| temp: 1.98069 | loss: 1.13076| constrast_loss: 4.4575| div_loss: 0.65526| %_mask_idx: 0.3869| ppl: 220.63358| %_neg_is_pos: 0.01643| lr: 0.0| temp: 1.98069 | loss: 1.1037| constrast_loss: 4.34464| div_loss: 0.70177| %_mask_idx: 0.36137| ppl: 190.86679| %_neg_is_pos: 0.03005| lr: 0.0| temp: 1.98067 | loss: 1.12401| constrast_loss: 4.43137| div_loss: 0.64685| %_mask_idx: 0.37782| ppl: 226.01562| %_neg_is_pos: 0.01167| lr: 0.0| temp: 1.98067 | loss: 1.13281| constrast_loss: 4.46437| div_loss: 0.66856| %_mask_idx: 0.41494| ppl: 212.12367| %_neg_is_pos: 0.02223| lr: 0.0| temp: 1.98066 | loss: 1.11429| constrast_loss: 4.39224| div_loss: 0.64932| %_mask_idx: 0.3808| ppl: 
224.43234| %_neg_is_pos: 0.02989| lr: 0.0| temp: 1.98066 | loss: 1.13592| constrast_loss: 4.47723| div_loss: 0.66458| %_mask_idx: 0.4281| ppl: 214.66576| %_neg_is_pos: 0.03107| lr: 0.0| temp: 1.98064 | loss: 1.12391| constrast_loss: 4.43004| div_loss: 0.65599| %_mask_idx: 0.38033| ppl: 220.16383| %_neg_is_pos: 0.01987| lr: 0.0| temp: 1.98064 | loss: 1.13941| constrast_loss: 4.49296| div_loss: 0.64691| %_mask_idx: 0.4281| ppl: 225.97476| %_neg_is_pos: 0.01047| lr: 0.0| temp: 1.98063 | loss: 1.12723| constrast_loss: 4.44466| div_loss: 0.64246| %_mask_idx: 0.38534| ppl: 228.82558| %_neg_is_pos: 0.01256| lr: 0.0| temp: 1.98063 | loss: 1.12925| constrast_loss: 4.45337| div_loss: 0.63635| %_mask_idx: 0.33349| ppl: 232.73399| %_neg_is_pos: 0.01861| lr: 0.0| temp: 1.98062 | loss: 1.11028| constrast_loss: 4.37468| div_loss: 0.6644| %_mask_idx: 0.37719| ppl: 214.78159| %_neg_is_pos: 0.02333| lr: 0.0| temp: 1.98062 | loss: 1.12081| constrast_loss: 4.41636| div_loss: 0.66864| %_mask_idx: 0.42011| ppl: 212.07086| %_neg_is_pos: 0.02053| lr: 0.0| temp: 1.98061 | loss: 1.13618| constrast_loss: 4.48169| div_loss: 0.63024| %_mask_idx: 0.3916| ppl: 236.6485| %_neg_is_pos: 0.01935| lr: 0.0| temp: 1.98061 | loss: 1.10815| constrast_loss: 4.36523| div_loss: 0.67364| %_mask_idx: 0.38095| ppl: 208.86749| %_neg_is_pos: 0.04482| lr: 0.0| temp: 1.98059 | loss: 1.12662| constrast_loss: 4.44312| div_loss: 0.63357| %_mask_idx: 0.36732| ppl: 234.51385| %_neg_is_pos: 0.02271| lr: 0.0| temp: 1.98059 | loss: 1.1324| constrast_loss: 4.46515| div_loss: 0.64456| %_mask_idx: 0.38377| ppl: 227.48218| %_neg_is_pos: 0.01331| lr: 0.0| temp: 1.98058 | loss: 1.10348| constrast_loss: 4.34508| div_loss: 0.68849| %_mask_idx: 0.388| ppl: 199.36624| %_neg_is_pos: 0.02537| lr: 0.0| temp: 1.98058 | loss: 1.11009| constrast_loss: 4.37202| div_loss: 0.68335| %_mask_idx: 0.37014| ppl: 202.65582| %_neg_is_pos: 0.03685| lr: 0.0| temp: 1.98057 | loss: 1.12652| constrast_loss: 4.44034| div_loss: 0.65721| %_mask_idx: 
0.37939| ppl: 219.38815| %_neg_is_pos: 0.02602| lr: 0.0| temp: 1.98057 | loss: 1.13285| constrast_loss: 4.46675| div_loss: 0.64663| %_mask_idx: 0.38111| ppl: 226.15979| %_neg_is_pos: 0.00956| lr: 0.0| temp: 1.98056 | loss: 1.12199| constrast_loss: 4.42151| div_loss: 0.66451| %_mask_idx: 0.38534| ppl: 214.71667| %_neg_is_pos: 0.02994| lr: 0.0| temp: 1.98056 | loss: 1.11927| constrast_loss: 4.4135| div_loss: 0.63572| %_mask_idx: 0.33568| ppl: 233.14038| %_neg_is_pos: 0.04256| lr: 0.0| temp: 1.98055 | loss: 1.13255| constrast_loss: 4.46416| div_loss: 0.66026| %_mask_idx: 0.39333| ppl: 217.43109| %_neg_is_pos: 0.01505| lr: 0.0| temp: 1.98055 | loss: 1.10631| constrast_loss: 4.35839| div_loss: 0.6684| %_mask_idx: 0.33286| ppl: 212.22202| %_neg_is_pos: 0.03732| lr: 0.0| temp: 1.98054 | loss: 1.1402| constrast_loss: 4.49838| div_loss: 0.62442| %_mask_idx: 0.38737| ppl: 240.37077| %_neg_is_pos: 0.01409| lr: 0.0| temp: 1.98054 | loss: 1.12425| constrast_loss: 4.43133| div_loss: 0.65654| %_mask_idx: 0.39317| ppl: 219.81157| %_neg_is_pos: 0.01169| lr: 0.0| temp: 1.98053 | loss: 1.12164| constrast_loss: 4.42007| div_loss: 0.66473| %_mask_idx: 0.40257| ppl: 214.57452| %_neg_is_pos: 0.01512| lr: 0.0| temp: 1.98053 | loss: 1.13217| constrast_loss: 4.46254| div_loss: 0.66123| %_mask_idx: 0.38017| ppl: 216.81592| %_neg_is_pos: 0.02015| lr: 0.0| temp: 1.98052 | loss: 1.10611| constrast_loss: 4.35244| div_loss: 0.71999| %_mask_idx: 0.34085| ppl: 179.20392| %_neg_is_pos: 0.04671| lr: 0.0| temp: 1.98052 | loss: 1.12014| constrast_loss: 4.41434| div_loss: 0.66223| %_mask_idx: 0.41416| ppl: 216.1712| %_neg_is_pos: 0.02785| lr: 0.0| temp: 1.9805 | loss: 1.11528| constrast_loss: 4.39203| div_loss: 0.69101| %_mask_idx: 0.41165| ppl: 197.75076| %_neg_is_pos: 0.03312| lr: 0.0| temp: 1.9805 | loss: 1.1347| constrast_loss: 4.47532| div_loss: 0.63471| %_mask_idx: 0.38299| ppl: 233.78638| %_neg_is_pos: 0.01728| lr: 0.0| temp: 1.98049 | loss: 1.12946| constrast_loss: 4.45333| div_loss: 0.64526| 
%_mask_idx: 0.4057| ppl: 227.03073| %_neg_is_pos: 0.01668| lr: 0.0| temp: 1.98049 | loss: 1.12036| constrast_loss: 4.41596| div_loss: 0.65483| %_mask_idx: 0.37437| ppl: 220.91037| %_neg_is_pos: 0.04587| lr: 0.0| temp: 1.98047 | loss: 1.13482| constrast_loss: 4.47351| div_loss: 0.65755| %_mask_idx: 0.35229| ppl: 219.16989| %_neg_is_pos: 0.02885| lr: 0.0| temp: 1.98047 | loss: 1.10701| constrast_loss: 4.36098| div_loss: 0.67067| %_mask_idx: 0.34164| ppl: 210.77217| %_neg_is_pos: 0.06439| lr: 0.0| temp: 1.98046 | loss: 1.1221| constrast_loss: 4.42457| div_loss: 0.63817| %_mask_idx: 0.349| ppl: 231.57007| %_neg_is_pos: 0.02709| lr: 0.0| temp: 1.98046 | loss: 1.10347| constrast_loss: 4.34729| div_loss: 0.66581| %_mask_idx: 0.30435| ppl: 213.88007| %_neg_is_pos: 0.04439| lr: 0.0| temp: 1.98045 | loss: 1.12805| constrast_loss: 4.44597| div_loss: 0.66222| %_mask_idx: 0.3891| ppl: 216.17972| %_neg_is_pos: 0.02136| lr: 0.0| temp: 1.98045 | loss: 1.14086| constrast_loss: 4.49951| div_loss: 0.63947| %_mask_idx: 0.40476| ppl: 230.74088| %_neg_is_pos: 0.00987| lr: 0.0| temp: 1.98044 | loss: 1.12399| constrast_loss: 4.43005| div_loss: 0.65911| %_mask_idx: 0.40648| ppl: 218.17142| %_neg_is_pos: 0.01904| lr: 0.0| temp: 1.98044 | loss: 1.13633| constrast_loss: 4.4814| div_loss: 0.63923| %_mask_idx: 0.3833| ppl: 230.8902| %_neg_is_pos: 0.02412| lr: 0.0| temp: 1.98042 | loss: 1.13537| constrast_loss: 4.47716| div_loss: 0.64343| %_mask_idx: 0.39145| ppl: 228.20795| %_neg_is_pos: 0.011| lr: 0.0| temp: 1.98042 | loss: 1.13218| constrast_loss: 4.4637| div_loss: 0.65039| %_mask_idx: 0.41463| ppl: 223.75262| %_neg_is_pos: 0.01357| lr: 0.0| temp: 1.98041 | loss: 1.12232| constrast_loss: 4.42608| div_loss: 0.6322| %_mask_idx: 0.41181| ppl: 235.39029| %_neg_is_pos: 0.01195| lr: 0.0| temp: 1.98041 | loss: 1.11685| constrast_loss: 4.40334| div_loss: 0.64057| %_mask_idx: 0.33882| ppl: 230.03294| %_neg_is_pos: 0.02389| lr: 0.0| temp: 1.9804 | loss: 1.14124| constrast_loss: 4.5033| div_loss: 
0.61655| %_mask_idx: 0.40273| ppl: 245.40617| %_neg_is_pos: 0.0211| lr: 0.0| temp: 1.9804 | loss: 1.1271| constrast_loss: 4.44596| div_loss: 0.62436| %_mask_idx: 0.401| ppl: 240.40926| %_neg_is_pos: 0.01642| lr: 0.0| temp: 1.98039 | loss: 1.12132| constrast_loss: 4.41861| div_loss: 0.66651| %_mask_idx: 0.36795| ppl: 213.43073| %_neg_is_pos: 0.05061| lr: 0.0| temp: 1.98039 | loss: 1.11012| constrast_loss: 4.37158| div_loss: 0.68908| %_mask_idx: 0.34555| ppl: 198.98901| %_neg_is_pos: 0.02941| lr: 0.0| temp: 1.98037 | loss: 1.10549| constrast_loss: 4.35379| div_loss: 0.68173| %_mask_idx: 0.38142| ppl: 203.69495| %_neg_is_pos: 0.03212| lr: 0.0| temp: 1.98037 | loss: 1.13207| constrast_loss: 4.46338| div_loss: 0.64921| %_mask_idx: 0.41291| ppl: 224.50708| %_neg_is_pos: 0.0159| lr: 0.0| temp: 1.98036 | loss: 1.12166| constrast_loss: 4.42259| div_loss: 0.64031| %_mask_idx: 0.4032| ppl: 230.20023| %_neg_is_pos: 0.01308| lr: 0.0| temp: 1.98036 | loss: 1.12158| constrast_loss: 4.42036| div_loss: 0.65981| %_mask_idx: 0.36732| ppl: 217.72037| %_neg_is_pos: 0.02496| lr: 0.0| temp: 1.98035 | loss: 1.121| constrast_loss: 4.41633| div_loss: 0.67685| %_mask_idx: 0.36795| ppl: 206.81769| %_neg_is_pos: 0.04194| lr: 0.0| temp: 1.98035 | loss: 1.1155| constrast_loss: 4.39648| div_loss: 0.65533| %_mask_idx: 0.42497| ppl: 220.5878| %_neg_is_pos: 0.02413| lr: 0.0| temp: 1.98034 | loss: 1.12296| constrast_loss: 4.4283| div_loss: 0.6352| %_mask_idx: 0.35965| ppl: 233.47195| %_neg_is_pos: 0.03143| lr: 0.0| temp: 1.98034 | loss: 1.13422| constrast_loss: 4.47201| div_loss: 0.64857| %_mask_idx: 0.44142| ppl: 224.91277| %_neg_is_pos: 0.00932| lr: 0.0| temp: 1.98032 | loss: 1.11765| constrast_loss: 4.40218| div_loss: 0.68417| %_mask_idx: 0.33631| ppl: 202.13077| %_neg_is_pos: 0.0348| lr: 0.0| temp: 1.98032 | loss: 1.11953| constrast_loss: 4.41088| div_loss: 0.67263| %_mask_idx: 0.39489| ppl: 209.51962| %_neg_is_pos: 0.04197| lr: 0.0| temp: 1.98031 | loss: 1.10764| constrast_loss: 4.3635| 
div_loss: 0.67067| %_mask_idx: 0.37751| ppl: 210.77194| %_neg_is_pos: 0.037| lr: 0.0| temp: 1.98031 | loss: 1.11726| constrast_loss: 4.40178| div_loss: 0.67276| %_mask_idx: 0.40273| ppl: 209.43661| %_neg_is_pos: 0.02306| lr: 0.0| temp: 1.98029 | loss: 1.11879| constrast_loss: 4.41031| div_loss: 0.64842| %_mask_idx: 0.388| ppl: 225.00845| %_neg_is_pos: 0.02847| lr: 0.0| temp: 1.98029 | loss: 1.1034| constrast_loss: 4.34383| div_loss: 0.69755| %_mask_idx: 0.35135| ppl: 193.57007| %_neg_is_pos: 0.03775| lr: 0.0| temp: 1.98028 | loss: 1.10913| constrast_loss: 4.3704| div_loss: 0.66133| %_mask_idx: 0.35855| ppl: 216.75076| %_neg_is_pos: 0.02708| lr: 0.0| temp: 1.98028 | loss: 1.12256| constrast_loss: 4.42609| div_loss: 0.64131| %_mask_idx: 0.39223| ppl: 229.55855| %_neg_is_pos: 0.02071| lr: 0.0| temp: 1.98027 | loss: 1.1208| constrast_loss: 4.41925| div_loss: 0.63962| %_mask_idx: 0.39897| ppl: 230.64426| %_neg_is_pos: 0.021| lr: 0.0| temp: 1.98027 | loss: 1.10541| constrast_loss: 4.35426| div_loss: 0.67382| %_mask_idx: 0.42873| ppl: 208.75757| %_neg_is_pos: 0.02584| lr: 0.0| temp: 1.98026 | loss: 1.10329| constrast_loss: 4.34284| div_loss: 0.70312| %_mask_idx: 0.35918| ppl: 190.00449| %_neg_is_pos: 0.04641| lr: 0.0| temp: 1.98026 | loss: 1.12578| constrast_loss: 4.43899| div_loss: 0.64134| %_mask_idx: 0.39521| ppl: 229.54286| %_neg_is_pos: 0.03899| lr: 0.0| temp: 1.98024 | loss: 1.14239| constrast_loss: 4.50539| div_loss: 0.64172| %_mask_idx: 0.39474| ppl: 229.29897| %_neg_is_pos: 0.01797| lr: 0.0| temp: 1.98024 | loss: 1.12527| constrast_loss: 4.43671| div_loss: 0.64382| %_mask_idx: 0.36717| ppl: 227.95258| %_neg_is_pos: 0.0182| lr: 0.0| temp: 1.98023 | loss: 1.11749| constrast_loss: 4.40305| div_loss: 0.66919| %_mask_idx: 0.40398| ppl: 211.72034| %_neg_is_pos: 0.01513| lr: 0.0| temp: 1.98023 | loss: 1.12874| constrast_loss: 4.45041| div_loss: 0.64531| %_mask_idx: 0.34618| ppl: 227.00473| %_neg_is_pos: 0.02243| lr: 0.0| temp: 1.98022 | loss: 1.12312| constrast_loss: 
4.42508| div_loss: 0.67411| %_mask_idx: 0.39912| ppl: 208.57254| %_neg_is_pos: 0.02497| lr: 0.0| temp: 1.98022 | loss: 1.13325| constrast_loss: 4.46791| div_loss: 0.65068| %_mask_idx: 0.34853| ppl: 223.56165| %_neg_is_pos: 0.0275| lr: 0.0| temp: 1.98021 | loss: 1.11711| constrast_loss: 4.40078| div_loss: 0.67655| %_mask_idx: 0.41667| ppl: 207.00989| %_neg_is_pos: 0.01551| lr: 0.0| temp: 1.98021 | loss: 1.1068| constrast_loss: 4.36033| div_loss: 0.66865| %_mask_idx: 0.39881| ppl: 212.061| %_neg_is_pos: 0.03342| lr: 0.0| temp: 1.98019 | loss: 1.12075| constrast_loss: 4.41576| div_loss: 0.67243| %_mask_idx: 0.36967| ppl: 209.64442| %_neg_is_pos: 0.03248| lr: 0.0| temp: 1.98019 | loss: 1.10778| constrast_loss: 4.36148| div_loss: 0.69635| %_mask_idx: 0.40523| ppl: 194.33557| %_neg_is_pos: 0.01628| lr: 0.0| temp: 1.98018 | loss: 1.12126| constrast_loss: 4.4198| div_loss: 0.65235| %_mask_idx: 0.35667| ppl: 222.49396| %_neg_is_pos: 0.02587| lr: 0.0| temp: 1.98018 | loss: 1.12946| constrast_loss: 4.45295| div_loss: 0.64894| %_mask_idx: 0.42325| ppl: 224.67947| %_neg_is_pos: 0.01896| lr: 0.0| temp: 1.98017 | loss: 1.12396| constrast_loss: 4.42745| div_loss: 0.68391| %_mask_idx: 0.36435| ppl: 202.29971| %_neg_is_pos: 0.01801| lr: 0.0| temp: 1.98017 | loss: 1.13446| constrast_loss: 4.47608| div_loss: 0.61753| %_mask_idx: 0.37265| ppl: 244.77945| %_neg_is_pos: 0.01599| lr: 0.0| temp: 1.98016 | loss: 1.12395| constrast_loss: 4.43081| div_loss: 0.64999| %_mask_idx: 0.41181| ppl: 224.00461| %_neg_is_pos: 0.01645| lr: 0.0| temp: 1.98016 | loss: 1.12297| constrast_loss: 4.42644| div_loss: 0.65443| %_mask_idx: 0.41557| ppl: 221.16553| %_neg_is_pos: 0.02712| lr: 0.0| temp: 1.98014 | loss: 1.13378| constrast_loss: 4.47102| div_loss: 0.64122| %_mask_idx: 0.36372| ppl: 229.62198| %_neg_is_pos: 0.01823| lr: 0.0| temp: 1.98014 | loss: 1.11728| constrast_loss: 4.40313| div_loss: 0.66| %_mask_idx: 0.38409| ppl: 217.60013| %_neg_is_pos: 0.0349| lr: 0.0| temp: 1.98013 | loss: 1.13153| 
constrast_loss: 4.46203| div_loss: 0.64073| %_mask_idx: 0.42763| ppl: 229.93228| %_neg_is_pos: 0.01151| lr: 0.0| temp: 1.98013 [2021-09-02 01:34:20,407] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:34:20,407] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13618| constrast_loss: 4.48218| div_loss: 0.62526| %_mask_idx: 0.41009| ppl: 239.83534| %_neg_is_pos: 0.01948| lr: 0.0| temp: 1.98011 | loss: 1.10269| constrast_loss: 4.34138| div_loss: 0.69384| %_mask_idx: 0.34884| ppl: 195.94006| %_neg_is_pos: 0.03317| lr: 0.0| temp: 1.98011 | loss: 1.09735| constrast_loss: 4.31895| div_loss: 0.70432| %_mask_idx: 0.34305| ppl: 189.23325| %_neg_is_pos: 0.03478| lr: 0.0| temp: 1.9801 | loss: 1.1191| constrast_loss: 4.40976| div_loss: 0.66636| %_mask_idx: 0.36811| ppl: 213.52803| %_neg_is_pos: 0.01684| lr: 0.0| temp: 1.9801 | loss: 1.13431| constrast_loss: 4.47628| div_loss: 0.60943| %_mask_idx: 0.42152| ppl: 249.9624| %_neg_is_pos: 0.02607| lr: 0.0| temp: 1.98009 | loss: 1.11596| constrast_loss: 4.39607| div_loss: 0.67783| %_mask_idx: 0.39787| ppl: 206.18571| %_neg_is_pos: 0.02589| lr: 0.0| temp: 1.98009 | loss: 1.11438| constrast_loss: 4.39072| div_loss: 0.66782| %_mask_idx: 0.45207| ppl: 212.59264| %_neg_is_pos: 0.02492| lr: 0.0| temp: 1.98008 | loss: 1.11749| constrast_loss: 4.40486| div_loss: 0.65107| %_mask_idx: 0.38236| ppl: 223.31757| %_neg_is_pos: 0.03905| lr: 0.0| temp: 1.98008 | loss: 1.12159| constrast_loss: 4.42229| div_loss: 0.64077| %_mask_idx: 0.34477| ppl: 229.90567| %_neg_is_pos: 0.03811| lr: 0.0| temp: 1.98006| loss: 1.11657| constrast_loss: 4.40143| div_loss: 0.6486| %_mask_idx: 0.39615| ppl: 224.89307| %_neg_is_pos: 0.03123| lr: 0.0| temp: 1.98006 | loss: 1.1174| constrast_loss: 4.40452| div_loss: 0.65095| %_mask_idx: 0.4162| ppl: 223.38922| %_neg_is_pos: 
0.02931| lr: 0.0| temp: 1.98005 | loss: 1.12619| constrast_loss: 4.43855| div_loss: 0.66195| %_mask_idx: 0.38581| ppl: 216.35388| %_neg_is_pos: 0.02735| lr: 0.0| temp: 1.98005 | loss: 1.1167| constrast_loss: 4.39888| div_loss: 0.67916| %_mask_idx: 0.35761| ppl: 205.34019| %_neg_is_pos: 0.04173| lr: 0.0| temp: 1.98004 | loss: 1.1393| constrast_loss: 4.49532| div_loss: 0.61873| %_mask_idx: 0.42434| ppl: 244.01501| %_neg_is_pos: 0.0117| lr: 0.0| temp: 1.98004 | loss: 1.123| constrast_loss: 4.42557| div_loss: 0.6642| %_mask_idx: 0.40273| ppl: 214.91324| %_neg_is_pos: 0.03792| lr: 0.0| temp: 1.98003 | loss: 1.14423| constrast_loss: 4.51297| div_loss: 0.63961| %_mask_idx: 0.36497| ppl: 230.64746| %_neg_is_pos: 0.02527| lr: 0.0| temp: 1.98003 | loss: 1.11759| constrast_loss: 4.40369| div_loss: 0.66682| %_mask_idx: 0.37296| ppl: 213.23502| %_neg_is_pos: 0.04105| lr: 0.0| temp: 1.98001| loss: 1.13882| constrast_loss: 4.49219| div_loss: 0.631| %_mask_idx: 0.3609| ppl: 236.16074| %_neg_is_pos: 0.02801| lr: 0.0| temp: 1.98001 | loss: 1.11521| constrast_loss: 4.39492| div_loss: 0.65918| %_mask_idx: 0.36638| ppl: 218.12265| %_neg_is_pos: 0.03486| lr: 0.0| temp: 1.98 | loss: 1.1225| constrast_loss: 4.42352| div_loss: 0.66487| %_mask_idx: 0.39082| ppl: 214.4848| %_neg_is_pos: 0.03625| lr: 0.0| temp: 1.98 | loss: 1.08789| constrast_loss: 4.28453| div_loss: 0.67037| %_mask_idx: 0.35558| ppl: 210.96286| %_neg_is_pos: 0.04035| lr: 0.0| temp: 1.97999 | loss: 1.1076| constrast_loss: 4.3627| div_loss: 0.67685| %_mask_idx: 0.35307| ppl: 206.81581| %_neg_is_pos: 0.0401| lr: 0.0| temp: 1.97999 | loss: 1.1213| constrast_loss: 4.41945| div_loss: 0.6577| %_mask_idx: 0.37719| ppl: 219.07404| %_neg_is_pos: 0.03242| lr: 0.0| temp: 1.97998 | loss: 1.13451| constrast_loss: 4.47751| div_loss: 0.60522| %_mask_idx: 0.41494| ppl: 252.66232| %_neg_is_pos: 0.01853| lr: 0.0| temp: 1.97998 | loss: 1.12628| constrast_loss: 4.44094| div_loss: 0.64199| %_mask_idx: 0.36623| ppl: 229.12717| %_neg_is_pos: 
0.03035| lr: 0.0| temp: 1.97996 | loss: 1.1259| constrast_loss: 4.44065| div_loss: 0.62928| %_mask_idx: 0.44095| ppl: 237.25877| %_neg_is_pos: 0.00863| lr: 0.0| temp: 1.97996 | loss: 1.11928| constrast_loss: 4.41178| div_loss: 0.65355| %_mask_idx: 0.37484| ppl: 221.73117| %_neg_is_pos: 0.02867| lr: 0.0| temp: 1.97995 | loss: 1.11218| constrast_loss: 4.38101| div_loss: 0.67706| %_mask_idx: 0.3479| ppl: 206.68463| %_neg_is_pos: 0.02978| lr: 0.0| temp: 1.97995 | loss: 1.10668| constrast_loss: 4.35929| div_loss: 0.67414| %_mask_idx: 0.44126| ppl: 208.54782| %_neg_is_pos: 0.03786| lr: 0.0| temp: 1.97993 | loss: 1.13165| constrast_loss: 4.46341| div_loss: 0.63187| %_mask_idx: 0.45144| ppl: 235.60414| %_neg_is_pos: 0.00563| lr: 0.0| temp: 1.97993 | loss: 1.12786| constrast_loss: 4.44691| div_loss: 0.6453| %_mask_idx: 0.38737| ppl: 227.00822| %_neg_is_pos: 0.03146| lr: 0.0| temp: 1.97992 | loss: 1.13306| constrast_loss: 4.46831| div_loss: 0.63941| %_mask_idx: 0.34445| ppl: 230.77612| %_neg_is_pos: 0.02372| lr: 0.0| temp: 1.97992 | loss: 1.13776| constrast_loss: 4.48766| div_loss: 0.63364| %_mask_idx: 0.42137| ppl: 234.46915| %_neg_is_pos: 0.00857| lr: 0.0| temp: 1.97991 | loss: 1.12411| constrast_loss: 4.42982| div_loss: 0.6662| %_mask_idx: 0.43813| ppl: 213.63112| %_neg_is_pos: 0.01584| lr: 0.0| temp: 1.97991 | loss: 1.12426| constrast_loss: 4.42931| div_loss: 0.67718| %_mask_idx: 0.38784| ppl: 206.60493| %_neg_is_pos: 0.0327| lr: 0.0| temp: 1.9799 | loss: 1.10467| constrast_loss: 4.35218| div_loss: 0.66503| %_mask_idx: 0.32832| ppl: 214.38011| %_neg_is_pos: 0.03544| lr: 0.0| temp: 1.9799 | loss: 1.13511| constrast_loss: 4.47705| div_loss: 0.63377| %_mask_idx: 0.41823| ppl: 234.39006| %_neg_is_pos: 0.0336| lr: 0.0| temp: 1.97988 | loss: 1.10907| constrast_loss: 4.36814| div_loss: 0.68141| %_mask_idx: 0.38612| ppl: 203.89522| %_neg_is_pos: 0.03361| lr: 0.0| temp: 1.97988 | loss: 1.13983| constrast_loss: 4.4969| div_loss: 0.62407| %_mask_idx: 0.40445| ppl: 240.59636| 
%_neg_is_pos: 0.01697| lr: 0.0| temp: 1.97987 | loss: 1.13395| constrast_loss: 4.47214| div_loss: 0.63676| %_mask_idx: 0.3938| ppl: 232.47134| %_neg_is_pos: 0.02846| lr: 0.0| temp: 1.97987 | loss: 1.11256| constrast_loss: 4.38379| div_loss: 0.66431| %_mask_idx: 0.42246| ppl: 214.8392| %_neg_is_pos: 0.02483| lr: 0.0| temp: 1.97986 | loss: 1.10835| constrast_loss: 4.36713| div_loss: 0.66263| %_mask_idx: 0.40868| ppl: 215.91896| %_neg_is_pos: 0.03388| lr: 0.0| temp: 1.97986 | loss: 1.13159| constrast_loss: 4.46486| div_loss: 0.61479| %_mask_idx: 0.38048| ppl: 246.53387| %_neg_is_pos: 0.03458| lr: 0.0| temp: 1.97985 | loss: 1.11108| constrast_loss: 4.37688| div_loss: 0.67435| %_mask_idx: 0.3266| ppl: 208.41615| %_neg_is_pos: 0.04027| lr: 0.0| temp: 1.97985 | loss: 1.12622| constrast_loss: 4.44175| div_loss: 0.63123| %_mask_idx: 0.40163| ppl: 236.01535| %_neg_is_pos: 0.01826| lr: 0.0| temp: 1.97983 | loss: 1.12601| constrast_loss: 4.43901| div_loss: 0.65045| %_mask_idx: 0.36685| ppl: 223.71126| %_neg_is_pos: 0.01722| lr: 0.0| temp: 1.97983 | loss: 1.12761| constrast_loss: 4.44739| div_loss: 0.63051| %_mask_idx: 0.38471| ppl: 236.47546| %_neg_is_pos: 0.02413| lr: 0.0| temp: 1.97982 | loss: 1.12675| constrast_loss: 4.44256| div_loss: 0.64457| %_mask_idx: 0.42419| ppl: 227.47578| %_neg_is_pos: 0.01122| lr: 0.0| temp: 1.97982 | loss: 1.12432| constrast_loss: 4.43048| div_loss: 0.6678| %_mask_idx: 0.36216| ppl: 212.61067| %_neg_is_pos: 0.04531| lr: 0.0| temp: 1.97981 | loss: 1.13886| constrast_loss: 4.49615| div_loss: 0.59284| %_mask_idx: 0.40789| ppl: 260.58337| %_neg_is_pos: 0.01272| lr: 0.0| temp: 1.97981 | loss: 1.13119| constrast_loss: 4.4619| div_loss: 0.62849| %_mask_idx: 0.41933| ppl: 237.76523| %_neg_is_pos: 0.00753| lr: 0.0| temp: 1.9798 | loss: 1.14339| constrast_loss: 4.51156| div_loss: 0.62003| %_mask_idx: 0.37782| ppl: 243.18362| %_neg_is_pos: 0.01006| lr: 0.0| temp: 1.9798 | loss: 1.12644| constrast_loss: 4.44049| div_loss: 0.65272| %_mask_idx: 0.37751| ppl: 
222.25897| %_neg_is_pos: 0.01829| lr: 0.0| temp: 1.97978 | loss: 1.11913| constrast_loss: 4.41289| div_loss: 0.63625| %_mask_idx: 0.42184| ppl: 232.80147| %_neg_is_pos: 0.02141| lr: 0.0| temp: 1.97978 | loss: 1.12565| constrast_loss: 4.43795| div_loss: 0.6465| %_mask_idx: 0.40962| ppl: 226.23956| %_neg_is_pos: 0.02809| lr: 0.0| temp: 1.97977 | loss: 1.10257| constrast_loss: 4.34405| div_loss: 0.66214| %_mask_idx: 0.36388| ppl: 216.22754| %_neg_is_pos: 0.04229| lr: 0.0| temp: 1.97977 | loss: 1.11741| constrast_loss: 4.40469| div_loss: 0.64941| %_mask_idx: 0.3609| ppl: 224.37502| %_neg_is_pos: 0.03208| lr: 0.0| temp: 1.97975 | loss: 1.12367| constrast_loss: 4.43047| div_loss: 0.64225| %_mask_idx: 0.35166| ppl: 228.95854| %_neg_is_pos: 0.01829| lr: 0.0| temp: 1.97975 | loss: 1.13048| constrast_loss: 4.45849| div_loss: 0.63444| %_mask_idx: 0.40523| ppl: 233.95779| %_neg_is_pos: 0.01482| lr: 0.0| temp: 1.97974 | loss: 1.12579| constrast_loss: 4.44072| div_loss: 0.62442| %_mask_idx: 0.36043| ppl: 240.37146| %_neg_is_pos: 0.02175| lr: 0.0| temp: 1.97974 | loss: 1.13581| constrast_loss: 4.48174| div_loss: 0.61483| %_mask_idx: 0.34445| ppl: 246.50845| %_neg_is_pos: 0.03495| lr: 0.0| temp: 1.97973 | loss: 1.12003| constrast_loss: 4.41201| div_loss: 0.68107| %_mask_idx: 0.37719| ppl: 204.11636| %_neg_is_pos: 0.02716| lr: 0.0| temp: 1.97973 | loss: 1.13903| constrast_loss: 4.48911| div_loss: 0.66996| %_mask_idx: 0.43593| ppl: 211.22314| %_neg_is_pos: 0.00904| lr: 0.0| temp: 1.97972 | loss: 1.13053| constrast_loss: 4.45764| div_loss: 0.645| %_mask_idx: 0.414| ppl: 227.20047| %_neg_is_pos: 0.01996| lr: 0.0| temp: 1.97972 | loss: 1.13758| constrast_loss: 4.48862| div_loss: 0.61702| %_mask_idx: 0.40617| ppl: 245.10446| %_neg_is_pos: 0.01294| lr: 0.0| temp: 1.9797 | loss: 1.12952| constrast_loss: 4.45464| div_loss: 0.63438| %_mask_idx: 0.36513| ppl: 234.0| %_neg_is_pos: 0.01788| lr: 0.0| temp: 1.9797 | loss: 1.1236| constrast_loss: 4.42799| div_loss: 0.66392| %_mask_idx: 0.39489| 
ppl: 215.09013| %_neg_is_pos: 0.0223| lr: 0.0| temp: 1.97969 | loss: 1.13544| constrast_loss: 4.48169| div_loss: 0.60085| %_mask_idx: 0.42434| ppl: 255.4545| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97969 | loss: 1.12194| constrast_loss: 4.42177| div_loss: 0.65986| %_mask_idx: 0.36325| ppl: 217.69186| %_neg_is_pos: 0.06326| lr: 0.0| temp: 1.97968 | loss: 1.1373| constrast_loss: 4.48703| div_loss: 0.62182| %_mask_idx: 0.42105| ppl: 242.03496| %_neg_is_pos: 0.01532| lr: 0.0| temp: 1.97968 | loss: 1.13397| constrast_loss: 4.473| div_loss: 0.62894| %_mask_idx: 0.3963| ppl: 237.47849| %_neg_is_pos: 0.00962| lr: 0.0| temp: 1.97967 | loss: 1.12583| constrast_loss: 4.43668| div_loss: 0.66645| %_mask_idx: 0.40132| ppl: 213.47153| %_neg_is_pos: 0.02614| lr: 0.0| temp: 1.97967 | loss: 1.13398| constrast_loss: 4.4731| div_loss: 0.62837| %_mask_idx: 0.38033| ppl: 237.84146| %_neg_is_pos: 0.02272| lr: 0.0| temp: 1.97965 | loss: 1.1059| constrast_loss: 4.35848| div_loss: 0.65098| %_mask_idx: 0.38221| ppl: 223.37405| %_neg_is_pos: 0.01686| lr: 0.0| temp: 1.97965 | loss: 1.12907| constrast_loss: 4.45298| div_loss: 0.63289| %_mask_idx: 0.4588| ppl: 234.95097| %_neg_is_pos: 0.01085| lr: 0.0| temp: 1.97964 | loss: 1.11381| constrast_loss: 4.39206| div_loss: 0.63175| %_mask_idx: 0.35385| ppl: 235.67789| %_neg_is_pos: 0.02523| lr: 0.0| temp: 1.97964 | loss: 1.12582| constrast_loss: 4.43781| div_loss: 0.65477| %_mask_idx: 0.42074| ppl: 220.94984| %_neg_is_pos: 0.01321| lr: 0.0| temp: 1.97963 | loss: 1.13279| constrast_loss: 4.4646| div_loss: 0.66575| %_mask_idx: 0.41134| ppl: 213.91763| %_neg_is_pos: 0.02064| lr: 0.0| temp: 1.97963 | loss: 1.13924| constrast_loss: 4.49692| div_loss: 0.60053| %_mask_idx: 0.39286| ppl: 255.6618| %_neg_is_pos: 0.015| lr: 0.0| temp: 1.97962 | loss: 1.13713| constrast_loss: 4.48824| div_loss: 0.60298| %_mask_idx: 0.4115| ppl: 254.09163| %_neg_is_pos: 0.0113| lr: 0.0| temp: 1.97962 | loss: 1.11911| constrast_loss: 4.41103| div_loss: 0.65399| %_mask_idx: 
0.39677| ppl: 221.44534| %_neg_is_pos: 0.03259| lr: 0.0| temp: 1.9796 | loss: 1.1317| constrast_loss: 4.46393| div_loss: 0.6288| %_mask_idx: 0.41776| ppl: 237.57111| %_neg_is_pos: 0.01098| lr: 0.0| temp: 1.9796 | loss: 1.11997| constrast_loss: 4.41355| div_loss: 0.66316| %_mask_idx: 0.42622| ppl: 215.57529| %_neg_is_pos: 0.01221| lr: 0.0| temp: 1.97959 | loss: 1.12626| constrast_loss: 4.44189| div_loss: 0.6313| %_mask_idx: 0.36779| ppl: 235.96678| %_neg_is_pos: 0.01348| lr: 0.0| temp: 1.97959 | loss: 1.14086| constrast_loss: 4.50245| div_loss: 0.60995| %_mask_idx: 0.39787| ppl: 249.63083| %_neg_is_pos: 0.01208| lr: 0.0| temp: 1.97957 | loss: 1.11439| constrast_loss: 4.3917| div_loss: 0.65848| %_mask_idx: 0.45567| ppl: 218.57092| %_neg_is_pos: 0.0199| lr: 0.0| temp: 1.97957 | loss: 1.13437| constrast_loss: 4.4753| div_loss: 0.62181| %_mask_idx: 0.3869| ppl: 242.04063| %_neg_is_pos: 0.01846| lr: 0.0| temp: 1.97957 | loss: 1.12552| constrast_loss: 4.43657| div_loss: 0.65499| %_mask_idx: 0.38127| ppl: 220.80539| %_neg_is_pos: 0.01226| lr: 0.0| temp: 1.97957 | loss: 1.12999| constrast_loss: 4.45622| div_loss: 0.63744| %_mask_idx: 0.40821| ppl: 232.03664| %_neg_is_pos: 0.01658| lr: 0.0| temp: 1.97956 | loss: 1.14293| constrast_loss: 4.50843| div_loss: 0.63307| %_mask_idx: 0.38487| ppl: 234.83603| %_neg_is_pos: 0.00707| lr: 0.0| temp: 1.97956 | loss: 1.10394| constrast_loss: 4.34585| div_loss: 0.69908| %_mask_idx: 0.3313| ppl: 192.58679| %_neg_is_pos: 0.05213| lr: 0.0| temp: 1.97955 | loss: 1.11879| constrast_loss: 4.4088| div_loss: 0.66374| %_mask_idx: 0.37563| ppl: 215.20789| %_neg_is_pos: 0.03003| lr: 0.0| temp: 1.97955 | loss: 1.13734| constrast_loss: 4.48805| div_loss: 0.6132| %_mask_idx: 0.40695| ppl: 247.55145| %_neg_is_pos: 0.01795| lr: 0.0| temp: 1.97953 | loss: 1.12609| constrast_loss: 4.44008| div_loss: 0.64301| %_mask_idx: 0.41056| ppl: 228.47615| %_neg_is_pos: 0.01344| lr: 0.0| temp: 1.97953 | loss: 1.12803| constrast_loss: 4.4492| div_loss: 0.62911| 
%_mask_idx: 0.38628| ppl: 237.36647| %_neg_is_pos: 0.02145| lr: 0.0| temp: 1.97952 | loss: 1.11874| constrast_loss: 4.41046| div_loss: 0.64518| %_mask_idx: 0.41635| ppl: 227.08742| %_neg_is_pos: 0.02567| lr: 0.0| temp: 1.97952 | loss: 1.10869| constrast_loss: 4.3683| div_loss: 0.66449| %_mask_idx: 0.40523| ppl: 214.72534| %_neg_is_pos: 0.02162| lr: 0.0| temp: 1.97951 | loss: 1.12937| constrast_loss: 4.45297| div_loss: 0.645| %_mask_idx: 0.4104| ppl: 227.1974| %_neg_is_pos: 0.01917| lr: 0.0| temp: 1.97951 | loss: 1.11373| constrast_loss: 4.38788| div_loss: 0.67055| %_mask_idx: 0.35135| ppl: 210.84494| %_neg_is_pos: 0.0313| lr: 0.0| temp: 1.9795 | loss: 1.13286| constrast_loss: 4.46624| div_loss: 0.65191| %_mask_idx: 0.42058| ppl: 222.77457| %_neg_is_pos: 0.01962| lr: 0.0| temp: 1.9795 | loss: 1.11527| constrast_loss: 4.39335| div_loss: 0.67733| %_mask_idx: 0.39834| ppl: 206.50735| %_neg_is_pos: 0.02209| lr: 0.0| temp: 1.97948 | loss: 1.1428| constrast_loss: 4.50879| div_loss: 0.62394| %_mask_idx: 0.36842| ppl: 240.68134| %_neg_is_pos: 0.01134| lr: 0.0| temp: 1.97948 | loss: 1.12553| constrast_loss: 4.43964| div_loss: 0.62486| %_mask_idx: 0.40053| ppl: 240.08929| %_neg_is_pos: 0.02583| lr: 0.0| temp: 1.97947 | loss: 1.11865| constrast_loss: 4.40899| div_loss: 0.65625| %_mask_idx: 0.40429| ppl: 220.0031| %_neg_is_pos: 0.02203| lr: 0.0| temp: 1.97947 | loss: 1.11841| constrast_loss: 4.40656| div_loss: 0.67089| %_mask_idx: 0.34837| ppl: 210.62823| %_neg_is_pos: 0.02731| lr: 0.0| temp: 1.97946 | loss: 1.13844| constrast_loss: 4.49379| div_loss: 0.59958| %_mask_idx: 0.43327| ppl: 256.26746| %_neg_is_pos: 0.02115| lr: 0.0| temp: 1.97946 | loss: 1.1254| constrast_loss: 4.43633| div_loss: 0.65291| %_mask_idx: 0.38628| ppl: 222.1377| %_neg_is_pos: 0.02535| lr: 0.0| temp: 1.97945 | loss: 1.13006| constrast_loss: 4.45612| div_loss: 0.64115| %_mask_idx: 0.36263| ppl: 229.66296| %_neg_is_pos: 0.00957| lr: 0.0| temp: 1.97945 | loss: 1.12299| constrast_loss: 4.42818| div_loss: 
0.63765| %_mask_idx: 0.37343| ppl: 231.90187| %_neg_is_pos: 0.03019| lr: 0.0| temp: 1.97943 | loss: 1.12103| constrast_loss: 4.42101| div_loss: 0.63116| %_mask_idx: 0.38784| ppl: 236.05838| %_neg_is_pos: 0.02977| lr: 0.0| temp: 1.97943 | loss: 1.12403| constrast_loss: 4.4318| div_loss: 0.64307| %_mask_idx: 0.35323| ppl: 228.43233| %_neg_is_pos: 0.0312| lr: 0.0| temp: 1.97942 | loss: 1.13541| constrast_loss: 4.4801| div_loss: 0.61558| %_mask_idx: 0.35667| ppl: 246.02731| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.97942 [2021-09-02 01:43:34,383] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:43:34,383] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14111| constrast_loss: 4.50272| div_loss: 0.61713| %_mask_idx: 0.38487| ppl: 245.03612| %_neg_is_pos: 0.01034| lr: 0.0| temp: 1.9794| loss: 1.14042| constrast_loss: 4.5014| div_loss: 0.60285| %_mask_idx: 0.41714| ppl: 254.17532| %_neg_is_pos: 0.0208| lr: 0.0| temp: 1.9794 | loss: 1.12677| constrast_loss: 4.44292| div_loss: 0.64165| %_mask_idx: 0.38001| ppl: 229.34467| %_neg_is_pos: 0.02583| lr: 0.0| temp: 1.97939 | loss: 1.12415| constrast_loss: 4.43293| div_loss: 0.63667| %_mask_idx: 0.44737| ppl: 232.52872| %_neg_is_pos: 0.0089| lr: 0.0| temp: 1.97939 | loss: 1.11969| constrast_loss: 4.4142| div_loss: 0.64565| %_mask_idx: 0.36999| ppl: 226.78275| %_neg_is_pos: 0.03405| lr: 0.0| temp: 1.97938 | loss: 1.10982| constrast_loss: 4.37189| div_loss: 0.67412| %_mask_idx: 0.37249| ppl: 208.5654| %_neg_is_pos: 0.02976| lr: 0.0| temp: 1.97938 | loss: 1.11029| constrast_loss: 4.37656| div_loss: 0.64602| %_mask_idx: 0.29887| ppl: 226.54549| %_neg_is_pos: 0.02734| lr: 0.0| temp: 1.97937 | loss: 1.12388| constrast_loss: 4.43037| div_loss: 0.65143| %_mask_idx: 0.39066| ppl: 223.08414| %_neg_is_pos: 0.02131| lr: 0.0| temp: 1.97937 | loss: 
1.12749| constrast_loss: 4.4468| div_loss: 0.63161| %_mask_idx: 0.36576| ppl: 235.7706| %_neg_is_pos: 0.02258| lr: 0.0| temp: 1.97935 | loss: 1.11136| constrast_loss: 4.37782| div_loss: 0.67602| %_mask_idx: 0.4162| ppl: 207.34851| %_neg_is_pos: 0.02876| lr: 0.0| temp: 1.97935 | loss: 1.14251| constrast_loss: 4.50659| div_loss: 0.63461| %_mask_idx: 0.41604| ppl: 233.84689| %_neg_is_pos: 0.00576| lr: 0.0| temp: 1.97934 | loss: 1.14002| constrast_loss: 4.49938| div_loss: 0.607| %_mask_idx: 0.34258| ppl: 251.51894| %_neg_is_pos: 0.01395| lr: 0.0| temp: 1.97934 | loss: 1.12712| constrast_loss: 4.44413| div_loss: 0.64362| %_mask_idx: 0.30404| ppl: 228.08261| %_neg_is_pos: 0.01061| lr: 0.0| temp: 1.97933 | loss: 1.14169| constrast_loss: 4.5032| div_loss: 0.63575| %_mask_idx: 0.45583| ppl: 233.11823| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.97933 | loss: 1.14361| constrast_loss: 4.51402| div_loss: 0.60424| %_mask_idx: 0.38925| ppl: 253.28787| %_neg_is_pos: 0.00647| lr: 0.0| temp: 1.97932 | loss: 1.1375| constrast_loss: 4.48523| div_loss: 0.64786| %_mask_idx: 0.40821| ppl: 225.37103| %_neg_is_pos: 0.00758| lr: 0.0| temp: 1.97932 | loss: 1.13184| constrast_loss: 4.46387| div_loss: 0.63473| %_mask_idx: 0.41541| ppl: 233.77429| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.9793 | loss: 1.11929| constrast_loss: 4.41267| div_loss: 0.64469| %_mask_idx: 0.36607| ppl: 227.39954| %_neg_is_pos: 0.00778| lr: 0.0| temp: 1.9793 | loss: 1.12967| constrast_loss: 4.45525| div_loss: 0.63431| %_mask_idx: 0.39693| ppl: 234.04443| %_neg_is_pos: 0.00995| lr: 0.0| temp: 1.97929 | loss: 1.13806| constrast_loss: 4.48876| div_loss: 0.63487| %_mask_idx: 0.41949| ppl: 233.68263| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.97929 | loss: 1.14372| constrast_loss: 4.51427| div_loss: 0.60626| %_mask_idx: 0.40304| ppl: 251.99442| %_neg_is_pos: 0.00544| lr: 0.0| temp: 1.97928 | loss: 1.13242| constrast_loss: 4.46397| div_loss: 0.65724| %_mask_idx: 0.36826| ppl: 219.36334| %_neg_is_pos: 0.00781| lr: 0.0| temp: 1.97928 
| loss: 1.13041| constrast_loss: 4.45871| div_loss: 0.62939| %_mask_idx: 0.36325| ppl: 237.1875| %_neg_is_pos: 0.00635| lr: 0.0| temp: 1.97927 | loss: 1.13255| constrast_loss: 4.46283| div_loss: 0.67364| %_mask_idx: 0.41322| ppl: 208.87192| %_neg_is_pos: 0.00971| lr: 0.0| temp: 1.97927 | loss: 1.13608| constrast_loss: 4.48018| div_loss: 0.64146| %_mask_idx: 0.39771| ppl: 229.46667| %_neg_is_pos: 0.00807| lr: 0.0| temp: 1.97925 | loss: 1.12605| constrast_loss: 4.43712| div_loss: 0.67094| %_mask_idx: 0.36638| ppl: 210.59991| %_neg_is_pos: 0.01121| lr: 0.0| temp: 1.97925 | loss: 1.13215| constrast_loss: 4.46276| div_loss: 0.65825| %_mask_idx: 0.38362| ppl: 218.71727| %_neg_is_pos: 0.00654| lr: 0.0| temp: 1.97924 | loss: 1.13136| constrast_loss: 4.4613| div_loss: 0.64139| %_mask_idx: 0.41933| ppl: 229.51176| %_neg_is_pos: 0.00816| lr: 0.0| temp: 1.97924 | loss: 1.12649| constrast_loss: 4.4406| div_loss: 0.65355| %_mask_idx: 0.37939| ppl: 221.72604| %_neg_is_pos: 0.00631| lr: 0.0| temp: 1.97922 | loss: 1.13233| constrast_loss: 4.46579| div_loss: 0.6352| %_mask_idx: 0.3797| ppl: 233.47156| %_neg_is_pos: 0.01065| lr: 0.0| temp: 1.97922 | loss: 1.14541| constrast_loss: 4.51883| div_loss: 0.62806| %_mask_idx: 0.4552| ppl: 238.0433| %_neg_is_pos: 0.00771| lr: 0.0| temp: 1.97921 | loss: 1.12502| constrast_loss: 4.43304| div_loss: 0.67043| %_mask_idx: 0.34978| ppl: 210.92662| %_neg_is_pos: 0.01021| lr: 0.0| temp: 1.97921 | loss: 1.11688| constrast_loss: 4.40129| div_loss: 0.66215| %_mask_idx: 0.36936| ppl: 216.22694| %_neg_is_pos: 0.01378| lr: 0.0| temp: 1.9792 | loss: 1.11134| constrast_loss: 4.37805| div_loss: 0.67288| %_mask_idx: 0.35417| ppl: 209.3588| %_neg_is_pos: 0.02173| lr: 0.0| temp: 1.9792 | loss: 1.13487| constrast_loss: 4.4749| div_loss: 0.64582| %_mask_idx: 0.42215| ppl: 226.67242| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.97919 | loss: 1.12895| constrast_loss: 4.45199| div_loss: 0.63822| %_mask_idx: 0.34038| ppl: 231.53809| %_neg_is_pos: 0.00882| lr: 0.0| temp: 
1.97919 | loss: 1.12535| constrast_loss: 4.43773| div_loss: 0.63666| %_mask_idx: 0.38831| ppl: 232.5365| %_neg_is_pos: 0.00757| lr: 0.0| temp: 1.97917 | loss: 1.13716| constrast_loss: 4.48719| div_loss: 0.61452| %_mask_idx: 0.41244| ppl: 246.70966| %_neg_is_pos: 0.00701| lr: 0.0| temp: 1.97917 | loss: 1.1485| constrast_loss: 4.53383| div_loss: 0.60161| %_mask_idx: 0.38534| ppl: 254.97116| %_neg_is_pos: 0.0064| lr: 0.0| temp: 1.97916 | loss: 1.14406| constrast_loss: 4.51527| div_loss: 0.6099| %_mask_idx: 0.3584| ppl: 249.66266| %_neg_is_pos: 0.01093| lr: 0.0| temp: 1.97916 | loss: 1.13691| constrast_loss: 4.48491| div_loss: 0.62736| %_mask_idx: 0.37124| ppl: 238.48795| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.97915 | loss: 1.13256| constrast_loss: 4.4665| div_loss: 0.63747| %_mask_idx: 0.40539| ppl: 232.01996| %_neg_is_pos: 0.01521| lr: 0.0| temp: 1.97915 | loss: 1.13219| constrast_loss: 4.46405| div_loss: 0.64721| %_mask_idx: 0.38299| ppl: 225.78574| %_neg_is_pos: 0.00929| lr: 0.0| temp: 1.97914 | loss: 1.13375| constrast_loss: 4.47357| div_loss: 0.61433| %_mask_idx: 0.45614| ppl: 246.82935| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.97914 | loss: 1.1164| constrast_loss: 4.40187| div_loss: 0.63713| %_mask_idx: 0.35432| ppl: 232.23785| %_neg_is_pos: 0.01252| lr: 0.0| temp: 1.97912 | loss: 1.13806| constrast_loss: 4.48761| div_loss: 0.64648| %_mask_idx: 0.37531| ppl: 226.2516| %_neg_is_pos: 0.01193| lr: 0.0| temp: 1.97912 | loss: 1.11955| constrast_loss: 4.41062| div_loss: 0.67598| %_mask_idx: 0.40288| ppl: 207.36996| %_neg_is_pos: 0.01827| lr: 0.0| temp: 1.97911 | loss: 1.11197| constrast_loss: 4.38107| div_loss: 0.66821| %_mask_idx: 0.37218| ppl: 212.34297| %_neg_is_pos: 0.01687| lr: 0.0| temp: 1.97911 | loss: 1.12948| constrast_loss: 4.4557| div_loss: 0.62234| %_mask_idx: 0.39552| ppl: 241.69968| %_neg_is_pos: 0.01044| lr: 0.0| temp: 1.9791 | loss: 1.13838| constrast_loss: 4.49055| div_loss: 0.62982| %_mask_idx: 0.37422| ppl: 236.91296| %_neg_is_pos: 0.00437| lr: 0.0| 
temp: 1.9791 | loss: 1.14339| constrast_loss: 4.50905| div_loss: 0.64495| %_mask_idx: 0.39395| ppl: 227.23438| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97909 | loss: 1.13202| constrast_loss: 4.46511| div_loss: 0.62978| %_mask_idx: 0.39207| ppl: 236.9386| %_neg_is_pos: 0.02183| lr: 0.0| temp: 1.97909 | loss: 1.13016| constrast_loss: 4.45665| div_loss: 0.63978| %_mask_idx: 0.37876| ppl: 230.54181| %_neg_is_pos: 0.00565| lr: 0.0| temp: 1.97907 | loss: 1.13318| constrast_loss: 4.46909| div_loss: 0.63644| %_mask_idx: 0.3869| ppl: 232.6796| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.97907 | loss: 1.13063| constrast_loss: 4.45894| div_loss: 0.63594| %_mask_idx: 0.43687| ppl: 233.00136| %_neg_is_pos: 0.00462| lr: 0.0| temp: 1.97906 | loss: 1.13316| constrast_loss: 4.4671| div_loss: 0.65531| %_mask_idx: 0.3656| ppl: 220.60101| %_neg_is_pos: 0.00834| lr: 0.0| temp: 1.97906 | loss: 1.12397| constrast_loss: 4.43235| div_loss: 0.63539| %_mask_idx: 0.38503| ppl: 233.34871| %_neg_is_pos: 0.01328| lr: 0.0| temp: 1.97904 | loss: 1.14533| constrast_loss: 4.52061| div_loss: 0.60712| %_mask_idx: 0.40351| ppl: 251.44231| %_neg_is_pos: 0.00442| lr: 0.0| temp: 1.97904 | loss: 1.13563| constrast_loss: 4.48009| div_loss: 0.62427| %_mask_idx: 0.43484| ppl: 240.46936| %_neg_is_pos: 0.00455| lr: 0.0| temp: 1.97903 | loss: 1.13305| constrast_loss: 4.46883| div_loss: 0.63377| %_mask_idx: 0.36952| ppl: 234.38443| %_neg_is_pos: 0.00839| lr: 0.0| temp: 1.97903 | loss: 1.13356| constrast_loss: 4.47263| div_loss: 0.61603| %_mask_idx: 0.38847| ppl: 245.74138| %_neg_is_pos: 0.0071| lr: 0.0| temp: 1.97902 | loss: 1.14492| constrast_loss: 4.51777| div_loss: 0.61892| %_mask_idx: 0.41541| ppl: 243.89417| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.97902 | loss: 1.12954| constrast_loss: 4.45507| div_loss: 0.63103| %_mask_idx: 0.40977| ppl: 236.14157| %_neg_is_pos: 0.00609| lr: 0.0| temp: 1.97901 | loss: 1.13262| constrast_loss: 4.46853| div_loss: 0.61963| %_mask_idx: 0.45818| ppl: 243.4339| %_neg_is_pos: 0.00609| 
lr: 0.0| temp: 1.97901 | loss: 1.12889| constrast_loss: 4.45213| div_loss: 0.63447| %_mask_idx: 0.38503| ppl: 233.93896| %_neg_is_pos: 0.01491| lr: 0.0| temp: 1.97899 | loss: 1.12833| constrast_loss: 4.44798| div_loss: 0.65324| %_mask_idx: 0.39521| ppl: 221.92467| %_neg_is_pos: 0.0095| lr: 0.0| temp: 1.97899 | loss: 1.1371| constrast_loss: 4.48475| div_loss: 0.63639| %_mask_idx: 0.33333| ppl: 232.70801| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.97898 | loss: 1.13863| constrast_loss: 4.49005| div_loss: 0.64492| %_mask_idx: 0.41729| ppl: 227.24953| %_neg_is_pos: 0.01078| lr: 0.0| temp: 1.97898 | loss: 1.1397| constrast_loss: 4.49635| div_loss: 0.62472| %_mask_idx: 0.36372| ppl: 240.17715| %_neg_is_pos: 0.01114| lr: 0.0| temp: 1.97897 | loss: 1.12866| constrast_loss: 4.44936| div_loss: 0.6528| %_mask_idx: 0.34555| ppl: 222.21011| %_neg_is_pos: 0.00698| lr: 0.0| temp: 1.97897 | loss: 1.13891| constrast_loss: 4.49201| div_loss: 0.63637| %_mask_idx: 0.39897| ppl: 232.72351| %_neg_is_pos: 0.01227| lr: 0.0| temp: 1.97896 | loss: 1.13668| constrast_loss: 4.48452| div_loss: 0.62187| %_mask_idx: 0.36122| ppl: 242.00369| %_neg_is_pos: 0.00862| lr: 0.0| temp: 1.97896 | loss: 1.13342| constrast_loss: 4.47047| div_loss: 0.63224| %_mask_idx: 0.40523| ppl: 235.36473| %_neg_is_pos: 0.00842| lr: 0.0| temp: 1.97894 | loss: 1.14065| constrast_loss: 4.50043| div_loss: 0.6217| %_mask_idx: 0.37171| ppl: 242.11197| %_neg_is_pos: 0.00646| lr: 0.0| temp: 1.97894 | loss: 1.13298| constrast_loss: 4.46791| div_loss: 0.64014| %_mask_idx: 0.31172| ppl: 230.31259| %_neg_is_pos: 0.01179| lr: 0.0| temp: 1.97893 | loss: 1.12718| constrast_loss: 4.44505| div_loss: 0.63654| %_mask_idx: 0.39098| ppl: 232.61687| %_neg_is_pos: 0.01071| lr: 0.0| temp: 1.97893 | loss: 1.12527| constrast_loss: 4.43672| div_loss: 0.64372| %_mask_idx: 0.38643| ppl: 228.01865| %_neg_is_pos: 0.00796| lr: 0.0| temp: 1.97892 | loss: 1.1344| constrast_loss: 4.4743| div_loss: 0.63309| %_mask_idx: 0.40461| ppl: 234.82062| %_neg_is_pos: 
0.00424| lr: 0.0| temp: 1.97892 | loss: 1.11487| constrast_loss: 4.39469| div_loss: 0.64769| %_mask_idx: 0.37782| ppl: 225.47903| %_neg_is_pos: 0.00754| lr: 0.0| temp: 1.97891 | loss: 1.12858| constrast_loss: 4.45022| div_loss: 0.64106| %_mask_idx: 0.34931| ppl: 229.72455| %_neg_is_pos: 0.00846| lr: 0.0| temp: 1.97891 | loss: 1.12874| constrast_loss: 4.45089| div_loss: 0.64054| %_mask_idx: 0.37406| ppl: 230.05618| %_neg_is_pos: 0.00807| lr: 0.0| temp: 1.97889 | loss: 1.13682| constrast_loss: 4.48414| div_loss: 0.63135| %_mask_idx: 0.42011| ppl: 235.93578| %_neg_is_pos: 0.00545| lr: 0.0| temp: 1.97889 | loss: 1.13458| constrast_loss: 4.47678| div_loss: 0.61523| %_mask_idx: 0.37171| ppl: 246.25043| %_neg_is_pos: 0.00706| lr: 0.0| temp: 1.97888 | loss: 1.14122| constrast_loss: 4.50104| div_loss: 0.63854| %_mask_idx: 0.38377| ppl: 231.33458| %_neg_is_pos: 0.00514| lr: 0.0| temp: 1.97888 | loss: 1.13725| constrast_loss: 4.48686| div_loss: 0.6214| %_mask_idx: 0.41776| ppl: 242.30409| %_neg_is_pos: 0.00997| lr: 0.0| temp: 1.97886 | loss: 1.12837| constrast_loss: 4.44983| div_loss: 0.63636| %_mask_idx: 0.34414| ppl: 232.73083| %_neg_is_pos: 0.01527| lr: 0.0| temp: 1.97886 | loss: 1.11523| constrast_loss: 4.3941| div_loss: 0.6683| %_mask_idx: 0.40116| ppl: 212.28986| %_neg_is_pos: 0.01821| lr: 0.0| temp: 1.97885 | loss: 1.12762| constrast_loss: 4.44483| div_loss: 0.65663| %_mask_idx: 0.36122| ppl: 219.75842| %_neg_is_pos: 0.01198| lr: 0.0| temp: 1.97885 | loss: 1.14176| constrast_loss: 4.50421| div_loss: 0.6282| %_mask_idx: 0.36122| ppl: 237.95233| %_neg_is_pos: 0.01188| lr: 0.0| temp: 1.97884 | loss: 1.13342| constrast_loss: 4.4706| div_loss: 0.63095| %_mask_idx: 0.36325| ppl: 236.18916| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.97884 | loss: 1.13765| constrast_loss: 4.48891| div_loss: 0.61699| %_mask_idx: 0.42215| ppl: 245.12808| %_neg_is_pos: 0.00501| lr: 0.0| temp: 1.97883 | loss: 1.12804| constrast_loss: 4.44821| div_loss: 0.63954| %_mask_idx: 0.39207| ppl: 230.69136| 
%_neg_is_pos: 0.00915| lr: 0.0| temp: 1.97883 | loss: 1.12567| constrast_loss: 4.43961| div_loss: 0.63077| %_mask_idx: 0.41306| ppl: 236.30833| %_neg_is_pos: 0.01086| lr: 0.0| temp: 1.97881 | loss: 1.13835| constrast_loss: 4.48965| div_loss: 0.63756| %_mask_idx: 0.37124| ppl: 231.96196| %_neg_is_pos: 0.00737| lr: 0.0| temp: 1.97881 | loss: 1.13559| constrast_loss: 4.47884| div_loss: 0.63514| %_mask_idx: 0.41197| ppl: 233.51189| %_neg_is_pos: 0.00457| lr: 0.0| temp: 1.9788 | loss: 1.13893| constrast_loss: 4.49277| div_loss: 0.62957| %_mask_idx: 0.38581| ppl: 237.07611| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.9788 | loss: 1.12453| constrast_loss: 4.43313| div_loss: 0.64992| %_mask_idx: 0.35636| ppl: 224.05066| %_neg_is_pos: 0.01467| lr: 0.0| temp: 1.97879 | loss: 1.14352| constrast_loss: 4.5109| div_loss: 0.63183| %_mask_idx: 0.40555| ppl: 235.63011| %_neg_is_pos: 0.00687| lr: 0.0| temp: 1.97879 | loss: 1.11248| constrast_loss: 4.38354| div_loss: 0.66381| %_mask_idx: 0.35965| ppl: 215.15898| %_neg_is_pos: 0.0121| lr: 0.0| temp: 1.97878 | loss: 1.11789| constrast_loss: 4.40834| div_loss: 0.63214| %_mask_idx: 0.34931| ppl: 235.43221| %_neg_is_pos: 0.01911| lr: 0.0| temp: 1.97878 | loss: 1.13221| constrast_loss: 4.46322| div_loss: 0.65626| %_mask_idx: 0.39004| ppl: 219.99277| %_neg_is_pos: 0.01014| lr: 0.0| temp: 1.97876 | loss: 1.12281| constrast_loss: 4.42615| div_loss: 0.65096| %_mask_idx: 0.34352| ppl: 223.38641| %_neg_is_pos: 0.00958| lr: 0.0| temp: 1.97876 | loss: 1.14094| constrast_loss: 4.50085| div_loss: 0.62925| %_mask_idx: 0.4093| ppl: 237.28061| %_neg_is_pos: 0.0046| lr: 0.0| temp: 1.97875 | loss: 1.14037| constrast_loss: 4.49842| div_loss: 0.6307| %_mask_idx: 0.35542| ppl: 236.35303| %_neg_is_pos: 0.01272| lr: 0.0| temp: 1.97875 | loss: 1.1309| constrast_loss: 4.45944| div_loss: 0.6417| %_mask_idx: 0.41776| ppl: 229.31407| %_neg_is_pos: 0.01292| lr: 0.0| temp: 1.97874 | loss: 1.12858| constrast_loss: 4.45097| div_loss: 0.63367| %_mask_idx: 0.38596| ppl: 
234.44852| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.97874 | loss: 1.12542| constrast_loss: 4.4355| div_loss: 0.66183| %_mask_idx: 0.4021| ppl: 216.42981| %_neg_is_pos: 0.01067| lr: 0.0| temp: 1.97873 | loss: 1.12842| constrast_loss: 4.4483| div_loss: 0.6539| %_mask_idx: 0.38299| ppl: 221.50661| %_neg_is_pos: 0.01662| lr: 0.0| temp: 1.97873 | loss: 1.14177| constrast_loss: 4.5052| div_loss: 0.61874| %_mask_idx: 0.38033| ppl: 244.00412| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.97871 | loss: 1.12474| constrast_loss: 4.4342| div_loss: 0.64738| %_mask_idx: 0.38001| ppl: 225.67615| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.97871 | loss: 1.14553| constrast_loss: 4.51913| div_loss: 0.62994| %_mask_idx: 0.45912| ppl: 236.83929| %_neg_is_pos: 0.0119| lr: 0.0| temp: 1.9787 | loss: 1.11771| constrast_loss: 4.40848| div_loss: 0.62347| %_mask_idx: 0.39411| ppl: 240.97763| %_neg_is_pos: 0.0142| lr: 0.0| temp: 1.9787 [2021-09-02 01:52:47,800] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 01:52:47,800] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.12464| constrast_loss: 4.43394| div_loss: 0.64636| %_mask_idx: 0.35385| ppl: 226.33144| %_neg_is_pos: 0.02221| lr: 0.0| temp: 1.97868 | loss: 1.12829| constrast_loss: 4.45063| div_loss: 0.62549| %_mask_idx: 0.40038| ppl: 239.68794| %_neg_is_pos: 0.00768| lr: 0.0| temp: 1.97868 | loss: 1.14302| constrast_loss: 4.50945| div_loss: 0.62618| %_mask_idx: 0.4115| ppl: 239.24271| %_neg_is_pos: 0.00964| lr: 0.0| temp: 1.97867 | loss: 1.13013| constrast_loss: 4.45739| div_loss: 0.63121| %_mask_idx: 0.38017| ppl: 236.02477| %_neg_is_pos: 0.00735| lr: 0.0| temp: 1.97867 | loss: 1.14301| constrast_loss: 4.5126| div_loss: 0.59433| %_mask_idx: 0.39066| ppl: 259.62875| %_neg_is_pos: 0.00779| lr: 0.0| temp: 1.97866 | loss: 1.13539| constrast_loss: 4.47987| div_loss: 0.61702| %_mask_idx: 0.41165| ppl: 245.11041| %_neg_is_pos: 0.01099| lr: 0.0| temp: 1.97866 | loss: 1.13794| constrast_loss: 4.48745| div_loss: 0.6432| %_mask_idx: 0.36106| ppl: 228.35216| %_neg_is_pos: 0.01369| lr: 0.0| temp: 1.97865 | loss: 1.12137| constrast_loss: 4.41972| div_loss: 0.65771| %_mask_idx: 0.31626| ppl: 219.06329| %_neg_is_pos: 0.02031| lr: 0.0| temp: 1.97865 | loss: 1.12632| constrast_loss: 4.44121| div_loss: 0.64055| %_mask_idx: 0.36529| ppl: 230.04491| %_neg_is_pos: 0.01186| lr: 0.0| temp: 1.97863 | loss: 1.13214| constrast_loss: 4.46416| div_loss: 0.64399| %_mask_idx: 0.37892| ppl: 227.8461| %_neg_is_pos: 0.01796| lr: 0.0| temp: 1.97863 | loss: 1.11963| constrast_loss: 4.41181| div_loss: 0.66692| %_mask_idx: 0.34712| ppl: 213.17361| %_neg_is_pos: 0.0184| lr: 0.0| temp: 1.97862 | loss: 1.13244| constrast_loss: 4.46882| div_loss: 0.60934| %_mask_idx: 0.40132| ppl: 250.02283| %_neg_is_pos: 0.0097| lr: 0.0| temp: 1.97862 | loss: 1.13073| constrast_loss: 4.45961| div_loss: 0.63298| %_mask_idx: 0.37077| ppl: 234.8898| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.97862 | loss: 1.13147| constrast_loss: 4.46147| div_loss: 0.64426| %_mask_idx: 0.36717| ppl: 227.6748| 
%_neg_is_pos: 0.00985| lr: 0.0| temp: 1.97862 | loss: 1.12943| constrast_loss: 4.45565| div_loss: 0.62057| %_mask_idx: 0.38471| ppl: 242.83398| %_neg_is_pos: 0.00953| lr: 0.0| temp: 1.97861 | loss: 1.12478| constrast_loss: 4.43654| div_loss: 0.6257| %_mask_idx: 0.37657| ppl: 239.55402| %_neg_is_pos: 0.0069| lr: 0.0| temp: 1.97861 | loss: 1.14024| constrast_loss: 4.49721| div_loss: 0.63747| %_mask_idx: 0.42841| ppl: 232.01727| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.97859 | loss: 1.14226| constrast_loss: 4.50631| div_loss: 0.62724| %_mask_idx: 0.38503| ppl: 238.56931| %_neg_is_pos: 0.00939| lr: 0.0| temp: 1.97859 | loss: 1.13216| constrast_loss: 4.46593| div_loss: 0.62723| %_mask_idx: 0.36591| ppl: 238.57494| %_neg_is_pos: 0.01079| lr: 0.0| temp: 1.97858 | loss: 1.12892| constrast_loss: 4.45287| div_loss: 0.62821| %_mask_idx: 0.42575| ppl: 237.94534| %_neg_is_pos: 0.00767| lr: 0.0| temp: 1.97858 | loss: 1.12667| constrast_loss: 4.44315| div_loss: 0.63526| %_mask_idx: 0.37907| ppl: 233.43344| %_neg_is_pos: 0.00925| lr: 0.0| temp: 1.97857 | loss: 1.14389| constrast_loss: 4.51305| div_loss: 0.62492| %_mask_idx: 0.40664| ppl: 240.05118| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.97857 | loss: 1.11875| constrast_loss: 4.41099| div_loss: 0.64002| %_mask_idx: 0.35495| ppl: 230.38463| %_neg_is_pos: 0.01596| lr: 0.0| temp: 1.97856 | loss: 1.12829| constrast_loss: 4.45032| div_loss: 0.62855| %_mask_idx: 0.39583| ppl: 237.73032| %_neg_is_pos: 0.01175| lr: 0.0| temp: 1.97856 | loss: 1.13088| constrast_loss: 4.46194| div_loss: 0.61562| %_mask_idx: 0.39411| ppl: 246.00209| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.97854 | loss: 1.11271| constrast_loss: 4.38301| div_loss: 0.67825| %_mask_idx: 0.32613| ppl: 205.92291| %_neg_is_pos: 0.01846| lr: 0.0| temp: 1.97854 | loss: 1.12783| constrast_loss: 4.44727| div_loss: 0.64045| %_mask_idx: 0.39333| ppl: 230.11407| %_neg_is_pos: 0.01751| lr: 0.0| temp: 1.97853 | loss: 1.10639| constrast_loss: 4.3581| div_loss: 0.67463| %_mask_idx: 0.3562| ppl: 
208.23753| %_neg_is_pos: 0.02143| lr: 0.0| temp: 1.97853 | loss: 1.13507| constrast_loss: 4.47849| div_loss: 0.61791| %_mask_idx: 0.43515| ppl: 244.53871| %_neg_is_pos: 0.01809| lr: 0.0| temp: 1.97851 | loss: 1.11386| constrast_loss: 4.38805| div_loss: 0.67378| %_mask_idx: 0.36764| ppl: 208.77991| %_neg_is_pos: 0.02124| lr: 0.0| temp: 1.97851 | loss: 1.1075| constrast_loss: 4.36481| div_loss: 0.65175| %_mask_idx: 0.36435| ppl: 222.87955| %_neg_is_pos: 0.02291| lr: 0.0| temp: 1.9785 | loss: 1.12148| constrast_loss: 4.42316| div_loss: 0.62769| %_mask_idx: 0.36967| ppl: 238.27863| %_neg_is_pos: 0.01771| lr: 0.0| temp: 1.9785 | loss: 1.09103| constrast_loss: 4.29522| div_loss: 0.68878| %_mask_idx: 0.37187| ppl: 199.17873| %_neg_is_pos: 0.03697| lr: 0.0| temp: 1.97849 | loss: 1.11877| constrast_loss: 4.40959| div_loss: 0.65502| %_mask_idx: 0.38957| ppl: 220.78874| %_neg_is_pos: 0.03851| lr: 0.0| temp: 1.97849 | loss: 1.13187| constrast_loss: 4.46637| div_loss: 0.61112| %_mask_idx: 0.33772| ppl: 248.88055| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.97848 | loss: 1.10582| constrast_loss: 4.35635| div_loss: 0.66936| %_mask_idx: 0.31814| ppl: 211.61163| %_neg_is_pos: 0.03004| lr: 0.0| temp: 1.97848 | loss: 1.12273| constrast_loss: 4.42835| div_loss: 0.62556| %_mask_idx: 0.42293| ppl: 239.64297| %_neg_is_pos: 0.02053| lr: 0.0| temp: 1.97846 | loss: 1.13223| constrast_loss: 4.46677| div_loss: 0.62134| %_mask_idx: 0.37594| ppl: 242.34033| %_neg_is_pos: 0.01977| lr: 0.0| temp: 1.97846 | loss: 1.11832| constrast_loss: 4.40952| div_loss: 0.63765| %_mask_idx: 0.39442| ppl: 231.9054| %_neg_is_pos: 0.01494| lr: 0.0| temp: 1.97845 | loss: 1.13939| constrast_loss: 4.49552| div_loss: 0.62061| %_mask_idx: 0.45457| ppl: 242.80692| %_neg_is_pos: 0.01228| lr: 0.0| temp: 1.97845 | loss: 1.14075| constrast_loss: 4.49978| div_loss: 0.63223| %_mask_idx: 0.4187| ppl: 235.37094| %_neg_is_pos: 0.01124| lr: 0.0| temp: 1.97844 | loss: 1.11152| constrast_loss: 4.38115| div_loss: 0.64925| %_mask_idx: 
0.3537| ppl: 224.48248| %_neg_is_pos: 0.04781| lr: 0.0| temp: 1.97844 | loss: 1.11433| constrast_loss: 4.39119| div_loss: 0.66109| %_mask_idx: 0.40946| ppl: 216.90195| %_neg_is_pos: 0.03426| lr: 0.0| temp: 1.97843 | loss: 1.14042| constrast_loss: 4.50096| div_loss: 0.60732| %_mask_idx: 0.40351| ppl: 251.31482| %_neg_is_pos: 0.01037| lr: 0.0| temp: 1.97843 | loss: 1.1322| constrast_loss: 4.46661| div_loss: 0.62182| %_mask_idx: 0.39959| ppl: 242.03441| %_neg_is_pos: 0.00856| lr: 0.0| temp: 1.97841 | loss: 1.10711| constrast_loss: 4.36254| div_loss: 0.65896| %_mask_idx: 0.37766| ppl: 218.26666| %_neg_is_pos: 0.02936| lr: 0.0| temp: 1.97841 | loss: 1.12564| constrast_loss: 4.43724| div_loss: 0.65325| %_mask_idx: 0.43578| ppl: 221.92035| %_neg_is_pos: 0.01645| lr: 0.0| temp: 1.9784 | loss: 1.11282| constrast_loss: 4.38426| div_loss: 0.67023| %_mask_idx: 0.38409| ppl: 211.05496| %_neg_is_pos: 0.01975| lr: 0.0| temp: 1.9784 | loss: 1.11066| constrast_loss: 4.37339| div_loss: 0.69234| %_mask_idx: 0.38941| ppl: 196.90018| %_neg_is_pos: 0.03031| lr: 0.0| temp: 1.97839 | loss: 1.12585| constrast_loss: 4.4402| div_loss: 0.63188| %_mask_idx: 0.39818| ppl: 235.5988| %_neg_is_pos: 0.00887| lr: 0.0| temp: 1.97839 | loss: 1.10903| constrast_loss: 4.36843| div_loss: 0.67694| %_mask_idx: 0.38001| ppl: 206.7594| %_neg_is_pos: 0.0179| lr: 0.0| temp: 1.97838 | loss: 1.12451| constrast_loss: 4.43432| div_loss: 0.63712| %_mask_idx: 0.4494| ppl: 232.24031| %_neg_is_pos: 0.01396| lr: 0.0| temp: 1.97838 | loss: 1.12565| constrast_loss: 4.43717| div_loss: 0.65444| %_mask_idx: 0.36153| ppl: 221.15594| %_neg_is_pos: 0.01797| lr: 0.0| temp: 1.97836 | loss: 1.11047| constrast_loss: 4.37468| div_loss: 0.67206| %_mask_idx: 0.375| ppl: 209.88432| %_neg_is_pos: 0.03537| lr: 0.0| temp: 1.97836 | loss: 1.11755| constrast_loss: 4.40636| div_loss: 0.63845| %_mask_idx: 0.35432| ppl: 231.39304| %_neg_is_pos: 0.0192| lr: 0.0| temp: 1.97835 | loss: 1.11047| constrast_loss: 4.3768| div_loss: 0.651| 
%_mask_idx: 0.38268| ppl: 223.35699| %_neg_is_pos: 0.02353| lr: 0.0| temp: 1.97835 | loss: 1.1237| constrast_loss: 4.42944| div_loss: 0.65361| %_mask_idx: 0.38518| ppl: 221.6926| %_neg_is_pos: 0.028| lr: 0.0| temp: 1.97833 | loss: 1.11997| constrast_loss: 4.41644| div_loss: 0.6344| %_mask_idx: 0.36685| ppl: 233.9866| %_neg_is_pos: 0.02638| lr: 0.0| temp: 1.97833 | loss: 1.13848| constrast_loss: 4.49284| div_loss: 0.61084| %_mask_idx: 0.37061| ppl: 249.0618| %_neg_is_pos: 0.00928| lr: 0.0| temp: 1.97832 | loss: 1.11709| constrast_loss: 4.40294| div_loss: 0.6542| %_mask_idx: 0.38001| ppl: 221.31348| %_neg_is_pos: 0.03082| lr: 0.0| temp: 1.97832 | loss: 1.14071| constrast_loss: 4.50076| div_loss: 0.62088| %_mask_idx: 0.36513| ppl: 242.63785| %_neg_is_pos: 0.01871| lr: 0.0| temp: 1.97831 | loss: 1.12666| constrast_loss: 4.44511| div_loss: 0.61518| %_mask_idx: 0.39427| ppl: 246.28723| %_neg_is_pos: 0.01202| lr: 0.0| temp: 1.97831 | loss: 1.13546| constrast_loss: 4.4809| div_loss: 0.60946| %_mask_idx: 0.39286| ppl: 249.94748| %_neg_is_pos: 0.02062| lr: 0.0| temp: 1.9783 | loss: 1.12663| constrast_loss: 4.44198| div_loss: 0.6453| %_mask_idx: 0.41667| ppl: 227.00555| %_neg_is_pos: 0.01201| lr: 0.0| temp: 1.9783 | loss: 1.1299| constrast_loss: 4.45946| div_loss: 0.60155| %_mask_idx: 0.37202| ppl: 255.00842| %_neg_is_pos: 0.02093| lr: 0.0| temp: 1.97828 | loss: 1.11084| constrast_loss: 4.37523| div_loss: 0.68147| %_mask_idx: 0.37688| ppl: 203.86067| %_neg_is_pos: 0.03774| lr: 0.0| temp: 1.97828 | loss: 1.12533| constrast_loss: 4.43864| div_loss: 0.62673| %_mask_idx: 0.38518| ppl: 238.8898| %_neg_is_pos: 0.00567| lr: 0.0| temp: 1.97827 | loss: 1.13152| constrast_loss: 4.46486| div_loss: 0.61207| %_mask_idx: 0.35009| ppl: 248.27484| %_neg_is_pos: 0.00839| lr: 0.0| temp: 1.97827 | loss: 1.12123| constrast_loss: 4.42081| div_loss: 0.64124| %_mask_idx: 0.41056| ppl: 229.60768| %_neg_is_pos: 0.01232| lr: 0.0| temp: 1.97826 | loss: 1.13212| constrast_loss: 4.46715| div_loss: 
0.61331| %_mask_idx: 0.42591| ppl: 247.47922| %_neg_is_pos: 0.00944| lr: 0.0| temp: 1.97826 | loss: 1.12011| constrast_loss: 4.41683| div_loss: 0.63623| %_mask_idx: 0.36826| ppl: 232.81009| %_neg_is_pos: 0.01725| lr: 0.0| temp: 1.97825 | loss: 1.11094| constrast_loss: 4.37777| div_loss: 0.65971| %_mask_idx: 0.38111| ppl: 217.78705| %_neg_is_pos: 0.03166| lr: 0.0| temp: 1.97825 | loss: 1.13268| constrast_loss: 4.46799| div_loss: 0.62718| %_mask_idx: 0.39724| ppl: 238.60349| %_neg_is_pos: 0.01006| lr: 0.0| temp: 1.97823 | loss: 1.13385| constrast_loss: 4.47382| div_loss: 0.616| %_mask_idx: 0.39113| ppl: 245.75826| %_neg_is_pos: 0.01386| lr: 0.0| temp: 1.97823 | loss: 1.12535| constrast_loss: 4.43592| div_loss: 0.65477| %_mask_idx: 0.37296| ppl: 220.94652| %_neg_is_pos: 0.02149| lr: 0.0| temp: 1.97822 | loss: 1.11611| constrast_loss: 4.39813| div_loss: 0.66299| %_mask_idx: 0.40695| ppl: 215.68445| %_neg_is_pos: 0.01369| lr: 0.0| temp: 1.97822 | loss: 1.10932| constrast_loss: 4.37222| div_loss: 0.6507| %_mask_idx: 0.41056| ppl: 223.55368| %_neg_is_pos: 0.03539| lr: 0.0| temp: 1.97821 | loss: 1.12698| constrast_loss: 4.44319| div_loss: 0.64736| %_mask_idx: 0.38549| ppl: 225.68692| %_neg_is_pos: 0.01941| lr: 0.0| temp: 1.97821 | loss: 1.14226| constrast_loss: 4.50699| div_loss: 0.62062| %_mask_idx: 0.38847| ppl: 242.80084| %_neg_is_pos: 0.00937| lr: 0.0| temp: 1.9782 | loss: 1.1238| constrast_loss: 4.43181| div_loss: 0.63388| %_mask_idx: 0.33631| ppl: 234.3147| %_neg_is_pos: 0.02872| lr: 0.0| temp: 1.9782 | loss: 1.12075| constrast_loss: 4.41608| div_loss: 0.66922| %_mask_idx: 0.36873| ppl: 211.7016| %_neg_is_pos: 0.02135| lr: 0.0| temp: 1.97818 | loss: 1.11527| constrast_loss: 4.39316| div_loss: 0.67917| %_mask_idx: 0.35401| ppl: 205.32878| %_neg_is_pos: 0.01625| lr: 0.0| temp: 1.97818 | loss: 1.13043| constrast_loss: 4.4611| div_loss: 0.60615| %_mask_idx: 0.41761| ppl: 252.06717| %_neg_is_pos: 0.02325| lr: 0.0| temp: 1.97817 | loss: 1.11087| constrast_loss: 4.37711| 
div_loss: 0.66376| %_mask_idx: 0.38878| ppl: 215.19629| %_neg_is_pos: 0.03178| lr: 0.0| temp: 1.97817 | loss: 1.12727| constrast_loss: 4.44803| div_loss: 0.61036| %_mask_idx: 0.37594| ppl: 249.36642| %_neg_is_pos: 0.00994| lr: 0.0| temp: 1.97815 | loss: 1.12306| constrast_loss: 4.42732| div_loss: 0.64905| %_mask_idx: 0.42199| ppl: 224.60947| %_neg_is_pos: 0.01762| lr: 0.0| temp: 1.97815 | loss: 1.11896| constrast_loss: 4.40878| div_loss: 0.67057| %_mask_idx: 0.4162| ppl: 210.8364| %_neg_is_pos: 0.02449| lr: 0.0| temp: 1.97814 | loss: 1.12183| constrast_loss: 4.42286| div_loss: 0.64474| %_mask_idx: 0.41165| ppl: 227.36362| %_neg_is_pos: 0.02159| lr: 0.0| temp: 1.97814 | loss: 1.11739| constrast_loss: 4.4027| div_loss: 0.66842| %_mask_idx: 0.34539| ppl: 212.20992| %_neg_is_pos: 0.02728| lr: 0.0| temp: 1.97813 | loss: 1.12693| constrast_loss: 4.44391| div_loss: 0.63812| %_mask_idx: 0.4057| ppl: 231.6037| %_neg_is_pos: 0.02033| lr: 0.0| temp: 1.97813 | loss: 1.1299| constrast_loss: 4.45582| div_loss: 0.63774| %_mask_idx: 0.44612| ppl: 231.84361| %_neg_is_pos: 0.0066| lr: 0.0| temp: 1.97812 | loss: 1.1167| constrast_loss: 4.39913| div_loss: 0.67676| %_mask_idx: 0.39395| ppl: 206.87456| %_neg_is_pos: 0.01532| lr: 0.0| temp: 1.97812 | loss: 1.12362| constrast_loss: 4.42887| div_loss: 0.65596| %_mask_idx: 0.39489| ppl: 220.18384| %_neg_is_pos: 0.0277| lr: 0.0| temp: 1.9781 | loss: 1.12848| constrast_loss: 4.45317| div_loss: 0.60763| %_mask_idx: 0.34539| ppl: 251.11711| %_neg_is_pos: 0.0145| lr: 0.0| temp: 1.9781 | loss: 1.11136| constrast_loss: 4.38069| div_loss: 0.64768| %_mask_idx: 0.39301| ppl: 225.4819| %_neg_is_pos: 0.02131| lr: 0.0| temp: 1.97809 | loss: 1.13605| constrast_loss: 4.48177| div_loss: 0.62435| %_mask_idx: 0.41557| ppl: 240.41516| %_neg_is_pos: 0.00583| lr: 0.0| temp: 1.97809 | loss: 1.10662| constrast_loss: 4.36011| div_loss: 0.66366| %_mask_idx: 0.37406| ppl: 215.25867| %_neg_is_pos: 0.03521| lr: 0.0| temp: 1.97808 | loss: 1.11856| constrast_loss: 
4.40771| div_loss: 0.6652| %_mask_idx: 0.38957| ppl: 214.27435| %_neg_is_pos: 0.02585| lr: 0.0| temp: 1.97808 | loss: 1.1128| constrast_loss: 4.38416| div_loss: 0.67057| %_mask_idx: 0.40382| ppl: 210.83406| %_neg_is_pos: 0.02171| lr: 0.0| temp: 1.97807 | loss: 1.12442| constrast_loss: 4.43379| div_loss: 0.63885| %_mask_idx: 0.4021| ppl: 231.13901| %_neg_is_pos: 0.01396| lr: 0.0| temp: 1.97807 | loss: 1.11902| constrast_loss: 4.41032| div_loss: 0.65744| %_mask_idx: 0.40727| ppl: 219.23929| %_neg_is_pos: 0.0183| lr: 0.0| temp: 1.97805 | loss: 1.13476| constrast_loss: 4.47689| div_loss: 0.62146| %_mask_idx: 0.39082| ppl: 242.26343| %_neg_is_pos: 0.02009| lr: 0.0| temp: 1.97805 | loss: 1.10105| constrast_loss: 4.33501| div_loss: 0.69168| %_mask_idx: 0.35417| ppl: 197.32503| %_neg_is_pos: 0.03061| lr: 0.0| temp: 1.97804 | loss: 1.11409| constrast_loss: 4.39214| div_loss: 0.64245| %_mask_idx: 0.38722| ppl: 228.83405| %_neg_is_pos: 0.0133| lr: 0.0| temp: 1.97804 | loss: 1.13341| constrast_loss: 4.47214| div_loss: 0.615| %_mask_idx: 0.40711| ppl: 246.39801| %_neg_is_pos: 0.00556| lr: 0.0| temp: 1.97803 | loss: 1.13743| constrast_loss: 4.48736| div_loss: 0.62341| %_mask_idx: 0.34289| ppl: 241.0192| %_neg_is_pos: 0.0097| lr: 0.0| temp: 1.97803 | loss: 1.11872| constrast_loss: 4.41165| div_loss: 0.63222| %_mask_idx: 0.36247| ppl: 235.37814| %_neg_is_pos: 0.03582| lr: 0.0| temp: 1.97802 | loss: 1.11224| constrast_loss: 4.38347| div_loss: 0.65473| %_mask_idx: 0.36607| ppl: 220.9711| %_neg_is_pos: 0.04416| lr: 0.0| temp: 1.97802 | loss: 1.14172| constrast_loss: 4.5052| div_loss: 0.6168| %_mask_idx: 0.4093| ppl: 245.24728| %_neg_is_pos: 0.01124| lr: 0.0| temp: 1.978 | loss: 1.12135| constrast_loss: 4.42086| div_loss: 0.64542| %_mask_idx: 0.37751| ppl: 226.931| %_neg_is_pos: 0.01627| lr: 0.0| temp: 1.978 | loss: 1.12643| constrast_loss: 4.44217| div_loss: 0.63544| %_mask_idx: 0.39223| ppl: 233.3201| %_neg_is_pos: 0.01672| lr: 0.0| temp: 1.97799 | loss: 1.12844| constrast_loss: 
4.45154| div_loss: 0.62231| %_mask_idx: 0.42011| ppl: 241.72253| %_neg_is_pos: 0.01601| lr: 0.0| temp: 1.97799 [2021-09-02 02:02:01,824] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:02:01,824] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12107| constrast_loss: 4.41994| div_loss: 0.64343| %_mask_idx: 0.37077| ppl: 228.20282| %_neg_is_pos: 0.01757| lr: 0.0| temp: 1.97797 | loss: 1.11584| constrast_loss: 4.39754| div_loss: 0.65811| %_mask_idx: 0.38534| ppl: 218.80875| %_neg_is_pos: 0.02352| lr: 0.0| temp: 1.97797 | loss: 1.11343| constrast_loss: 4.38806| div_loss: 0.65658| %_mask_idx: 0.3844| ppl: 219.78851| %_neg_is_pos: 0.02948| lr: 0.0| temp: 1.97796 | loss: 1.12469| constrast_loss: 4.43447| div_loss: 0.64296| %_mask_idx: 0.40226| ppl: 228.50662| %_neg_is_pos: 0.01648| lr: 0.0| temp: 1.97796 | loss: 1.1275| constrast_loss: 4.44482| div_loss: 0.65161| %_mask_idx: 0.40899| ppl: 222.96736| %_neg_is_pos: 0.01758| lr: 0.0| temp: 1.97795 | loss: 1.12033| constrast_loss: 4.41422| div_loss: 0.67103| %_mask_idx: 0.40085| ppl: 210.53862| %_neg_is_pos: 0.0308| lr: 0.0| temp: 1.97795 | loss: 1.12388| constrast_loss: 4.43387| div_loss: 0.6167| %_mask_idx: 0.33584| ppl: 245.31039| %_neg_is_pos: 0.02286| lr: 0.0| temp: 1.97794 | loss: 1.12801| constrast_loss: 4.44788| div_loss: 0.64138| %_mask_idx: 0.37829| ppl: 229.51828| %_neg_is_pos: 0.03188| lr: 0.0| temp: 1.97794 | loss: 1.11937| constrast_loss: 4.41043| div_loss: 0.67046| %_mask_idx: 0.34884| ppl: 210.90245| %_neg_is_pos: 0.02548| lr: 0.0| temp: 1.97792 | loss: 1.12786| constrast_loss: 4.4471| div_loss: 0.64346| %_mask_idx: 0.39364| ppl: 228.18291| %_neg_is_pos: 0.01275| lr: 0.0| temp: 1.97792 | loss: 1.12492| constrast_loss: 4.43629| div_loss: 0.63388| %_mask_idx: 0.36685| ppl: 234.31433| %_neg_is_pos: 0.02757| lr: 0.0| 
temp: 1.97791 | loss: 1.13203| constrast_loss: 4.46561| div_loss: 0.62493| %_mask_idx: 0.41761| ppl: 240.04446| %_neg_is_pos: 0.02294| lr: 0.0| temp: 1.97791 | loss: 1.11819| constrast_loss: 4.40557| div_loss: 0.67174| %_mask_idx: 0.3808| ppl: 210.08417| %_neg_is_pos: 0.02646| lr: 0.0| temp: 1.9779 | loss: 1.11955| constrast_loss: 4.41363| div_loss: 0.64568| %_mask_idx: 0.38503| ppl: 226.76256| %_neg_is_pos: 0.0335| lr: 0.0| temp: 1.9779 | loss: 1.1223| constrast_loss: 4.42551| div_loss: 0.6368| %_mask_idx: 0.41103| ppl: 232.4501| %_neg_is_pos: 0.02666| lr: 0.0| temp: 1.97789 | loss: 1.13505| constrast_loss: 4.47734| div_loss: 0.62866| %_mask_idx: 0.46413| ppl: 237.65979| %_neg_is_pos: 0.01394| lr: 0.0| temp: 1.97789 | loss: 1.13374| constrast_loss: 4.47137| div_loss: 0.63566| %_mask_idx: 0.41698| ppl: 233.17592| %_neg_is_pos: 0.01357| lr: 0.0| temp: 1.97787 | loss: 1.14431| constrast_loss: 4.51677| div_loss: 0.60465| %_mask_idx: 0.36576| ppl: 253.0256| %_neg_is_pos: 0.01314| lr: 0.0| temp: 1.97787 | loss: 1.12436| constrast_loss: 4.43336| div_loss: 0.64067| %_mask_idx: 0.35949| ppl: 229.96953| %_neg_is_pos: 0.03121| lr: 0.0| temp: 1.97786 | loss: 1.13796| constrast_loss: 4.48999| div_loss: 0.61841| %_mask_idx: 0.38878| ppl: 244.21724| %_neg_is_pos: 0.02015| lr: 0.0| temp: 1.97786 | loss: 1.13332| constrast_loss: 4.46854| div_loss: 0.64736| %_mask_idx: 0.40257| ppl: 225.69028| %_neg_is_pos: 0.01291| lr: 0.0| temp: 1.97785 | loss: 1.12558| constrast_loss: 4.43679| div_loss: 0.65551| %_mask_idx: 0.36106| ppl: 220.47348| %_neg_is_pos: 0.02482| lr: 0.0| temp: 1.97785 | loss: 1.11422| constrast_loss: 4.39344| div_loss: 0.63446| %_mask_idx: 0.41212| ppl: 233.94833| %_neg_is_pos: 0.01747| lr: 0.0| temp: 1.97784 | loss: 1.10372| constrast_loss: 4.34722| div_loss: 0.6766| %_mask_idx: 0.34665| ppl: 206.97537| %_neg_is_pos: 0.02591| lr: 0.0| temp: 1.97784 | loss: 1.11713| constrast_loss: 4.4037| div_loss: 0.64819| %_mask_idx: 0.39286| ppl: 225.15573| %_neg_is_pos: 0.00581| 
lr: 0.0| temp: 1.97782 | loss: 1.12619| constrast_loss: 4.44174| div_loss: 0.63009| %_mask_idx: 0.39427| ppl: 236.73943| %_neg_is_pos: 0.02275| lr: 0.0| temp: 1.97782 | loss: 1.12552| constrast_loss: 4.43929| div_loss: 0.62795| %_mask_idx: 0.37124| ppl: 238.11331| %_neg_is_pos: 0.01456| lr: 0.0| temp: 1.97781 | loss: 1.12196| constrast_loss: 4.42269| div_loss: 0.65141| %_mask_idx: 0.37594| ppl: 223.0961| %_neg_is_pos: 0.0286| lr: 0.0| temp: 1.97781 | loss: 1.12478| constrast_loss: 4.43466| div_loss: 0.64444| %_mask_idx: 0.45254| ppl: 227.56131| %_neg_is_pos: 0.01503| lr: 0.0| temp: 1.97779 | loss: 1.12379| constrast_loss: 4.43114| div_loss: 0.64007| %_mask_idx: 0.44596| ppl: 230.35532| %_neg_is_pos: 0.00712| lr: 0.0| temp: 1.97779 | loss: 1.12075| constrast_loss: 4.4187| div_loss: 0.64277| %_mask_idx: 0.36106| ppl: 228.62555| %_neg_is_pos: 0.01848| lr: 0.0| temp: 1.97778 | loss: 1.12443| constrast_loss: 4.43526| div_loss: 0.6246| %_mask_idx: 0.34085| ppl: 240.25876| %_neg_is_pos: 0.01395| lr: 0.0| temp: 1.97778 | loss: 1.12849| constrast_loss: 4.45168| div_loss: 0.62273| %_mask_idx: 0.35965| ppl: 241.45523| %_neg_is_pos: 0.00914| lr: 0.0| temp: 1.97777 | loss: 1.10705| constrast_loss: 4.36187| div_loss: 0.66329| %_mask_idx: 0.38863| ppl: 215.49286| %_neg_is_pos: 0.01952| lr: 0.0| temp: 1.97777 | loss: 1.12484| constrast_loss: 4.43555| div_loss: 0.63819| %_mask_idx: 0.36263| ppl: 231.55865| %_neg_is_pos: 0.01936| lr: 0.0| temp: 1.97776 | loss: 1.12895| constrast_loss: 4.45041| div_loss: 0.65403| %_mask_idx: 0.39881| ppl: 221.41966| %_neg_is_pos: 0.00649| lr: 0.0| temp: 1.97776 | loss: 1.14348| constrast_loss: 4.51364| div_loss: 0.60282| %_mask_idx: 0.41071| ppl: 254.19678| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.97774 | loss: 1.12704| constrast_loss: 4.44384| div_loss: 0.64304| %_mask_idx: 0.36263| ppl: 228.4539| %_neg_is_pos: 0.00942| lr: 0.0| temp: 1.97774 | loss: 1.12606| constrast_loss: 4.44196| div_loss: 0.62258| %_mask_idx: 0.40351| ppl: 241.55154| 
%_neg_is_pos: 0.01005| lr: 0.0| temp: 1.97773 | loss: 1.13442| constrast_loss: 4.47602| div_loss: 0.6167| %_mask_idx: 0.41103| ppl: 245.31126| %_neg_is_pos: 0.01659| lr: 0.0| temp: 1.97773 | loss: 1.11599| constrast_loss: 4.39904| div_loss: 0.64905| %_mask_idx: 0.37249| ppl: 224.61087| %_neg_is_pos: 0.02149| lr: 0.0| temp: 1.97772 | loss: 1.12495| constrast_loss: 4.4346| div_loss: 0.6519| %_mask_idx: 0.41635| ppl: 222.7834| %_neg_is_pos: 0.01041| lr: 0.0| temp: 1.97772 | loss: 1.14032| constrast_loss: 4.49693| div_loss: 0.64358| %_mask_idx: 0.39474| ppl: 228.10913| %_neg_is_pos: 0.01507| lr: 0.0| temp: 1.97772 | loss: 1.1302| constrast_loss: 4.45566| div_loss: 0.65148| %_mask_idx: 0.38988| ppl: 223.05031| %_neg_is_pos: 0.01198| lr: 0.0| temp: 1.97772 | loss: 1.13097| constrast_loss: 4.46025| div_loss: 0.63632| %_mask_idx: 0.38565| ppl: 232.75693| %_neg_is_pos: 0.00784| lr: 0.0| temp: 1.9777 | loss: 1.13723| constrast_loss: 4.48675| div_loss: 0.62161| %_mask_idx: 0.4364| ppl: 242.17191| %_neg_is_pos: 0.01178| lr: 0.0| temp: 1.9777 | loss: 1.12601| constrast_loss: 4.44091| div_loss: 0.63146| %_mask_idx: 0.37766| ppl: 235.86272| %_neg_is_pos: 0.01261| lr: 0.0| temp: 1.97769 | loss: 1.14545| constrast_loss: 4.5199| div_loss: 0.61883| %_mask_idx: 0.40006| ppl: 243.9476| %_neg_is_pos: 0.00838| lr: 0.0| temp: 1.97769 | loss: 1.12545| constrast_loss: 4.43719| div_loss: 0.64589| %_mask_idx: 0.35464| ppl: 226.63348| %_neg_is_pos: 0.02151| lr: 0.0| temp: 1.97768 | loss: 1.12083| constrast_loss: 4.41722| div_loss: 0.6611| %_mask_idx: 0.41071| ppl: 216.89337| %_neg_is_pos: 0.01894| lr: 0.0| temp: 1.97768 | loss: 1.11694| constrast_loss: 4.40361| div_loss: 0.64132| %_mask_idx: 0.39944| ppl: 229.55432| %_neg_is_pos: 0.02919| lr: 0.0| temp: 1.97767 | loss: 1.12525| constrast_loss: 4.43944| div_loss: 0.6155| %_mask_idx: 0.36466| ppl: 246.08089| %_neg_is_pos: 0.0142| lr: 0.0| temp: 1.97767 | loss: 1.13792| constrast_loss: 4.48841| div_loss: 0.63264| %_mask_idx: 0.38409| ppl: 
235.10997| %_neg_is_pos: 0.01387| lr: 0.0| temp: 1.97765 | loss: 1.13202| constrast_loss: 4.46545| div_loss: 0.62642| %_mask_idx: 0.39129| ppl: 239.09232| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.97765 | loss: 1.12645| constrast_loss: 4.44017| div_loss: 0.65651| %_mask_idx: 0.36811| ppl: 219.83563| %_neg_is_pos: 0.00873| lr: 0.0| temp: 1.97764 | loss: 1.12652| constrast_loss: 4.443| div_loss: 0.63074| %_mask_idx: 0.38941| ppl: 236.32407| %_neg_is_pos: 0.01095| lr: 0.0| temp: 1.97764 | loss: 1.11114| constrast_loss: 4.37901| div_loss: 0.65567| %_mask_idx: 0.32754| ppl: 220.36823| %_neg_is_pos: 0.02638| lr: 0.0| temp: 1.97762 | loss: 1.1131| constrast_loss: 4.38611| div_loss: 0.6629| %_mask_idx: 0.36967| ppl: 215.74348| %_neg_is_pos: 0.01713| lr: 0.0| temp: 1.97762 | loss: 1.11675| constrast_loss: 4.40234| div_loss: 0.64673| %_mask_idx: 0.36576| ppl: 226.0947| %_neg_is_pos: 0.01117| lr: 0.0| temp: 1.97761 | loss: 1.13705| constrast_loss: 4.48509| div_loss: 0.6312| %_mask_idx: 0.40774| ppl: 236.03247| %_neg_is_pos: 0.01018| lr: 0.0| temp: 1.97761 | loss: 1.12211| constrast_loss: 4.42439| div_loss: 0.64062| %_mask_idx: 0.40789| ppl: 230.00018| %_neg_is_pos: 0.02612| lr: 0.0| temp: 1.9776 | loss: 1.1263| constrast_loss: 4.44256| div_loss: 0.62617| %_mask_idx: 0.4032| ppl: 239.24847| %_neg_is_pos: 0.00948| lr: 0.0| temp: 1.9776 | loss: 1.12738| constrast_loss: 4.4445| div_loss: 0.6503| %_mask_idx: 0.36012| ppl: 223.80638| %_neg_is_pos: 0.02703| lr: 0.0| temp: 1.97759 | loss: 1.13063| constrast_loss: 4.45753| div_loss: 0.64999| %_mask_idx: 0.38628| ppl: 224.00391| %_neg_is_pos: 0.00745| lr: 0.0| temp: 1.97759 | loss: 1.12554| constrast_loss: 4.43651| div_loss: 0.65637| %_mask_idx: 0.34132| ppl: 219.92059| %_neg_is_pos: 0.03561| lr: 0.0| temp: 1.97757 | loss: 1.12453| constrast_loss: 4.43426| div_loss: 0.63846| %_mask_idx: 0.35589| ppl: 231.38791| %_neg_is_pos: 0.01689| lr: 0.0| temp: 1.97757 | loss: 1.12736| constrast_loss: 4.44594| div_loss: 0.63493| %_mask_idx: 0.39991| 
ppl: 233.64255| %_neg_is_pos: 0.015| lr: 0.0| temp: 1.97756 | loss: 1.13728| constrast_loss: 4.48544| div_loss: 0.63678| %_mask_idx: 0.41792| ppl: 232.45795| %_neg_is_pos: 0.02299| lr: 0.0| temp: 1.97756 | loss: 1.1123| constrast_loss: 4.38457| div_loss: 0.64637| %_mask_idx: 0.39223| ppl: 226.32422| %_neg_is_pos: 0.04228| lr: 0.0| temp: 1.97755 | loss: 1.13423| constrast_loss: 4.47607| div_loss: 0.6086| %_mask_idx: 0.39019| ppl: 250.49585| %_neg_is_pos: 0.00567| lr: 0.0| temp: 1.97755 | loss: 1.11586| constrast_loss: 4.40053| div_loss: 0.62906| %_mask_idx: 0.36137| ppl: 237.39931| %_neg_is_pos: 0.01945| lr: 0.0| temp: 1.97754 | loss: 1.13245| constrast_loss: 4.46763| div_loss: 0.62156| %_mask_idx: 0.43186| ppl: 242.19931| %_neg_is_pos: 0.0066| lr: 0.0| temp: 1.97754 | loss: 1.12927| constrast_loss: 4.45176| div_loss: 0.65341| %_mask_idx: 0.41698| ppl: 221.81609| %_neg_is_pos: 0.01268| lr: 0.0| temp: 1.97752 | loss: 1.12478| constrast_loss: 4.433| div_loss: 0.66112| %_mask_idx: 0.38753| ppl: 216.8847| %_neg_is_pos: 0.02243| lr: 0.0| temp: 1.97752 | loss: 1.12287| constrast_loss: 4.42891| div_loss: 0.62575| %_mask_idx: 0.38878| ppl: 239.5174| %_neg_is_pos: 0.00935| lr: 0.0| temp: 1.97751 | loss: 1.13851| constrast_loss: 4.49065| div_loss: 0.63397| %_mask_idx: 0.37986| ppl: 234.25891| %_neg_is_pos: 0.00968| lr: 0.0| temp: 1.97751 | loss: 1.13031| constrast_loss: 4.45914| div_loss: 0.62085| %_mask_idx: 0.42043| ppl: 242.65436| %_neg_is_pos: 0.01765| lr: 0.0| temp: 1.9775 | loss: 1.133| constrast_loss: 4.46772| div_loss: 0.64287| %_mask_idx: 0.38847| ppl: 228.56464| %_neg_is_pos: 0.025| lr: 0.0| temp: 1.9775 | loss: 1.12366| constrast_loss: 4.43043| div_loss: 0.64212| %_mask_idx: 0.3631| ppl: 229.0441| %_neg_is_pos: 0.02032| lr: 0.0| temp: 1.97749 | loss: 1.11155| constrast_loss: 4.37795| div_loss: 0.68247| %_mask_idx: 0.30341| ppl: 203.21849| %_neg_is_pos: 0.04646| lr: 0.0| temp: 1.97749 | loss: 1.13559| constrast_loss: 4.47958| div_loss: 0.62788| %_mask_idx: 0.43484| 
ppl: 238.15671| %_neg_is_pos: 0.009| lr: 0.0| temp: 1.97747 | loss: 1.13059| constrast_loss: 4.45846| div_loss: 0.6389| %_mask_idx: 0.38471| ppl: 231.10254| %_neg_is_pos: 0.01981| lr: 0.0| temp: 1.97747 | loss: 1.12936| constrast_loss: 4.45489| div_loss: 0.62553| %_mask_idx: 0.42434| ppl: 239.65973| %_neg_is_pos: 0.00865| lr: 0.0| temp: 1.97746 | loss: 1.12364| constrast_loss: 4.43154| div_loss: 0.63033| %_mask_idx: 0.36905| ppl: 236.58704| %_neg_is_pos: 0.03788| lr: 0.0| temp: 1.97746 | loss: 1.12507| constrast_loss: 4.43718| div_loss: 0.63084| %_mask_idx: 0.43405| ppl: 236.26158| %_neg_is_pos: 0.01179| lr: 0.0| temp: 1.97744 | loss: 1.13832| constrast_loss: 4.49243| div_loss: 0.60856| %_mask_idx: 0.4516| ppl: 250.52017| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.97744 | loss: 1.12325| constrast_loss: 4.43093| div_loss: 0.62053| %_mask_idx: 0.35902| ppl: 242.86273| %_neg_is_pos: 0.01073| lr: 0.0| temp: 1.97743 | loss: 1.11783| constrast_loss: 4.40628| div_loss: 0.65032| %_mask_idx: 0.34226| ppl: 223.79749| %_neg_is_pos: 0.026| lr: 0.0| temp: 1.97743 | loss: 1.12654| constrast_loss: 4.44383| div_loss: 0.62325| %_mask_idx: 0.39912| ppl: 241.12138| %_neg_is_pos: 0.00692| lr: 0.0| temp: 1.97742 | loss: 1.1281| constrast_loss: 4.44956| div_loss: 0.62853| %_mask_idx: 0.40257| ppl: 237.74339| %_neg_is_pos: 0.00744| lr: 0.0| temp: 1.97742 | loss: 1.12708| constrast_loss: 4.44407| div_loss: 0.64233| %_mask_idx: 0.3609| ppl: 228.90836| %_neg_is_pos: 0.02465| lr: 0.0| temp: 1.97741 | loss: 1.12584| constrast_loss: 4.44048| div_loss: 0.62869| %_mask_idx: 0.38111| ppl: 237.64046| %_neg_is_pos: 0.00987| lr: 0.0| temp: 1.97741 | loss: 1.13071| constrast_loss: 4.46026| div_loss: 0.62573| %_mask_idx: 0.3396| ppl: 239.53439| %_neg_is_pos: 0.01607| lr: 0.0| temp: 1.97739 | loss: 1.12173| constrast_loss: 4.42408| div_loss: 0.62842| %_mask_idx: 0.4032| ppl: 237.8111| %_neg_is_pos: 0.01076| lr: 0.0| temp: 1.97739 | loss: 1.13025| constrast_loss: 4.4583| div_loss: 0.62683| %_mask_idx: 
0.35025| ppl: 238.82608| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.97738 | loss: 1.13675| constrast_loss: 4.48613| div_loss: 0.60868| %_mask_idx: 0.40758| ppl: 250.4451| %_neg_is_pos: 0.00882| lr: 0.0| temp: 1.97738 | loss: 1.12343| constrast_loss: 4.43102| div_loss: 0.62697| %_mask_idx: 0.37375| ppl: 238.74191| %_neg_is_pos: 0.00791| lr: 0.0| temp: 1.97737 | loss: 1.12027| constrast_loss: 4.41704| div_loss: 0.64034| %_mask_idx: 0.3808| ppl: 230.18289| %_neg_is_pos: 0.02821| lr: 0.0| temp: 1.97737 | loss: 1.12574| constrast_loss: 4.44053| div_loss: 0.62442| %_mask_idx: 0.37719| ppl: 240.3728| %_neg_is_pos: 0.01567| lr: 0.0| temp: 1.97736 | loss: 1.13541| constrast_loss: 4.47736| div_loss: 0.64269| %_mask_idx: 0.3609| ppl: 228.6788| %_neg_is_pos: 0.01474| lr: 0.0| temp: 1.97736 | loss: 1.13108| constrast_loss: 4.46098| div_loss: 0.63342| %_mask_idx: 0.45144| ppl: 234.61087| %_neg_is_pos: 0.01187| lr: 0.0| temp: 1.97734 | loss: 1.14194| constrast_loss: 4.50724| div_loss: 0.60513| %_mask_idx: 0.45113| ppl: 252.71472| %_neg_is_pos: 0.00723| lr: 0.0| temp: 1.97734 | loss: 1.14168| constrast_loss: 4.50359| div_loss: 0.63144| %_mask_idx: 0.3938| ppl: 235.88091| %_neg_is_pos: 0.02772| lr: 0.0| temp: 1.97733 | loss: 1.13208| constrast_loss: 4.46295| div_loss: 0.65362| %_mask_idx: 0.34289| ppl: 221.68004| %_neg_is_pos: 0.01135| lr: 0.0| temp: 1.97733 | loss: 1.13517| constrast_loss: 4.47878| div_loss: 0.61895| %_mask_idx: 0.38174| ppl: 243.87167| %_neg_is_pos: 0.00568| lr: 0.0| temp: 1.97732 | loss: 1.12126| constrast_loss: 4.42138| div_loss: 0.63652| %_mask_idx: 0.31203| ppl: 232.62817| %_neg_is_pos: 0.02089| lr: 0.0| temp: 1.97732 | loss: 1.13255| constrast_loss: 4.46831| div_loss: 0.61871| %_mask_idx: 0.35103| ppl: 244.02248| %_neg_is_pos: 0.00476| lr: 0.0| temp: 1.97731 | loss: 1.13441| constrast_loss: 4.47682| div_loss: 0.60837| %_mask_idx: 0.39709| ppl: 250.6425| %_neg_is_pos: 0.00966| lr: 0.0| temp: 1.97731 | loss: 1.13675| constrast_loss: 4.48311| div_loss: 0.639| 
%_mask_idx: 0.40648| ppl: 231.03946| %_neg_is_pos: 0.00774| lr: 0.0| temp: 1.97729 | loss: 1.14313| constrast_loss: 4.51074| div_loss: 0.61799| %_mask_idx: 0.45959| ppl: 244.48734| %_neg_is_pos: 0.00419| lr: 0.0| temp: 1.97729 | loss: 1.13139| constrast_loss: 4.45905| div_loss: 0.66517| %_mask_idx: 0.35558| ppl: 214.29424| %_neg_is_pos: 0.02153| lr: 0.0| temp: 1.97728 | loss: 1.1391| constrast_loss: 4.4946| div_loss: 0.61809| %_mask_idx: 0.34179| ppl: 244.42429| %_neg_is_pos: 0.02018| lr: 0.0| temp: 1.97728 [2021-09-02 02:11:15,376] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:11:15,376] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12605| constrast_loss: 4.44031| div_loss: 0.63891| %_mask_idx: 0.38424| ppl: 231.09554| %_neg_is_pos: 0.00793| lr: 0.0| temp: 1.97726 | loss: 1.12826| constrast_loss: 4.45084| div_loss: 0.62215| %_mask_idx: 0.3479| ppl: 241.82083| %_neg_is_pos: 0.01029| lr: 0.0| temp: 1.97726 | loss: 1.12888| constrast_loss: 4.45257| div_loss: 0.62955| %_mask_idx: 0.39944| ppl: 237.08841| %_neg_is_pos: 0.01051| lr: 0.0| temp: 1.97725 | loss: 1.14276| constrast_loss: 4.50909| div_loss: 0.61958| %_mask_idx: 0.39317| ppl: 243.47089| %_neg_is_pos: 0.009| lr: 0.0| temp: 1.97725 | loss: 1.13336| constrast_loss: 4.4701| div_loss: 0.63339| %_mask_idx: 0.38643| ppl: 234.62978| %_neg_is_pos: 0.005| lr: 0.0| temp: 1.97724 | loss: 1.14137| constrast_loss: 4.5031| div_loss: 0.62368| %_mask_idx: 0.43264| ppl: 240.84402| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.97724 | loss: 1.13741| constrast_loss: 4.48685| div_loss: 0.62796| %_mask_idx: 0.38957| ppl: 238.10553| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.97723 | loss: 1.13907| constrast_loss: 4.49435| div_loss: 0.61945| %_mask_idx: 0.40069| ppl: 243.55185| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.97723 | loss: 1.14197| 
constrast_loss: 4.50471| div_loss: 0.63169| %_mask_idx: 0.38988| ppl: 235.72096| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97721 | loss: 1.13978| constrast_loss: 4.49691| div_loss: 0.62194| %_mask_idx: 0.40695| ppl: 241.96103| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.97721 | loss: 1.13299| constrast_loss: 4.47035| div_loss: 0.6163| %_mask_idx: 0.40147| ppl: 245.56837| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.9772 | loss: 1.13108| constrast_loss: 4.46192| div_loss: 0.62396| %_mask_idx: 0.39615| ppl: 240.66699| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.9772 | loss: 1.1298| constrast_loss: 4.45521| div_loss: 0.64003| %_mask_idx: 0.3667| ppl: 230.37775| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.97719 | loss: 1.14013| constrast_loss: 4.49586| div_loss: 0.64653| %_mask_idx: 0.43155| ppl: 226.22292| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.97719 | loss: 1.13099| constrast_loss: 4.46198| div_loss: 0.61994| %_mask_idx: 0.41792| ppl: 243.23698| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.97718 | loss: 1.14331| constrast_loss: 4.51128| div_loss: 0.61975| %_mask_idx: 0.3573| ppl: 243.3609| %_neg_is_pos: 0.00905| lr: 0.0| temp: 1.97718 | loss: 1.11939| constrast_loss: 4.41445| div_loss: 0.63094| %_mask_idx: 0.34806| ppl: 236.19702| %_neg_is_pos: 0.00668| lr: 0.0| temp: 1.97716 | loss: 1.13847| constrast_loss: 4.49194| div_loss: 0.61951| %_mask_idx: 0.40508| ppl: 243.51346| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.97716 | loss: 1.12063| constrast_loss: 4.42002| div_loss: 0.62512| %_mask_idx: 0.34524| ppl: 239.92534| %_neg_is_pos: 0.00512| lr: 0.0| temp: 1.97715 | loss: 1.12693| constrast_loss: 4.4455| div_loss: 0.62199| %_mask_idx: 0.39646| ppl: 241.92752| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.97715 | loss: 1.14507| constrast_loss: 4.51791| div_loss: 0.62374| %_mask_idx: 0.34743| ppl: 240.80554| %_neg_is_pos: 0.00486| lr: 0.0| temp: 1.97714 | loss: 1.13264| constrast_loss: 4.46688| div_loss: 0.63666| %_mask_idx: 0.4032| ppl: 232.5376| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.97714 | loss: 
1.14357| constrast_loss: 4.51319| div_loss: 0.61074| %_mask_idx: 0.42528| ppl: 249.1261| %_neg_is_pos: 0.00486| lr: 0.0| temp: 1.97713 | loss: 1.12971| constrast_loss: 4.45592| div_loss: 0.62928| %_mask_idx: 0.36983| ppl: 237.25974| %_neg_is_pos: 0.00589| lr: 0.0| temp: 1.97713 | loss: 1.13191| constrast_loss: 4.46584| div_loss: 0.61802| %_mask_idx: 0.38988| ppl: 244.46658| %_neg_is_pos: 0.00459| lr: 0.0| temp: 1.97711 | loss: 1.13647| constrast_loss: 4.484| div_loss: 0.61877| %_mask_idx: 0.34962| ppl: 243.98486| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.97711 | loss: 1.1256| constrast_loss: 4.43702| div_loss: 0.65388| %_mask_idx: 0.3927| ppl: 221.51651| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.9771 | loss: 1.13248| constrast_loss: 4.46638| div_loss: 0.6355| %_mask_idx: 0.40727| ppl: 233.28299| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.9771 | loss: 1.11799| constrast_loss: 4.40833| div_loss: 0.6363| %_mask_idx: 0.36278| ppl: 232.76578| %_neg_is_pos: 0.0054| lr: 0.0| temp: 1.97708 | loss: 1.13393| constrast_loss: 4.47261| div_loss: 0.63106| %_mask_idx: 0.40273| ppl: 236.12445| %_neg_is_pos: 0.00589| lr: 0.0| temp: 1.97708 | loss: 1.12551| constrast_loss: 4.43689| div_loss: 0.65165| %_mask_idx: 0.35746| ppl: 222.94438| %_neg_is_pos: 0.0132| lr: 0.0| temp: 1.97707 | loss: 1.13586| constrast_loss: 4.48115| div_loss: 0.62303| %_mask_idx: 0.4281| ppl: 241.26016| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.97707 | loss: 1.13374| constrast_loss: 4.47084| div_loss: 0.64124| %_mask_idx: 0.38315| ppl: 229.60776| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.97706 | loss: 1.13021| constrast_loss: 4.45627| div_loss: 0.6457| %_mask_idx: 0.3584| ppl: 226.74884| %_neg_is_pos: 0.00831| lr: 0.0| temp: 1.97706 | loss: 1.12718| constrast_loss: 4.44615| div_loss: 0.62591| %_mask_idx: 0.37923| ppl: 239.41562| %_neg_is_pos: 0.00574| lr: 0.0| temp: 1.97705 | loss: 1.13539| constrast_loss: 4.47704| div_loss: 0.64531| %_mask_idx: 0.40946| ppl: 227.00191| %_neg_is_pos: 0.00589| lr: 0.0| temp: 1.97705 | 
loss: 1.12734| constrast_loss: 4.44574| div_loss: 0.63618| %_mask_idx: 0.31924| ppl: 232.84569| %_neg_is_pos: 0.01279| lr: 0.0| temp: 1.97703 | loss: 1.12253| constrast_loss: 4.42562| div_loss: 0.64483| %_mask_idx: 0.35683| ppl: 227.31021| %_neg_is_pos: 0.01062| lr: 0.0| temp: 1.97703 | loss: 1.12449| constrast_loss: 4.43522| div_loss: 0.62732| %_mask_idx: 0.40852| ppl: 238.51427| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.97702 | loss: 1.13279| constrast_loss: 4.46863| div_loss: 0.62521| %_mask_idx: 0.401| ppl: 239.86343| %_neg_is_pos: 0.00576| lr: 0.0| temp: 1.97702 | loss: 1.13338| constrast_loss: 4.47096| div_loss: 0.6256| %_mask_idx: 0.40727| ppl: 239.61581| %_neg_is_pos: 0.0043| lr: 0.0| temp: 1.97701 | loss: 1.13529| constrast_loss: 4.47925| div_loss: 0.61898| %_mask_idx: 0.36388| ppl: 243.85309| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.97701 | loss: 1.13683| constrast_loss: 4.48258| div_loss: 0.64734| %_mask_idx: 0.39364| ppl: 225.70441| %_neg_is_pos: 0.00898| lr: 0.0| temp: 1.977 | loss: 1.12442| constrast_loss: 4.43532| div_loss: 0.62366| %_mask_idx: 0.40226| ppl: 240.8591| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.977 | loss: 1.13794| constrast_loss: 4.49027| div_loss: 0.61484| %_mask_idx: 0.39928| ppl: 246.49963| %_neg_is_pos: 0.00908| lr: 0.0| temp: 1.97698 | loss: 1.12757| constrast_loss: 4.44662| div_loss: 0.63662| %_mask_idx: 0.37672| ppl: 232.56592| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.97698 | loss: 1.13816| constrast_loss: 4.49023| div_loss: 0.62424| %_mask_idx: 0.3584| ppl: 240.48602| %_neg_is_pos: 0.00866| lr: 0.0| temp: 1.97697 | loss: 1.12754| constrast_loss: 4.44526| div_loss: 0.64913| %_mask_idx: 0.36607| ppl: 224.55881| %_neg_is_pos: 0.00578| lr: 0.0| temp: 1.97697 | loss: 1.12834| constrast_loss: 4.44955| div_loss: 0.63797| %_mask_idx: 0.38252| ppl: 231.70023| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.97696 | loss: 1.13539| constrast_loss: 4.47977| div_loss: 0.61777| %_mask_idx: 0.401| ppl: 244.62735| %_neg_is_pos: 0.00324| lr: 0.0| temp: 
1.97696 | loss: 1.12004| constrast_loss: 4.41674| div_loss: 0.63415| %_mask_idx: 0.3703| ppl: 234.14223| %_neg_is_pos: 0.00637| lr: 0.0| temp: 1.97695 | loss: 1.13261| constrast_loss: 4.4677| div_loss: 0.62727| %_mask_idx: 0.38549| ppl: 238.54938| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.97695 | loss: 1.12014| constrast_loss: 4.41747| div_loss: 0.63074| %_mask_idx: 0.36607| ppl: 236.32939| %_neg_is_pos: 0.00718| lr: 0.0| temp: 1.97693 | loss: 1.12968| constrast_loss: 4.45594| div_loss: 0.62781| %_mask_idx: 0.39333| ppl: 238.2043| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97693 | loss: 1.13817| constrast_loss: 4.48937| div_loss: 0.63302| %_mask_idx: 0.39348| ppl: 234.86848| %_neg_is_pos: 0.00672| lr: 0.0| temp: 1.97692 | loss: 1.13995| constrast_loss: 4.49538| div_loss: 0.64432| %_mask_idx: 0.40398| ppl: 227.63788| %_neg_is_pos: 0.00862| lr: 0.0| temp: 1.97692 | loss: 1.13352| constrast_loss: 4.47124| div_loss: 0.62848| %_mask_idx: 0.40288| ppl: 237.77078| %_neg_is_pos: 0.01659| lr: 0.0| temp: 1.9769 | loss: 1.14002| constrast_loss: 4.49747| div_loss: 0.62599| %_mask_idx: 0.38706| ppl: 239.36906| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.9769 | loss: 1.1402| constrast_loss: 4.50118| div_loss: 0.59616| %_mask_idx: 0.36826| ppl: 258.45712| %_neg_is_pos: 0.00644| lr: 0.0| temp: 1.97689 | loss: 1.12606| constrast_loss: 4.44179| div_loss: 0.62435| %_mask_idx: 0.40335| ppl: 240.41663| %_neg_is_pos: 0.00708| lr: 0.0| temp: 1.97689 | loss: 1.13093| constrast_loss: 4.45999| div_loss: 0.63744| %_mask_idx: 0.38863| ppl: 232.03697| %_neg_is_pos: 0.00613| lr: 0.0| temp: 1.97688 | loss: 1.12465| constrast_loss: 4.43431| div_loss: 0.64296| %_mask_idx: 0.3891| ppl: 228.50662| %_neg_is_pos: 0.00887| lr: 0.0| temp: 1.97688 | loss: 1.12869| constrast_loss: 4.44977| div_loss: 0.64979| %_mask_idx: 0.42387| ppl: 224.13272| %_neg_is_pos: 0.00483| lr: 0.0| temp: 1.97687 | loss: 1.13399| constrast_loss: 4.47327| div_loss: 0.62696| %_mask_idx: 0.43249| ppl: 238.74478| %_neg_is_pos: 0.00431| lr: 
0.0| temp: 1.97687 | loss: 1.13699| constrast_loss: 4.4854| div_loss: 0.62559| %_mask_idx: 0.41165| ppl: 239.61969| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.97685 | loss: 1.12922| constrast_loss: 4.45301| div_loss: 0.63863| %_mask_idx: 0.35182| ppl: 231.27963| %_neg_is_pos: 0.00563| lr: 0.0| temp: 1.97685 | loss: 1.13579| constrast_loss: 4.48071| div_loss: 0.62464| %_mask_idx: 0.42951| ppl: 240.22742| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.97685 | loss: 1.1304| constrast_loss: 4.4575| div_loss: 0.64093| %_mask_idx: 0.37014| ppl: 229.806| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.97685 | loss: 1.12184| constrast_loss: 4.42087| div_loss: 0.66477| %_mask_idx: 0.32268| ppl: 214.54443| %_neg_is_pos: 0.01064| lr: 0.0| temp: 1.97684 | loss: 1.13495| constrast_loss: 4.47789| div_loss: 0.61922| %_mask_idx: 0.40977| ppl: 243.7012| %_neg_is_pos: 0.00548| lr: 0.0| temp: 1.97684 | loss: 1.13044| constrast_loss: 4.45693| div_loss: 0.6482| %_mask_idx: 0.36106| ppl: 225.15283| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.97683 | loss: 1.14197| constrast_loss: 4.50698| div_loss: 0.60905| %_mask_idx: 0.40758| ppl: 250.20688| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.97683 | loss: 1.13927| constrast_loss: 4.4939| div_loss: 0.63186| %_mask_idx: 0.401| ppl: 235.60837| %_neg_is_pos: 0.01243| lr: 0.0| temp: 1.97681 | loss: 1.12672| constrast_loss: 4.4433| div_loss: 0.63582| %_mask_idx: 0.3833| ppl: 233.07248| %_neg_is_pos: 0.00476| lr: 0.0| temp: 1.97681 | loss: 1.13932| constrast_loss: 4.49575| div_loss: 0.61536| %_mask_idx: 0.4209| ppl: 246.16943| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.9768 | loss: 1.12049| constrast_loss: 4.41812| div_loss: 0.63848| %_mask_idx: 0.33866| ppl: 231.3757| %_neg_is_pos: 0.00921| lr: 0.0| temp: 1.9768 | loss: 1.12784| constrast_loss: 4.44815| div_loss: 0.63206| %_mask_idx: 0.37939| ppl: 235.48425| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.97679 | loss: 1.13172| constrast_loss: 4.46471| div_loss: 0.62162| %_mask_idx: 0.35777| ppl: 242.16122| %_neg_is_pos: 0.00332| lr: 
0.0| temp: 1.97679 | loss: 1.13337| constrast_loss: 4.47013| div_loss: 0.63352| %_mask_idx: 0.39082| ppl: 234.55038| %_neg_is_pos: 0.00374| lr: 0.0| temp: 1.97678 | loss: 1.13183| constrast_loss: 4.46348| div_loss: 0.63848| %_mask_idx: 0.39129| ppl: 231.37036| %_neg_is_pos: 0.00821| lr: 0.0| temp: 1.97678 | loss: 1.13041| constrast_loss: 4.45981| div_loss: 0.61812| %_mask_idx: 0.40147| ppl: 244.4055| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.97676 | loss: 1.14145| constrast_loss: 4.50373| div_loss: 0.62064| %_mask_idx: 0.41056| ppl: 242.78909| %_neg_is_pos: 0.00952| lr: 0.0| temp: 1.97676 | loss: 1.11366| constrast_loss: 4.38928| div_loss: 0.65367| %_mask_idx: 0.41541| ppl: 221.65256| %_neg_is_pos: 0.00665| lr: 0.0| temp: 1.97675 | loss: 1.11989| constrast_loss: 4.41603| div_loss: 0.63525| %_mask_idx: 0.34336| ppl: 233.43907| %_neg_is_pos: 0.00657| lr: 0.0| temp: 1.97675 | loss: 1.12884| constrast_loss: 4.45417| div_loss: 0.6118| %_mask_idx: 0.38064| ppl: 248.44867| %_neg_is_pos: 0.00587| lr: 0.0| temp: 1.97673 | loss: 1.12385| constrast_loss: 4.42971| div_loss: 0.65685| %_mask_idx: 0.41714| ppl: 219.61797| %_neg_is_pos: 0.00827| lr: 0.0| temp: 1.97673 | loss: 1.12973| constrast_loss: 4.45517| div_loss: 0.63767| %_mask_idx: 0.44518| ppl: 231.89438| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.97672 | loss: 1.12984| constrast_loss: 4.45562| div_loss: 0.63724| %_mask_idx: 0.37516| ppl: 232.16806| %_neg_is_pos: 0.00847| lr: 0.0| temp: 1.97672 | loss: 1.13338| constrast_loss: 4.46868| div_loss: 0.64839| %_mask_idx: 0.39411| ppl: 225.02814| %_neg_is_pos: 0.00767| lr: 0.0| temp: 1.97671 | loss: 1.13249| constrast_loss: 4.46615| div_loss: 0.63795| %_mask_idx: 0.37202| ppl: 231.7128| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.97671 | loss: 1.13814| constrast_loss: 4.48873| div_loss: 0.6382| %_mask_idx: 0.3808| ppl: 231.55145| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.9767 | loss: 1.12533| constrast_loss: 4.43722| div_loss: 0.64113| %_mask_idx: 0.38111| ppl: 229.6796| %_neg_is_pos: 
0.0073| lr: 0.0| temp: 1.9767 | loss: 1.13386| constrast_loss: 4.47347| div_loss: 0.61951| %_mask_idx: 0.42888| ppl: 243.51508| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.97668 | loss: 1.13272| constrast_loss: 4.46795| div_loss: 0.62916| %_mask_idx: 0.41291| ppl: 237.33658| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.97668 | loss: 1.13422| constrast_loss: 4.47459| div_loss: 0.62289| %_mask_idx: 0.40132| ppl: 241.35031| %_neg_is_pos: 0.00768| lr: 0.0| temp: 1.97667 | loss: 1.13485| constrast_loss: 4.47689| div_loss: 0.6252| %_mask_idx: 0.40414| ppl: 239.87202| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.97667 | loss: 1.13712| constrast_loss: 4.48756| div_loss: 0.60919| %_mask_idx: 0.39474| ppl: 250.11522| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.97666 | loss: 1.13739| constrast_loss: 4.48708| div_loss: 0.62466| %_mask_idx: 0.43437| ppl: 240.21631| %_neg_is_pos: 0.00152| lr: 0.0| temp: 1.97666 | loss: 1.12854| constrast_loss: 4.44798| div_loss: 0.66195| %_mask_idx: 0.39693| ppl: 216.35338| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.97665 | loss: 1.13445| constrast_loss: 4.4721| div_loss: 0.65699| %_mask_idx: 0.3833| ppl: 219.52652| %_neg_is_pos: 0.00846| lr: 0.0| temp: 1.97665 | loss: 1.14344| constrast_loss: 4.51222| div_loss: 0.61545| %_mask_idx: 0.39881| ppl: 246.11469| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.97663 | loss: 1.1302| constrast_loss: 4.45682| div_loss: 0.63971| %_mask_idx: 0.38456| ppl: 230.58624| %_neg_is_pos: 0.00572| lr: 0.0| temp: 1.97663 | loss: 1.12385| constrast_loss: 4.43167| div_loss: 0.63722| %_mask_idx: 0.39458| ppl: 232.18124| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.97662 | loss: 1.12978| constrast_loss: 4.45567| div_loss: 0.63473| %_mask_idx: 0.40523| ppl: 233.77563| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.97662 | loss: 1.13543| constrast_loss: 4.47892| div_loss: 0.62818| %_mask_idx: 0.42184| ppl: 237.96588| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.97661 | loss: 1.12226| constrast_loss: 4.42248| div_loss: 0.66544| %_mask_idx: 0.3714| ppl: 214.121| 
%_neg_is_pos: 0.00781| lr: 0.0| temp: 1.97661 | loss: 1.12393| constrast_loss: 4.43191| div_loss: 0.63808| %_mask_idx: 0.41745| ppl: 231.62891| %_neg_is_pos: 0.0081| lr: 0.0| temp: 1.9766 | loss: 1.13252| constrast_loss: 4.46565| div_loss: 0.64446| %_mask_idx: 0.38189| ppl: 227.5433| %_neg_is_pos: 0.00496| lr: 0.0| temp: 1.9766 | loss: 1.12453| constrast_loss: 4.43645| div_loss: 0.61685| %_mask_idx: 0.40367| ppl: 245.21848| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.97658 | loss: 1.1343| constrast_loss: 4.47548| div_loss: 0.61729| %_mask_idx: 0.34712| ppl: 244.93265| %_neg_is_pos: 0.00684| lr: 0.0| temp: 1.97658 | loss: 1.13143| constrast_loss: 4.46368| div_loss: 0.62059| %_mask_idx: 0.40836| ppl: 242.82443| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.97657 | loss: 1.13151| constrast_loss: 4.46239| div_loss: 0.63664| %_mask_idx: 0.38863| ppl: 232.54968| %_neg_is_pos: 0.00534| lr: 0.0| temp: 1.97657 [2021-09-02 02:20:30,399] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:20:30,400] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.12742| constrast_loss: 4.44681| div_loss: 0.62865| %_mask_idx: 0.34555| ppl: 237.66507| %_neg_is_pos: 0.00724| lr: 0.0| temp: 1.97655 | loss: 1.12222| constrast_loss: 4.42437| div_loss: 0.6452| %_mask_idx: 0.42121| ppl: 227.06888| %_neg_is_pos: 0.00908| lr: 0.0| temp: 1.97655 | loss: 1.13933| constrast_loss: 4.49368| div_loss: 0.63652| %_mask_idx: 0.41369| ppl: 232.62781| %_neg_is_pos: 0.00724| lr: 0.0| temp: 1.97654 | loss: 1.13539| constrast_loss: 4.47768| div_loss: 0.63892| %_mask_idx: 0.40382| ppl: 231.08942| %_neg_is_pos: 0.00698| lr: 0.0| temp: 1.97654 | loss: 1.13311| constrast_loss: 4.47109| div_loss: 0.6135| %_mask_idx: 0.39145| ppl: 247.35834| %_neg_is_pos: 0.00582| lr: 0.0| temp: 1.97653 | loss: 1.13679| constrast_loss: 4.48381| div_loss: 0.6333| %_mask_idx: 0.34117| ppl: 234.68619| %_neg_is_pos: 0.01815| lr: 0.0| temp: 1.97653 | loss: 1.12426| constrast_loss: 4.43278| div_loss: 0.64258| %_mask_idx: 0.41996| ppl: 228.74921| %_neg_is_pos: 0.01468| lr: 0.0| temp: 1.97652 | loss: 1.13429| constrast_loss: 4.47475| div_loss: 0.62401| %_mask_idx: 0.39317| ppl: 240.63489| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.97652 | loss: 1.12788| constrast_loss: 4.44628| div_loss: 0.65237| %_mask_idx: 0.33192| ppl: 222.48053| %_neg_is_pos: 0.02409| lr: 0.0| temp: 1.9765 | loss: 1.13034| constrast_loss: 4.45998| div_loss: 0.61374| %_mask_idx: 0.42513| ppl: 247.20767| %_neg_is_pos: 0.00676| lr: 0.0| temp: 1.9765 | loss: 1.13556| constrast_loss: 4.48162| div_loss: 0.60639| %_mask_idx: 0.38831| ppl: 251.91331| %_neg_is_pos: 0.01299| lr: 0.0| temp: 1.97649 | loss: 1.13435| constrast_loss: 4.47602| div_loss: 0.61391| %_mask_idx: 0.45363| ppl: 247.09824| %_neg_is_pos: 0.00148| lr: 0.0| temp: 1.97649 | loss: 1.124| constrast_loss: 4.43255| div_loss: 0.63449| %_mask_idx: 0.41667| ppl: 233.92776| %_neg_is_pos: 0.00539| lr: 0.0| temp: 1.97648 | loss: 1.13238| constrast_loss: 4.46685| div_loss: 0.6268| %_mask_idx: 0.41729| ppl: 238.85022| 
%_neg_is_pos: 0.00424| lr: 0.0| temp: 1.97648 | loss: 1.12348| constrast_loss: 4.42885| div_loss: 0.65055| %_mask_idx: 0.33694| ppl: 223.64806| %_neg_is_pos: 0.00926| lr: 0.0| temp: 1.97647 | loss: 1.12767| constrast_loss: 4.44674| div_loss: 0.63941| %_mask_idx: 0.35244| ppl: 230.77792| %_neg_is_pos: 0.00506| lr: 0.0| temp: 1.97647 | loss: 1.13993| constrast_loss: 4.49594| div_loss: 0.63767| %_mask_idx: 0.40523| ppl: 231.88826| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.97645| loss: 1.12882| constrast_loss: 4.45137| div_loss: 0.63927| %_mask_idx: 0.39944| ppl: 230.86765| %_neg_is_pos: 0.00661| lr: 0.0| temp: 1.97645 | loss: 1.12843| constrast_loss: 4.44961| div_loss: 0.64128| %_mask_idx: 0.40727| ppl: 229.57877| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.97644 | loss: 1.13834| constrast_loss: 4.49241| div_loss: 0.60939| %_mask_idx: 0.37296| ppl: 249.99069| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.97644 | loss: 1.13593| constrast_loss: 4.48023| div_loss: 0.63479| %_mask_idx: 0.4151| ppl: 233.73296| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.97643 | loss: 1.13526| constrast_loss: 4.47702| div_loss: 0.64022| %_mask_idx: 0.38142| ppl: 230.25681| %_neg_is_pos: 0.00473| lr: 0.0| temp: 1.97643 | loss: 1.13552| constrast_loss: 4.47648| div_loss: 0.65583| %_mask_idx: 0.42199| ppl: 220.27002| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.97642 | loss: 1.13826| constrast_loss: 4.49046| div_loss: 0.62571| %_mask_idx: 0.43405| ppl: 239.54718| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.97642 | loss: 1.11972| constrast_loss: 4.41342| div_loss: 0.65442| %_mask_idx: 0.34258| ppl: 221.17049| %_neg_is_pos: 0.00619| lr: 0.0| temp: 1.9764 | loss: 1.14022| constrast_loss: 4.49848| div_loss: 0.62385| %_mask_idx: 0.38377| ppl: 240.73511| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.9764 | loss: 1.13023| constrast_loss: 4.45688| div_loss: 0.64045| %_mask_idx: 0.39458| ppl: 230.11508| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.97639 | loss: 1.13734| constrast_loss: 4.48701| div_loss: 0.6234| %_mask_idx: 0.39865| ppl: 
241.02257| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.97639 | loss: 1.12866| constrast_loss: 4.45231| div_loss: 0.62322| %_mask_idx: 0.36169| ppl: 241.1368| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.97637 | loss: 1.13247| constrast_loss: 4.46687| div_loss: 0.63022| %_mask_idx: 0.39865| ppl: 236.66028| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.97637 | loss: 1.12196| constrast_loss: 4.42102| div_loss: 0.66813| %_mask_idx: 0.37249| ppl: 212.3983| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.97636 | loss: 1.14064| constrast_loss: 4.50074| div_loss: 0.61822| %_mask_idx: 0.42356| ppl: 244.34167| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.97636 | loss: 1.13313| constrast_loss: 4.46818| div_loss: 0.64353| %_mask_idx: 0.38925| ppl: 228.14047| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.97635 | loss: 1.11605| constrast_loss: 4.39862| div_loss: 0.65594| %_mask_idx: 0.37343| ppl: 220.1985| %_neg_is_pos: 0.00614| lr: 0.0| temp: 1.97635 | loss: 1.13753| constrast_loss: 4.49001| div_loss: 0.60094| %_mask_idx: 0.43029| ppl: 255.40012| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.97634 | loss: 1.13992| constrast_loss: 4.49668| div_loss: 0.62999| %_mask_idx: 0.38142| ppl: 236.80728| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.97634 | loss: 1.13646| constrast_loss: 4.48463| div_loss: 0.61228| %_mask_idx: 0.41698| ppl: 248.1402| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.97632 | loss: 1.14486| constrast_loss: 4.51675| div_loss: 0.62676| %_mask_idx: 0.35464| ppl: 238.87303| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.97632 | loss: 1.13456| constrast_loss: 4.47583| div_loss: 0.62396| %_mask_idx: 0.37813| ppl: 240.66269| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.97631 | loss: 1.13756| constrast_loss: 4.48917| div_loss: 0.61068| %_mask_idx: 0.35652| ppl: 249.16638| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.97631 | loss: 1.12851| constrast_loss: 4.45137| div_loss: 0.62665| %_mask_idx: 0.37876| ppl: 238.94324| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.9763 | loss: 1.13292| constrast_loss: 4.46856| div_loss: 0.63123| %_mask_idx: 
0.37093| ppl: 236.01367| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.9763 | loss: 1.12687| constrast_loss: 4.44247| div_loss: 0.6499| %_mask_idx: 0.3808| ppl: 224.06169| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.97629 | loss: 1.1281| constrast_loss: 4.44855| div_loss: 0.63842| %_mask_idx: 0.40492| ppl: 231.41113| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.97629 | loss: 1.13021| constrast_loss: 4.45658| div_loss: 0.64254| %_mask_idx: 0.35244| ppl: 228.7753| %_neg_is_pos: 0.00578| lr: 0.0| temp: 1.97627 | loss: 1.13331| constrast_loss: 4.47141| div_loss: 0.61812| %_mask_idx: 0.37547| ppl: 244.4035| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97627 | loss: 1.14043| constrast_loss: 4.49857| div_loss: 0.63163| %_mask_idx: 0.41463| ppl: 235.7598| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.97626 | loss: 1.13752| constrast_loss: 4.49011| div_loss: 0.59987| %_mask_idx: 0.4162| ppl: 256.08316| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.97626 | loss: 1.12126| constrast_loss: 4.4205| div_loss: 0.6456| %_mask_idx: 0.33913| ppl: 226.81638| %_neg_is_pos: 0.00412| lr: 0.0| temp: 1.97625 | loss: 1.13168| constrast_loss: 4.4625| div_loss: 0.6423| %_mask_idx: 0.38095| ppl: 228.92624| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.97625 | loss: 1.12347| constrast_loss: 4.42963| div_loss: 0.64256| %_mask_idx: 0.35558| ppl: 228.76224| %_neg_is_pos: 0.00551| lr: 0.0| temp: 1.97624 | loss: 1.11907| constrast_loss: 4.41044| div_loss: 0.65827| %_mask_idx: 0.41604| ppl: 218.70778| %_neg_is_pos: 0.004| lr: 0.0| temp: 1.97624 | loss: 1.14513| constrast_loss: 4.51919| div_loss: 0.61348| %_mask_idx: 0.4093| ppl: 247.37549| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.97622 | loss: 1.13878| constrast_loss: 4.49288| div_loss: 0.6225| %_mask_idx: 0.43922| ppl: 241.60216| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.97622 | loss: 1.12764| constrast_loss: 4.44608| div_loss: 0.64496| %_mask_idx: 0.34258| ppl: 227.22696| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.97621 | loss: 1.13495| constrast_loss: 4.47745| div_loss: 0.62345| %_mask_idx: 
0.37563| ppl: 240.9913| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.97621 | loss: 1.13073| constrast_loss: 4.45963| div_loss: 0.63296| %_mask_idx: 0.43123| ppl: 234.9025| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.97619 | loss: 1.119| constrast_loss: 4.41108| div_loss: 0.64914| %_mask_idx: 0.35918| ppl: 224.55005| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.97619 | loss: 1.13302| constrast_loss: 4.46868| div_loss: 0.63401| %_mask_idx: 0.45348| ppl: 234.23672| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.97618 | loss: 1.1464| constrast_loss: 4.52161| div_loss: 0.64| %_mask_idx: 0.40508| ppl: 230.397| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.97618 | loss: 1.12751| constrast_loss: 4.4458| div_loss: 0.6423| %_mask_idx: 0.38894| ppl: 228.93054| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.97617 | loss: 1.13977| constrast_loss: 4.4949| div_loss: 0.64164| %_mask_idx: 0.40069| ppl: 229.35066| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.97617 | loss: 1.12479| constrast_loss: 4.4355| div_loss: 0.6365| %_mask_idx: 0.37939| ppl: 232.63766| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.97616 | loss: 1.13835| constrast_loss: 4.49066| div_loss: 0.62737| %_mask_idx: 0.31046| ppl: 238.48416| %_neg_is_pos: 0.00459| lr: 0.0| temp: 1.97616 | loss: 1.13473| constrast_loss: 4.47492| div_loss: 0.64024| %_mask_idx: 0.35103| ppl: 230.24471| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.97614 | loss: 1.12701| constrast_loss: 4.44267| div_loss: 0.65363| %_mask_idx: 0.38769| ppl: 221.67734| %_neg_is_pos: 0.00555| lr: 0.0| temp: 1.97614 | loss: 1.12293| constrast_loss: 4.4276| div_loss: 0.64123| %_mask_idx: 0.38565| ppl: 229.61511| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.97613 | loss: 1.127| constrast_loss: 4.44418| div_loss: 0.63828| %_mask_idx: 0.3656| ppl: 231.49806| %_neg_is_pos: 0.006| lr: 0.0| temp: 1.97613 | loss: 1.13811| constrast_loss: 4.4902| div_loss: 0.62251| %_mask_idx: 0.42638| ppl: 241.59438| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.97612 | loss: 1.13974| constrast_loss: 4.49625| div_loss: 0.62694| %_mask_idx: 
0.37171| ppl: 238.75543| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.97612 | loss: 1.13259| constrast_loss: 4.46803| div_loss: 0.62336| %_mask_idx: 0.39442| ppl: 241.04794| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.97611 | loss: 1.1393| constrast_loss: 4.49502| div_loss: 0.62194| %_mask_idx: 0.35573| ppl: 241.9576| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.97611 | loss: 1.13635| constrast_loss: 4.4832| div_loss: 0.62187| %_mask_idx: 0.39771| ppl: 242.00107| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.97609 | loss: 1.13329| constrast_loss: 4.47022| div_loss: 0.62947| %_mask_idx: 0.38831| ppl: 237.14233| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.97609 | loss: 1.14058| constrast_loss: 4.49847| div_loss: 0.63866| %_mask_idx: 0.43938| ppl: 231.25928| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.97608 | loss: 1.14116| constrast_loss: 4.50297| div_loss: 0.61673| %_mask_idx: 0.3938| ppl: 245.29295| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.97608 | loss: 1.12983| constrast_loss: 4.45472| div_loss: 0.64601| %_mask_idx: 0.39991| ppl: 226.55374| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.97607 | loss: 1.12033| constrast_loss: 4.41445| div_loss: 0.66894| %_mask_idx: 0.34696| ppl: 211.88147| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.97607 | loss: 1.13478| constrast_loss: 4.47801| div_loss: 0.61121| %_mask_idx: 0.38628| ppl: 248.82668| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.97606 | loss: 1.12711| constrast_loss: 4.44369| div_loss: 0.64737| %_mask_idx: 0.32049| ppl: 225.68335| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.97606 | loss: 1.13884| constrast_loss: 4.49285| div_loss: 0.62525| %_mask_idx: 0.40555| ppl: 239.84299| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.97604 | loss: 1.13077| constrast_loss: 4.45995| div_loss: 0.63118| %_mask_idx: 0.43468| ppl: 236.0423| %_neg_is_pos: 0.00301| lr: 0.0| temp: 1.97604 | loss: 1.12858| constrast_loss: 4.44761| div_loss: 0.66724| %_mask_idx: 0.43766| ppl: 212.96411| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.97603 | loss: 1.138| constrast_loss: 4.49067| div_loss: 0.61306| 
%_mask_idx: 0.36638| ppl: 247.64149| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.97603 | loss: 1.1269| constrast_loss: 4.44365| div_loss: 0.63934| %_mask_idx: 0.34164| ppl: 230.82251| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.97601 | loss: 1.13852| constrast_loss: 4.49175| div_loss: 0.62322| %_mask_idx: 0.42262| ppl: 241.1396| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.97601 | loss: 1.1359| constrast_loss: 4.48019| div_loss: 0.63404| %_mask_idx: 0.39458| ppl: 234.21236| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.97601 | loss: 1.12864| constrast_loss: 4.45088| div_loss: 0.63667| %_mask_idx: 0.41087| ppl: 232.53308| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97601 | loss: 1.13147| constrast_loss: 4.46261| div_loss: 0.63252| %_mask_idx: 0.34179| ppl: 235.18744| %_neg_is_pos: 0.00477| lr: 0.0| temp: 1.976 | loss: 1.12189| constrast_loss: 4.42336| div_loss: 0.64197| %_mask_idx: 0.37531| ppl: 229.1385| %_neg_is_pos: 0.00465| lr: 0.0| temp: 1.976 | loss: 1.13989| constrast_loss: 4.49549| div_loss: 0.64081| %_mask_idx: 0.39019| ppl: 229.88263| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.97599 | loss: 1.12756| constrast_loss: 4.4454| div_loss: 0.64833| %_mask_idx: 0.35996| ppl: 225.0665| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.97599 | loss: 1.14191| constrast_loss: 4.50306| div_loss: 0.64592| %_mask_idx: 0.3844| ppl: 226.61005| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97597 | loss: 1.13259| constrast_loss: 4.46706| div_loss: 0.63291| %_mask_idx: 0.40883| ppl: 234.93863| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.97597 | loss: 1.13874| constrast_loss: 4.49291| div_loss: 0.6206| %_mask_idx: 0.39724| ppl: 242.81502| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.97596 | loss: 1.12581| constrast_loss: 4.43724| div_loss: 0.6598| %_mask_idx: 0.34853| ppl: 217.72707| %_neg_is_pos: 0.00607| lr: 0.0| temp: 1.97596 | loss: 1.12571| constrast_loss: 4.43668| div_loss: 0.6615| %_mask_idx: 0.3479| ppl: 216.6395| %_neg_is_pos: 0.00468| lr: 0.0| temp: 1.97595 | loss: 1.12361| constrast_loss: 4.42998| div_loss: 0.6447| 
%_mask_idx: 0.4292| ppl: 227.38986| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.97595 | loss: 1.13723| constrast_loss: 4.48792| div_loss: 0.61008| %_mask_idx: 0.38643| ppl: 249.54953| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.97594 | loss: 1.13395| constrast_loss: 4.47252| div_loss: 0.63276| %_mask_idx: 0.39803| ppl: 235.03671| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.97594 | loss: 1.1366| constrast_loss: 4.48353| div_loss: 0.6286| %_mask_idx: 0.39771| ppl: 237.69409| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.97592 | loss: 1.12151| constrast_loss: 4.42177| div_loss: 0.64282| %_mask_idx: 0.35025| ppl: 228.59412| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.97592 | loss: 1.12691| constrast_loss: 4.44321| div_loss: 0.64449| %_mask_idx: 0.32331| ppl: 227.52737| %_neg_is_pos: 0.00612| lr: 0.0| temp: 1.97591 | loss: 1.13978| constrast_loss: 4.49676| div_loss: 0.62344| %_mask_idx: 0.42152| ppl: 240.99994| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.97591 | loss: 1.14013| constrast_loss: 4.4997| div_loss: 0.60799| %_mask_idx: 0.42152| ppl: 250.88748| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.9759 | loss: 1.13682| constrast_loss: 4.48331| div_loss: 0.63982| %_mask_idx: 0.35996| ppl: 230.51663| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.9759 | loss: 1.13487| constrast_loss: 4.47832| div_loss: 0.61149| %_mask_idx: 0.41432| ppl: 248.64944| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.97589 | loss: 1.12773| constrast_loss: 4.44748| div_loss: 0.63459| %_mask_idx: 0.3963| ppl: 233.86523| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.97589 | loss: 1.12993| constrast_loss: 4.45782| div_loss: 0.61897| %_mask_idx: 0.36059| ppl: 243.85751| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.97587 | loss: 1.1397| constrast_loss: 4.49596| div_loss: 0.62826| %_mask_idx: 0.36043| ppl: 237.91165| %_neg_is_pos: 0.00523| lr: 0.0| temp: 1.97587 | loss: 1.13736| constrast_loss: 4.4881| div_loss: 0.61354| %_mask_idx: 0.3869| ppl: 247.3356| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.97586 | loss: 1.1406| constrast_loss: 4.49898| div_loss: 
0.63421| %_mask_idx: 0.41306| ppl: 234.10736| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.97586 [2021-09-02 02:29:45,061] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:29:45,061] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13652| constrast_loss: 4.48201| div_loss: 0.64072| %_mask_idx: 0.42105| ppl: 229.94067| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.97584 | loss: 1.13307| constrast_loss: 4.46883| div_loss: 0.63461| %_mask_idx: 0.40163| ppl: 233.85263| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.97584 | loss: 1.12801| constrast_loss: 4.44813| div_loss: 0.63902| %_mask_idx: 0.41338| ppl: 231.02567| %_neg_is_pos: 0.00367| lr: 0.0| temp: 1.97583 | loss: 1.13238| constrast_loss: 4.46576| div_loss: 0.63767| %_mask_idx: 0.38205| ppl: 231.89194| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.97583 | loss: 1.13507| constrast_loss: 4.4777| div_loss: 0.62562| %_mask_idx: 0.33521| ppl: 239.60367| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97582 | loss: 1.12925| constrast_loss: 4.45305| div_loss: 0.63947| %_mask_idx: 0.43327| ppl: 230.74216| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.97582 | loss: 1.13202| constrast_loss: 4.46601| div_loss: 0.62067| %_mask_idx: 0.39333| ppl: 242.77068| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.97581 | loss: 1.12835| constrast_loss: 4.44975| div_loss: 0.63654| %_mask_idx: 0.32581| ppl: 232.61621| %_neg_is_pos: 0.00462| lr: 0.0| temp: 1.97581 | loss: 1.13437| constrast_loss: 4.4753| div_loss: 0.62175| %_mask_idx: 0.38158| ppl: 242.07983| %_neg_is_pos: 0.00656| lr: 0.0| temp: 1.97579 | loss: 1.12964| constrast_loss: 4.45543| div_loss: 0.6314| %_mask_idx: 0.4588| ppl: 235.90225| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.97579 | loss: 1.12153| constrast_loss: 4.42167| div_loss: 0.64449| %_mask_idx: 0.40226| ppl: 227.52875| %_neg_is_pos: 0.00761| lr: 0.0| temp: 1.97578 | 
loss: 1.13142| constrast_loss: 4.46316| div_loss: 0.62524| %_mask_idx: 0.37077| ppl: 239.84613| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.97578 | loss: 1.14212| constrast_loss: 4.50585| div_loss: 0.62644| %_mask_idx: 0.41056| ppl: 239.08009| %_neg_is_pos: 0.00755| lr: 0.0| temp: 1.97577 | loss: 1.1369| constrast_loss: 4.48416| div_loss: 0.6343| %_mask_idx: 0.42419| ppl: 234.04694| %_neg_is_pos: 0.00393| lr: 0.0| temp: 1.97577 | loss: 1.13337| constrast_loss: 4.47193| div_loss: 0.61566| %_mask_idx: 0.36341| ppl: 245.97507| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.97576 | loss: 1.13099| constrast_loss: 4.45894| div_loss: 0.65034| %_mask_idx: 0.41494| ppl: 223.78429| %_neg_is_pos: 0.00606| lr: 0.0| temp: 1.97576 | loss: 1.12709| constrast_loss: 4.44406| div_loss: 0.64307| %_mask_idx: 0.40241| ppl: 228.43698| %_neg_is_pos: 0.00914| lr: 0.0| temp: 1.97574 | loss: 1.12082| constrast_loss: 4.41742| div_loss: 0.65868| %_mask_idx: 0.34023| ppl: 218.44434| %_neg_is_pos: 0.00979| lr: 0.0| temp: 1.97574 | loss: 1.11578| constrast_loss: 4.39861| div_loss: 0.64524| %_mask_idx: 0.38221| ppl: 227.04468| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.97573 | loss: 1.13002| constrast_loss: 4.45828| div_loss: 0.61818| %_mask_idx: 0.41071| ppl: 244.36339| %_neg_is_pos: 0.00525| lr: 0.0| temp: 1.97573 | loss: 1.14088| constrast_loss: 4.50169| div_loss: 0.61843| %_mask_idx: 0.40351| ppl: 244.20656| %_neg_is_pos: 0.00337| lr: 0.0| temp: 1.97572 | loss: 1.13358| constrast_loss: 4.47176| div_loss: 0.62563| %_mask_idx: 0.37986| ppl: 239.59888| %_neg_is_pos: 0.0079| lr: 0.0| temp: 1.97572 | loss: 1.12919| constrast_loss: 4.45372| div_loss: 0.63033| %_mask_idx: 0.33866| ppl: 236.58694| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.97571 | loss: 1.12391| constrast_loss: 4.43248| div_loss: 0.63176| %_mask_idx: 0.38831| ppl: 235.67538| %_neg_is_pos: 0.00866| lr: 0.0| temp: 1.97571 | loss: 1.1398| constrast_loss: 4.49809| div_loss: 0.61131| %_mask_idx: 0.4364| ppl: 248.7612| %_neg_is_pos: 0.00356| lr: 0.0| temp: 
1.97569 | loss: 1.12307| constrast_loss: 4.42948| div_loss: 0.62794| %_mask_idx: 0.37923| ppl: 238.11734| %_neg_is_pos: 0.00492| lr: 0.0| temp: 1.97569 | loss: 1.12633| constrast_loss: 4.44351| div_loss: 0.6181| %_mask_idx: 0.4021| ppl: 244.41325| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.97568 | loss: 1.1164| constrast_loss: 4.39934| div_loss: 0.6626| %_mask_idx: 0.35871| ppl: 215.93454| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.97568 | loss: 1.13961| constrast_loss: 4.49639| div_loss: 0.62061| %_mask_idx: 0.39568| ppl: 242.81253| %_neg_is_pos: 0.00583| lr: 0.0| temp: 1.97566 | loss: 1.14046| constrast_loss: 4.50051| div_loss: 0.6134| %_mask_idx: 0.43781| ppl: 247.42279| %_neg_is_pos: 0.00573| lr: 0.0| temp: 1.97566 | loss: 1.13421| constrast_loss: 4.4734| div_loss: 0.63431| %_mask_idx: 0.39662| ppl: 234.04147| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.97565 | loss: 1.13383| constrast_loss: 4.47262| div_loss: 0.62687| %_mask_idx: 0.40038| ppl: 238.80206| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.97565 | loss: 1.13099| constrast_loss: 4.46318| div_loss: 0.60789| %_mask_idx: 0.37061| ppl: 250.95013| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.97564 | loss: 1.12394| constrast_loss: 4.42978| div_loss: 0.65981| %_mask_idx: 0.37437| ppl: 217.72182| %_neg_is_pos: 0.00596| lr: 0.0| temp: 1.97564 | loss: 1.13145| constrast_loss: 4.46339| div_loss: 0.62421| %_mask_idx: 0.40852| ppl: 240.50427| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.97563 | loss: 1.13443| constrast_loss: 4.47434| div_loss: 0.63386| %_mask_idx: 0.39803| ppl: 234.33087| %_neg_is_pos: 0.00532| lr: 0.0| temp: 1.97563 | loss: 1.12668| constrast_loss: 4.44313| div_loss: 0.63604| %_mask_idx: 0.42967| ppl: 232.93359| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97561 | loss: 1.14093| constrast_loss: 4.5006| div_loss: 0.63111| %_mask_idx: 0.45504| ppl: 236.08795| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.97561 | loss: 1.11414| constrast_loss: 4.39131| div_loss: 0.65233| %_mask_idx: 0.36012| ppl: 222.50766| %_neg_is_pos: 0.00641| lr: 
0.0| temp: 1.9756 | loss: 1.13985| constrast_loss: 4.49953| div_loss: 0.59879| %_mask_idx: 0.40742| ppl: 256.77325| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.9756 | loss: 1.13502| constrast_loss: 4.47438| div_loss: 0.65719| %_mask_idx: 0.37187| ppl: 219.39923| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.97559 | loss: 1.1363| constrast_loss: 4.48256| div_loss: 0.62621| %_mask_idx: 0.34352| ppl: 239.22702| %_neg_is_pos: 0.00831| lr: 0.0| temp: 1.97559 | loss: 1.12876| constrast_loss: 4.45225| div_loss: 0.62781| %_mask_idx: 0.32206| ppl: 238.20114| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.97558 | loss: 1.13809| constrast_loss: 4.48987| div_loss: 0.62472| %_mask_idx: 0.40915| ppl: 240.18109| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.97558 | loss: 1.12491| constrast_loss: 4.43563| div_loss: 0.64| %_mask_idx: 0.375| ppl: 230.40189| %_neg_is_pos: 0.00575| lr: 0.0| temp: 1.97556 | loss: 1.13876| constrast_loss: 4.4934| div_loss: 0.61653| %_mask_idx: 0.42043| ppl: 245.42091| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.97556 | loss: 1.13919| constrast_loss: 4.49544| div_loss: 0.61313| %_mask_idx: 0.4032| ppl: 247.59523| %_neg_is_pos: 0.00489| lr: 0.0| temp: 1.97555 | loss: 1.13127| constrast_loss: 4.45987| div_loss: 0.6519| %_mask_idx: 0.38001| ppl: 222.78476| %_neg_is_pos: 0.00613| lr: 0.0| temp: 1.97555 | loss: 1.13518| constrast_loss: 4.47843| div_loss: 0.62279| %_mask_idx: 0.40648| ppl: 241.41373| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.97554 | loss: 1.13035| constrast_loss: 4.45844| div_loss: 0.62943| %_mask_idx: 0.37249| ppl: 237.16298| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.97554 | loss: 1.13063| constrast_loss: 4.46124| div_loss: 0.61269| %_mask_idx: 0.39912| ppl: 247.87648| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.97553 | loss: 1.13504| constrast_loss: 4.47708| div_loss: 0.63076| %_mask_idx: 0.3985| ppl: 236.31325| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.97553 | loss: 1.13797| constrast_loss: 4.49085| div_loss: 0.61014| %_mask_idx: 0.33913| ppl: 249.50867| %_neg_is_pos: 0.00184| 
lr: 0.0| temp: 1.97551 | loss: 1.12701| constrast_loss: 4.44418| div_loss: 0.63868| %_mask_idx: 0.36544| ppl: 231.24698| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.97551 | loss: 1.12936| constrast_loss: 4.45308| div_loss: 0.64361| %_mask_idx: 0.35761| ppl: 228.09186| %_neg_is_pos: 0.00523| lr: 0.0| temp: 1.9755 | loss: 1.13598| constrast_loss: 4.48198| div_loss: 0.61956| %_mask_idx: 0.40648| ppl: 243.48186| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.9755 | loss: 1.12908| constrast_loss: 4.45121| div_loss: 0.65117| %_mask_idx: 0.42246| ppl: 223.24948| %_neg_is_pos: 0.00711| lr: 0.0| temp: 1.97548 | loss: 1.12213| constrast_loss: 4.42379| div_loss: 0.64739| %_mask_idx: 0.38424| ppl: 225.67279| %_neg_is_pos: 0.00574| lr: 0.0| temp: 1.97548 | loss: 1.13413| constrast_loss: 4.47524| div_loss: 0.61291| %_mask_idx: 0.40492| ppl: 247.74065| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.97547 | loss: 1.12171| constrast_loss: 4.42198| div_loss: 0.64873| %_mask_idx: 0.37014| ppl: 224.81468| %_neg_is_pos: 0.00944| lr: 0.0| temp: 1.97547 | loss: 1.1198| constrast_loss: 4.41491| div_loss: 0.64308| %_mask_idx: 0.36435| ppl: 228.42606| %_neg_is_pos: 0.00582| lr: 0.0| temp: 1.97546 | loss: 1.13199| constrast_loss: 4.46404| div_loss: 0.63928| %_mask_idx: 0.4187| ppl: 230.85883| %_neg_is_pos: 0.00394| lr: 0.0| temp: 1.97546 | loss: 1.1374| constrast_loss: 4.48635| div_loss: 0.63251| %_mask_idx: 0.40774| ppl: 235.19083| %_neg_is_pos: 0.00591| lr: 0.0| temp: 1.97545 | loss: 1.12958| constrast_loss: 4.4541| div_loss: 0.64223| %_mask_idx: 0.32284| ppl: 228.97214| %_neg_is_pos: 0.00812| lr: 0.0| temp: 1.97545 | loss: 1.12593| constrast_loss: 4.43895| div_loss: 0.64792| %_mask_idx: 0.38362| ppl: 225.33159| %_neg_is_pos: 0.00589| lr: 0.0| temp: 1.97543 | loss: 1.12825| constrast_loss: 4.45052| div_loss: 0.62494| %_mask_idx: 0.39176| ppl: 240.03976| %_neg_is_pos: 0.00575| lr: 0.0| temp: 1.97543 | loss: 1.13709| constrast_loss: 4.48617| div_loss: 0.62184| %_mask_idx: 0.35056| ppl: 242.02321| %_neg_is_pos: 
0.00458| lr: 0.0| temp: 1.97542 | loss: 1.13845| constrast_loss: 4.49066| div_loss: 0.63144| %_mask_idx: 0.43139| ppl: 235.87889| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.97542 | loss: 1.14605| constrast_loss: 4.52253| div_loss: 0.61675| %_mask_idx: 0.41416| ppl: 245.28078| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.97541 | loss: 1.13906| constrast_loss: 4.49443| div_loss: 0.61811| %_mask_idx: 0.42763| ppl: 244.40765| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.97541 | loss: 1.13789| constrast_loss: 4.48843| div_loss: 0.6314| %_mask_idx: 0.32816| ppl: 235.90211| %_neg_is_pos: 0.00776| lr: 0.0| temp: 1.9754 | loss: 1.12446| constrast_loss: 4.4327| div_loss: 0.65128| %_mask_idx: 0.36435| ppl: 223.18369| %_neg_is_pos: 0.00655| lr: 0.0| temp: 1.9754 | loss: 1.1422| constrast_loss: 4.50677| div_loss: 0.62041| %_mask_idx: 0.40335| ppl: 242.93883| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.97538 | loss: 1.12515| constrast_loss: 4.43731| div_loss: 0.63306| %_mask_idx: 0.34712| ppl: 234.83945| %_neg_is_pos: 0.00727| lr: 0.0| temp: 1.97538 | loss: 1.12373| constrast_loss: 4.43101| div_loss: 0.6392| %_mask_idx: 0.38534| ppl: 230.9101| %_neg_is_pos: 0.00534| lr: 0.0| temp: 1.97537 | loss: 1.12746| constrast_loss: 4.44538| div_loss: 0.64458| %_mask_idx: 0.34743| ppl: 227.47162| %_neg_is_pos: 0.01008| lr: 0.0| temp: 1.97537 | loss: 1.13488| constrast_loss: 4.47688| div_loss: 0.6266| %_mask_idx: 0.37876| ppl: 238.97379| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.97536 | loss: 1.12688| constrast_loss: 4.4436| div_loss: 0.63918| %_mask_idx: 0.40539| ppl: 230.92212| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.97536 | loss: 1.14071| constrast_loss: 4.49974| div_loss: 0.63083| %_mask_idx: 0.40852| ppl: 236.27115| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.97535 | loss: 1.12984| constrast_loss: 4.45667| div_loss: 0.62702| %_mask_idx: 0.38503| ppl: 238.70946| %_neg_is_pos: 0.00446| lr: 0.0| temp: 1.97535 | loss: 1.12078| constrast_loss: 4.41887| div_loss: 0.64244| %_mask_idx: 0.36873| ppl: 228.83598| 
%_neg_is_pos: 0.0052| lr: 0.0| temp: 1.97533 | loss: 1.13685| constrast_loss: 4.48393| div_loss: 0.63452| %_mask_idx: 0.40351| ppl: 233.90869| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.97533 | loss: 1.11942| constrast_loss: 4.41399| div_loss: 0.63685| %_mask_idx: 0.35025| ppl: 232.41656| %_neg_is_pos: 0.00745| lr: 0.0| temp: 1.97532 | loss: 1.12177| constrast_loss: 4.42164| div_loss: 0.65441| %_mask_idx: 0.42935| ppl: 221.1745| %_neg_is_pos: 0.00468| lr: 0.0| temp: 1.97532 | loss: 1.13363| constrast_loss: 4.46966| div_loss: 0.64869| %_mask_idx: 0.3963| ppl: 224.83749| %_neg_is_pos: 0.00916| lr: 0.0| temp: 1.9753 | loss: 1.14047| constrast_loss: 4.50014| div_loss: 0.61737| %_mask_idx: 0.38221| ppl: 244.88168| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.9753 | loss: 1.13874| constrast_loss: 4.49159| div_loss: 0.6336| %_mask_idx: 0.34132| ppl: 234.49707| %_neg_is_pos: 0.00468| lr: 0.0| temp: 1.97529 | loss: 1.13956| constrast_loss: 4.49729| div_loss: 0.60944| %_mask_idx: 0.39771| ppl: 249.95544| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.97529 | loss: 1.13481| constrast_loss: 4.47584| div_loss: 0.63395| %_mask_idx: 0.39113| ppl: 234.27196| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.97528 | loss: 1.13602| constrast_loss: 4.48021| div_loss: 0.6387| %_mask_idx: 0.37782| ppl: 231.23076| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.97528 | loss: 1.13891| constrast_loss: 4.49447| div_loss: 0.61156| %_mask_idx: 0.4187| ppl: 248.60062| %_neg_is_pos: 0.00338| lr: 0.0| temp: 1.97527 | loss: 1.12797| constrast_loss: 4.44811| div_loss: 0.63784| %_mask_idx: 0.38487| ppl: 231.784| %_neg_is_pos: 0.00673| lr: 0.0| temp: 1.97527 | loss: 1.13561| constrast_loss: 4.47971| div_loss: 0.62726| %_mask_idx: 0.42231| ppl: 238.55074| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.97525 | loss: 1.13941| constrast_loss: 4.49648| div_loss: 0.61174| %_mask_idx: 0.37077| ppl: 248.48332| %_neg_is_pos: 0.00572| lr: 0.0| temp: 1.97525 | loss: 1.13068| constrast_loss: 4.45952| div_loss: 0.63195| %_mask_idx: 0.37547| ppl: 
235.55045| %_neg_is_pos: 0.00671| lr: 0.0| temp: 1.97524 | loss: 1.12624| constrast_loss: 4.44085| div_loss: 0.64097| %_mask_idx: 0.40351| ppl: 229.78047| %_neg_is_pos: 0.0055| lr: 0.0| temp: 1.97524 | loss: 1.13264| constrast_loss: 4.46802| div_loss: 0.62526| %_mask_idx: 0.32065| ppl: 239.83124| %_neg_is_pos: 0.00606| lr: 0.0| temp: 1.97523 | loss: 1.12078| constrast_loss: 4.41734| div_loss: 0.65795| %_mask_idx: 0.37516| ppl: 218.91211| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.97523 | loss: 1.12477| constrast_loss: 4.43626| div_loss: 0.6282| %_mask_idx: 0.36607| ppl: 237.94958| %_neg_is_pos: 0.00486| lr: 0.0| temp: 1.97522 | loss: 1.12787| constrast_loss: 4.44607| div_loss: 0.65392| %_mask_idx: 0.39113| ppl: 221.49161| %_neg_is_pos: 0.0072| lr: 0.0| temp: 1.97522 | loss: 1.13551| constrast_loss: 4.48076| div_loss: 0.61269| %_mask_idx: 0.3869| ppl: 247.87982| %_neg_is_pos: 0.00465| lr: 0.0| temp: 1.9752 | loss: 1.138| constrast_loss: 4.48906| div_loss: 0.62962| %_mask_idx: 0.37406| ppl: 237.04596| %_neg_is_pos: 0.00607| lr: 0.0| temp: 1.9752 | loss: 1.13558| constrast_loss: 4.47863| div_loss: 0.6369| %_mask_idx: 0.38675| ppl: 232.38708| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.9752 | loss: 1.13049| constrast_loss: 4.45761| div_loss: 0.6437| %_mask_idx: 0.35244| ppl: 228.0302| %_neg_is_pos: 0.00684| lr: 0.0| temp: 1.9752 | loss: 1.1316| constrast_loss: 4.46333| div_loss: 0.6305| %_mask_idx: 0.40617| ppl: 236.47931| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.97519 | loss: 1.12684| constrast_loss: 4.44532| div_loss: 0.62032| %_mask_idx: 0.41447| ppl: 242.99617| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.97519 | loss: 1.13494| constrast_loss: 4.47652| div_loss: 0.63232| %_mask_idx: 0.43719| ppl: 235.31206| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.97518 | loss: 1.12511| constrast_loss: 4.43676| div_loss: 0.63677| %_mask_idx: 0.38659| ppl: 232.46786| %_neg_is_pos: 0.00502| lr: 0.0| temp: 1.97518 | loss: 1.13743| constrast_loss: 4.48785| div_loss: 0.61858| %_mask_idx: 0.39223| ppl: 
244.11047| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97516 | loss: 1.13059| constrast_loss: 4.45916| div_loss: 0.63203| %_mask_idx: 0.39834| ppl: 235.50394| %_neg_is_pos: 0.00661| lr: 0.0| temp: 1.97516 | loss: 1.13444| constrast_loss: 4.47582| div_loss: 0.61921| %_mask_idx: 0.40805| ppl: 243.70541| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.97515 | loss: 1.12524| constrast_loss: 4.43752| div_loss: 0.63429| %_mask_idx: 0.38612| ppl: 234.05202| %_neg_is_pos: 0.00457| lr: 0.0| temp: 1.97515 [2021-09-02 02:38:57,932] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:38:57,932] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14374| constrast_loss: 4.51333| div_loss: 0.61639| %_mask_idx: 0.43123| ppl: 245.50923| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.97513 | loss: 1.12536| constrast_loss: 4.43893| div_loss: 0.62488| %_mask_idx: 0.35746| ppl: 240.07364| %_neg_is_pos: 0.00633| lr: 0.0| temp: 1.97513 | loss: 1.12582| constrast_loss: 4.43841| div_loss: 0.64862| %_mask_idx: 0.36952| ppl: 224.88113| %_neg_is_pos: 0.00716| lr: 0.0| temp: 1.97512 | loss: 1.14042| constrast_loss: 4.49985| div_loss: 0.61842| %_mask_idx: 0.38581| ppl: 244.21217| %_neg_is_pos: 0.00631| lr: 0.0| temp: 1.97512 | loss: 1.13134| constrast_loss: 4.46276| div_loss: 0.62604| %_mask_idx: 0.42074| ppl: 239.33557| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.97511 | loss: 1.1327| constrast_loss: 4.47007| div_loss: 0.6073| %_mask_idx: 0.35855| ppl: 251.32677| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.97511 | loss: 1.13514| constrast_loss: 4.47795| div_loss: 0.626| %_mask_idx: 0.39724| ppl: 239.3618| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.9751 | loss: 1.14266| constrast_loss: 4.50969| div_loss: 0.60955| %_mask_idx: 0.39897| ppl: 249.89078| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.9751 | loss: 1.12654| constrast_loss: 4.44204| 
div_loss: 0.6411| %_mask_idx: 0.4162| ppl: 229.69904| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.97508| loss: 1.14284| constrast_loss: 4.50643| div_loss: 0.64934| %_mask_idx: 0.40335| ppl: 224.42104| %_neg_is_pos: 0.0047| lr: 0.0| temp: 1.97508 | loss: 1.1201| constrast_loss: 4.41438| div_loss: 0.66006| %_mask_idx: 0.37766| ppl: 217.56238| %_neg_is_pos: 0.00513| lr: 0.0| temp: 1.97507 | loss: 1.13389| constrast_loss: 4.4729| div_loss: 0.62638| %_mask_idx: 0.388| ppl: 239.11505| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.97507 | loss: 1.14645| constrast_loss: 4.5249| div_loss: 0.609| %_mask_idx: 0.42074| ppl: 250.23856| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.97506 | loss: 1.13192| constrast_loss: 4.4632| div_loss: 0.64465| %_mask_idx: 0.35824| ppl: 227.42453| %_neg_is_pos: 0.00514| lr: 0.0| temp: 1.97506 | loss: 1.13416| constrast_loss: 4.4726| div_loss: 0.64063| %_mask_idx: 0.36685| ppl: 229.99902| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.97505 | loss: 1.11552| constrast_loss: 4.39694| div_loss: 0.65155| %_mask_idx: 0.35605| ppl: 223.01033| %_neg_is_pos: 0.00645| lr: 0.0| temp: 1.97505 | loss: 1.12622| constrast_loss: 4.44172| div_loss: 0.63182| %_mask_idx: 0.39004| ppl: 235.63336| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.97503| loss: 1.11611| constrast_loss: 4.39969| div_loss: 0.6476| %_mask_idx: 0.34023| ppl: 225.53485| %_neg_is_pos: 0.00542| lr: 0.0| temp: 1.97503 | loss: 1.13023| constrast_loss: 4.45867| div_loss: 0.6227| %_mask_idx: 0.38095| ppl: 241.47287| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.97502 | loss: 1.13068| constrast_loss: 4.45932| div_loss: 0.63391| %_mask_idx: 0.44298| ppl: 234.2966| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.97502 | loss: 1.11981| constrast_loss: 4.41451| div_loss: 0.64745| %_mask_idx: 0.37923| ppl: 225.63379| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97501 | loss: 1.1378| constrast_loss: 4.48966| div_loss: 0.61554| %_mask_idx: 0.38487| ppl: 246.0536| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.97501 | loss: 1.13789| constrast_loss: 4.49068| 
div_loss: 0.60901| %_mask_idx: 0.39286| ppl: 250.23238| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.975 | loss: 1.13617| constrast_loss: 4.48317| div_loss: 0.61489| %_mask_idx: 0.42215| ppl: 246.47169| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.975 | loss: 1.13457| constrast_loss: 4.47748| div_loss: 0.60784| %_mask_idx: 0.40633| ppl: 250.98294| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97498 | loss: 1.13662| constrast_loss: 4.48401| div_loss: 0.6246| %_mask_idx: 0.38017| ppl: 240.25729| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.97498 | loss: 1.12916| constrast_loss: 4.45428| div_loss: 0.6237| %_mask_idx: 0.34868| ppl: 240.83128| %_neg_is_pos: 0.00582| lr: 0.0| temp: 1.97497 | loss: 1.13556| constrast_loss: 4.47944| div_loss: 0.62818| %_mask_idx: 0.3526| ppl: 237.96646| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.97497 | loss: 1.13975| constrast_loss: 4.49908| div_loss: 0.59914| %_mask_idx: 0.4328| ppl: 256.54889| %_neg_is_pos: 0.0063| lr: 0.0| temp: 1.97495 | loss: 1.13659| constrast_loss: 4.48378| div_loss: 0.62581| %_mask_idx: 0.44032| ppl: 239.48152| %_neg_is_pos: 0.00412| lr: 0.0| temp: 1.97495 | loss: 1.12569| constrast_loss: 4.43908| div_loss: 0.6369| %_mask_idx: 0.3985| ppl: 232.3866| %_neg_is_pos: 0.00551| lr: 0.0| temp: 1.97494 | loss: 1.13424| constrast_loss: 4.47441| div_loss: 0.62569| %_mask_idx: 0.39474| ppl: 239.55585| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97494 | loss: 1.14065| constrast_loss: 4.50142| div_loss: 0.61159| %_mask_idx: 0.36905| ppl: 248.58453| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.97493 | loss: 1.13269| constrast_loss: 4.46772| div_loss: 0.63035| %_mask_idx: 0.36842| ppl: 236.57498| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.97493 | loss: 1.13047| constrast_loss: 4.45939| div_loss: 0.62472| %_mask_idx: 0.36858| ppl: 240.17691| %_neg_is_pos: 0.00656| lr: 0.0| temp: 1.97492 | loss: 1.14447| constrast_loss: 4.51678| div_loss: 0.61104| %_mask_idx: 0.35761| ppl: 248.9364| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.97492 | loss: 1.13776| constrast_loss: 4.489| 
div_loss: 0.62047| %_mask_idx: 0.43092| ppl: 242.89674| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.9749 | loss: 1.12402| constrast_loss: 4.4301| div_loss: 0.65967| %_mask_idx: 0.35244| ppl: 217.81155| %_neg_is_pos: 0.00635| lr: 0.0| temp: 1.9749 | loss: 1.13507| constrast_loss: 4.4767| div_loss: 0.63578| %_mask_idx: 0.42967| ppl: 233.09851| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.97489 | loss: 1.12711| constrast_loss: 4.44447| div_loss: 0.63967| %_mask_idx: 0.37704| ppl: 230.61191| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97489 | loss: 1.13226| constrast_loss: 4.46623| div_loss: 0.62808| %_mask_idx: 0.39223| ppl: 238.02695| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97488 | loss: 1.13916| constrast_loss: 4.49547| div_loss: 0.61166| %_mask_idx: 0.40288| ppl: 248.5354| %_neg_is_pos: 0.0056| lr: 0.0| temp: 1.97488 | loss: 1.13589| constrast_loss: 4.48025| div_loss: 0.63328| %_mask_idx: 0.41729| ppl: 234.70358| %_neg_is_pos: 0.00617| lr: 0.0| temp: 1.97487 | loss: 1.14035| constrast_loss: 4.4987| div_loss: 0.62687| %_mask_idx: 0.40445| ppl: 238.8038| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.97487 | loss: 1.13429| constrast_loss: 4.4755| div_loss: 0.61647| %_mask_idx: 0.43515| ppl: 245.45848| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97485 | loss: 1.13682| constrast_loss: 4.48403| div_loss: 0.63262| %_mask_idx: 0.41823| ppl: 235.1245| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.97485 | loss: 1.13241| constrast_loss: 4.46666| div_loss: 0.62978| %_mask_idx: 0.35307| ppl: 236.93945| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.97484 | loss: 1.12448| constrast_loss: 4.43288| div_loss: 0.65054| %_mask_idx: 0.3573| ppl: 223.65295| %_neg_is_pos: 0.00812| lr: 0.0| temp: 1.97484 | loss: 1.13513| constrast_loss: 4.47974| div_loss: 0.60781| %_mask_idx: 0.4021| ppl: 250.9995| %_neg_is_pos: 0.00135| lr: 0.0| temp: 1.97483 | loss: 1.12131| constrast_loss: 4.42239| div_loss: 0.62862| %_mask_idx: 0.43076| ppl: 237.6814| %_neg_is_pos: 0.00526| lr: 0.0| temp: 1.97483 | loss: 1.13443| constrast_loss: 
4.47522| div_loss: 0.62509| %_mask_idx: 0.42544| ppl: 239.94411| %_neg_is_pos: 0.00487| lr: 0.0| temp: 1.97482 | loss: 1.12713| constrast_loss: 4.44423| div_loss: 0.64295| %_mask_idx: 0.40288| ppl: 228.51035| %_neg_is_pos: 0.00714| lr: 0.0| temp: 1.97482 | loss: 1.13972| constrast_loss: 4.49744| div_loss: 0.61455| %_mask_idx: 0.40977| ppl: 246.68692| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.9748 | loss: 1.14277| constrast_loss: 4.50886| div_loss: 0.62236| %_mask_idx: 0.42215| ppl: 241.69017| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.9748 | loss: 1.12206| constrast_loss: 4.42194| div_loss: 0.6629| %_mask_idx: 0.40993| ppl: 215.7433| %_neg_is_pos: 0.00539| lr: 0.0| temp: 1.97479 | loss: 1.14023| constrast_loss: 4.50112| div_loss: 0.59817| %_mask_idx: 0.4245| ppl: 257.1702| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.97479 | loss: 1.12414| constrast_loss: 4.43321| div_loss: 0.6333| %_mask_idx: 0.36435| ppl: 234.68884| %_neg_is_pos: 0.00848| lr: 0.0| temp: 1.97477 | loss: 1.12185| constrast_loss: 4.42259| div_loss: 0.64811| %_mask_idx: 0.31234| ppl: 225.20789| %_neg_is_pos: 0.00723| lr: 0.0| temp: 1.97477 | loss: 1.12892| constrast_loss: 4.45267| div_loss: 0.63008| %_mask_idx: 0.39051| ppl: 236.75143| %_neg_is_pos: 0.00768| lr: 0.0| temp: 1.97476 | loss: 1.12679| constrast_loss: 4.44158| div_loss: 0.65581| %_mask_idx: 0.36169| ppl: 220.28253| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.97476 | loss: 1.12969| constrast_loss: 4.45711| div_loss: 0.61644| %_mask_idx: 0.40179| ppl: 245.48091| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.97475 | loss: 1.13981| constrast_loss: 4.49715| div_loss: 0.6209| %_mask_idx: 0.41573| ppl: 242.6228| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.97475 | loss: 1.13753| constrast_loss: 4.48795| div_loss: 0.62183| %_mask_idx: 0.39865| ppl: 242.03009| %_neg_is_pos: 0.00781| lr: 0.0| temp: 1.97474 | loss: 1.1373| constrast_loss: 4.48529| div_loss: 0.63893| %_mask_idx: 0.38252| ppl: 231.08469| %_neg_is_pos: 0.00553| lr: 0.0| temp: 1.97474 | loss: 1.13418| 
constrast_loss: 4.47437| div_loss: 0.62334| %_mask_idx: 0.41134| ppl: 241.06241| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.97472 | loss: 1.14319| constrast_loss: 4.51081| div_loss: 0.6197| %_mask_idx: 0.3714| ppl: 243.39511| %_neg_is_pos: 0.00471| lr: 0.0| temp: 1.97472 | loss: 1.14237| constrast_loss: 4.50725| div_loss: 0.62217| %_mask_idx: 0.38456| ppl: 241.81007| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97471 | loss: 1.1275| constrast_loss: 4.44399| div_loss: 0.6599| %_mask_idx: 0.38017| ppl: 217.66501| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.97471 | loss: 1.13276| constrast_loss: 4.46875| div_loss: 0.62276| %_mask_idx: 0.40414| ppl: 241.43498| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.9747 | loss: 1.1302| constrast_loss: 4.45679| div_loss: 0.64003| %_mask_idx: 0.35119| ppl: 230.38077| %_neg_is_pos: 0.00585| lr: 0.0| temp: 1.9747 | loss: 1.13412| constrast_loss: 4.47376| div_loss: 0.62703| %_mask_idx: 0.39411| ppl: 238.69769| %_neg_is_pos: 0.00471| lr: 0.0| temp: 1.97469 | loss: 1.13549| constrast_loss: 4.47766| div_loss: 0.64288| %_mask_idx: 0.35464| ppl: 228.55957| %_neg_is_pos: 0.00584| lr: 0.0| temp: 1.97469 | loss: 1.13768| constrast_loss: 4.48947| div_loss: 0.61238| %_mask_idx: 0.43593| ppl: 248.07867| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.97467 | loss: 1.13965| constrast_loss: 4.49471| div_loss: 0.63884| %_mask_idx: 0.39536| ppl: 231.14558| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.97467 | loss: 1.11925| constrast_loss: 4.41352| div_loss: 0.63464| %_mask_idx: 0.36231| ppl: 233.83102| %_neg_is_pos: 0.0064| lr: 0.0| temp: 1.97466 | loss: 1.12561| constrast_loss: 4.43847| div_loss: 0.6399| %_mask_idx: 0.38205| ppl: 230.46217| %_neg_is_pos: 0.00497| lr: 0.0| temp: 1.97466 | loss: 1.13661| constrast_loss: 4.48418| div_loss: 0.62255| %_mask_idx: 0.38988| ppl: 241.56503| %_neg_is_pos: 0.00751| lr: 0.0| temp: 1.97465 | loss: 1.13332| constrast_loss: 4.47203| div_loss: 0.61241| %_mask_idx: 0.39865| ppl: 248.05685| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.97465 | loss: 
1.13266| constrast_loss: 4.46818| div_loss: 0.62469| %_mask_idx: 0.36967| ppl: 240.19742| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.97464 | loss: 1.12555| constrast_loss: 4.43827| div_loss: 0.63917| %_mask_idx: 0.3631| ppl: 230.92804| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.97464 | loss: 1.13177| constrast_loss: 4.46339| div_loss: 0.63708| %_mask_idx: 0.39818| ppl: 232.27066| %_neg_is_pos: 0.00539| lr: 0.0| temp: 1.97462 | loss: 1.12904| constrast_loss: 4.45236| div_loss: 0.63786| %_mask_idx: 0.37798| ppl: 231.7677| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.97462 | loss: 1.13606| constrast_loss: 4.48123| div_loss: 0.63024| %_mask_idx: 0.33819| ppl: 236.6474| %_neg_is_pos: 0.00616| lr: 0.0| temp: 1.97461 | loss: 1.13653| constrast_loss: 4.48265| div_loss: 0.63465| %_mask_idx: 0.40476| ppl: 233.82559| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.97461 | loss: 1.13336| constrast_loss: 4.47026| div_loss: 0.6318| %_mask_idx: 0.3761| ppl: 235.65088| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97459 | loss: 1.12958| constrast_loss: 4.45639| div_loss: 0.61914| %_mask_idx: 0.40147| ppl: 243.75247| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.97459 | loss: 1.13259| constrast_loss: 4.46637| div_loss: 0.63988| %_mask_idx: 0.40461| ppl: 230.47565| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.97458 | loss: 1.12205| constrast_loss: 4.42393| div_loss: 0.64284| %_mask_idx: 0.34211| ppl: 228.58206| %_neg_is_pos: 0.00576| lr: 0.0| temp: 1.97458 | loss: 1.1294| constrast_loss: 4.45589| div_loss: 0.61698| %_mask_idx: 0.4198| ppl: 245.13104| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.97457 | loss: 1.13037| constrast_loss: 4.45744| div_loss: 0.64029| %_mask_idx: 0.38565| ppl: 230.21658| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.97457 | loss: 1.13544| constrast_loss: 4.47907| div_loss: 0.6269| %_mask_idx: 0.35855| ppl: 238.78094| %_neg_is_pos: 0.00602| lr: 0.0| temp: 1.97456 | loss: 1.12986| constrast_loss: 4.45615| div_loss: 0.63302| %_mask_idx: 0.42466| ppl: 234.86465| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.97456 
| loss: 1.13237| constrast_loss: 4.466| div_loss: 0.63466| %_mask_idx: 0.38894| ppl: 233.8165| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.97454 | loss: 1.13858| constrast_loss: 4.49464| div_loss: 0.59696| %_mask_idx: 0.37296| ppl: 257.94305| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.97454 | loss: 1.1371| constrast_loss: 4.48666| div_loss: 0.61739| %_mask_idx: 0.39662| ppl: 244.868| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97453 | loss: 1.14241| constrast_loss: 4.50689| div_loss: 0.62738| %_mask_idx: 0.35354| ppl: 238.47995| %_neg_is_pos: 0.0071| lr: 0.0| temp: 1.97453 | loss: 1.13392| constrast_loss: 4.47464| div_loss: 0.61035| %_mask_idx: 0.3739| ppl: 249.37518| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.97452 | loss: 1.12649| constrast_loss: 4.44346| div_loss: 0.62506| %_mask_idx: 0.40523| ppl: 239.96063| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.97452 | loss: 1.12606| constrast_loss: 4.44101| div_loss: 0.63218| %_mask_idx: 0.3891| ppl: 235.40417| %_neg_is_pos: 0.00616| lr: 0.0| temp: 1.97451 | loss: 1.12863| constrast_loss: 4.45345| div_loss: 0.61079| %_mask_idx: 0.39803| ppl: 249.09314| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.97451 | loss: 1.13275| constrast_loss: 4.4698| div_loss: 0.61194| %_mask_idx: 0.39693| ppl: 248.35989| %_neg_is_pos: 0.00196| lr: 0.0| temp: 1.97449 | loss: 1.12619| constrast_loss: 4.44111| div_loss: 0.63669| %_mask_idx: 0.36999| ppl: 232.51669| %_neg_is_pos: 0.00808| lr: 0.0| temp: 1.97449 | loss: 1.13923| constrast_loss: 4.49501| div_loss: 0.61923| %_mask_idx: 0.43703| ppl: 243.692| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97448 | loss: 1.1362| constrast_loss: 4.48077| div_loss: 0.64024| %_mask_idx: 0.36576| ppl: 230.24849| %_neg_is_pos: 0.0089| lr: 0.0| temp: 1.97448 | loss: 1.1233| constrast_loss: 4.42883| div_loss: 0.64376| %_mask_idx: 0.37406| ppl: 227.9924| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.97447 | loss: 1.13696| constrast_loss: 4.48545| div_loss: 0.62393| %_mask_idx: 0.39364| ppl: 240.68326| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.97447 
| loss: 1.12815| constrast_loss: 4.44762| div_loss: 0.64989| %_mask_idx: 0.38722| ppl: 224.06805| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.97446 | loss: 1.13156| constrast_loss: 4.46434| div_loss: 0.61912| %_mask_idx: 0.38628| ppl: 243.76498| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.97446 | loss: 1.1277| constrast_loss: 4.4485| div_loss: 0.62285| %_mask_idx: 0.40962| ppl: 241.37521| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.97444 | loss: 1.13637| constrast_loss: 4.48148| div_loss: 0.64016| %_mask_idx: 0.41087| ppl: 230.30075| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.97444 | loss: 1.13406| constrast_loss: 4.47495| div_loss: 0.61284| %_mask_idx: 0.34414| ppl: 247.78119| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.97443 | loss: 1.12876| constrast_loss: 4.44964| div_loss: 0.65411| %_mask_idx: 0.40241| ppl: 221.37115| %_neg_is_pos: 0.00825| lr: 0.0| temp: 1.97443 [2021-09-02 02:48:11,890] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:48:11,890] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.1344| constrast_loss: 4.47487| div_loss: 0.62724| %_mask_idx: 0.40335| ppl: 238.5687| %_neg_is_pos: 0.00573| lr: 0.0| temp: 1.97441 | loss: 1.12521| constrast_loss: 4.43703| div_loss: 0.638| %_mask_idx: 0.36685| ppl: 231.67914| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.97441 | loss: 1.14239| constrast_loss: 4.50964| div_loss: 0.59931| %_mask_idx: 0.4292| ppl: 256.43964| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.97441 | loss: 1.12785| constrast_loss: 4.44884| div_loss: 0.62551| %_mask_idx: 0.39693| ppl: 239.67535| %_neg_is_pos: 0.00859| lr: 0.0| temp: 1.97441 | loss: 1.12569| constrast_loss: 4.43972| div_loss: 0.6304| %_mask_idx: 0.37187| ppl: 236.54425| %_neg_is_pos: 0.01259| lr: 0.0| temp: 1.9744 | loss: 1.13637| constrast_loss: 4.48504| div_loss: 0.60448| %_mask_idx: 0.41087| ppl: 253.13095| %_neg_is_pos: 0.00614| lr: 0.0| temp: 1.9744 | loss: 1.12809| constrast_loss: 4.44935| div_loss: 0.63009| %_mask_idx: 0.35244| ppl: 236.74197| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.97439 | loss: 1.12737| constrast_loss: 4.44674| div_loss: 0.62751| %_mask_idx: 0.35981| ppl: 238.39536| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.97439 | loss: 1.13428| constrast_loss: 4.4729| div_loss: 0.64233| %_mask_idx: 0.40758| ppl: 228.90831| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.97437 | loss: 1.14151| constrast_loss: 4.50371| div_loss: 0.62324| %_mask_idx: 0.43719| ppl: 241.12531| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.97437 | loss: 1.13927| constrast_loss: 4.49389| div_loss: 0.63193| %_mask_idx: 0.40257| ppl: 235.56738| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.97436 | loss: 1.13858| constrast_loss: 4.49231| div_loss: 0.62021| %_mask_idx: 0.41792| ppl: 243.06654| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.97436 | loss: 1.14319| constrast_loss: 4.51007| div_loss: 0.62689| %_mask_idx: 0.40742| ppl: 238.79245| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.97435 | loss: 1.12573| constrast_loss: 4.43816| div_loss: 0.64739| %_mask_idx: 0.38189| ppl: 225.67337| 
%_neg_is_pos: 0.00825| lr: 0.0| temp: 1.97435 | loss: 1.12454| constrast_loss: 4.43303| div_loss: 0.65114| %_mask_idx: 0.36153| ppl: 223.27007| %_neg_is_pos: 0.00678| lr: 0.0| temp: 1.97434 | loss: 1.14416| constrast_loss: 4.51541| div_loss: 0.6125| %_mask_idx: 0.35652| ppl: 248.00317| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.97434 | loss: 1.14231| constrast_loss: 4.50792| div_loss: 0.61339| %_mask_idx: 0.43656| ppl: 247.43173| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.97432 | loss: 1.1176| constrast_loss: 4.40457| div_loss: 0.65826| %_mask_idx: 0.39489| ppl: 218.71472| %_neg_is_pos: 0.00544| lr: 0.0| temp: 1.97432 | loss: 1.13253| constrast_loss: 4.46709| div_loss: 0.63048| %_mask_idx: 0.37014| ppl: 236.49149| %_neg_is_pos: 0.00397| lr: 0.0| temp: 1.97431 | loss: 1.1296| constrast_loss: 4.45601| div_loss: 0.62407| %_mask_idx: 0.41181| ppl: 240.59348| %_neg_is_pos: 0.00591| lr: 0.0| temp: 1.97431 | loss: 1.12436| constrast_loss: 4.43281| div_loss: 0.64609| %_mask_idx: 0.39756| ppl: 226.50012| %_neg_is_pos: 0.00739| lr: 0.0| temp: 1.9743 | loss: 1.13901| constrast_loss: 4.4931| div_loss: 0.6294| %_mask_idx: 0.37343| ppl: 237.18695| %_neg_is_pos: 0.0056| lr: 0.0| temp: 1.9743 | loss: 1.13111| constrast_loss: 4.46157| div_loss: 0.62881| %_mask_idx: 0.4458| ppl: 237.56308| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97429 | loss: 1.1145| constrast_loss: 4.39084| div_loss: 0.6716| %_mask_idx: 0.32331| ppl: 210.17557| %_neg_is_pos: 0.0065| lr: 0.0| temp: 1.97429 | loss: 1.13315| constrast_loss: 4.47024| div_loss: 0.62371| %_mask_idx: 0.38659| ppl: 240.82378| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.97427 | loss: 1.11714| constrast_loss: 4.40252| div_loss: 0.66025| %_mask_idx: 0.34117| ppl: 217.43903| %_neg_is_pos: 0.00603| lr: 0.0| temp: 1.97427 | loss: 1.12267| constrast_loss: 4.42498| div_loss: 0.65709| %_mask_idx: 0.38503| ppl: 219.46091| %_neg_is_pos: 0.00644| lr: 0.0| temp: 1.97426 | loss: 1.13284| constrast_loss: 4.4696| div_loss: 0.61743| %_mask_idx: 0.39066| ppl: 
244.84206| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.97426 | loss: 1.12469| constrast_loss: 4.43639| div_loss: 0.62367| %_mask_idx: 0.36873| ppl: 240.85083| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.97424 | loss: 1.13503| constrast_loss: 4.47593| div_loss: 0.6419| %_mask_idx: 0.37281| ppl: 229.1813| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.97424 | loss: 1.12951| constrast_loss: 4.45459| div_loss: 0.63438| %_mask_idx: 0.41087| ppl: 233.99416| %_neg_is_pos: 0.00876| lr: 0.0| temp: 1.97423 | loss: 1.12961| constrast_loss: 4.45254| div_loss: 0.65889| %_mask_idx: 0.40085| ppl: 218.30927| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.97423 | loss: 1.13327| constrast_loss: 4.47096| div_loss: 0.62099| %_mask_idx: 0.40053| ppl: 242.56754| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.97422 | loss: 1.12936| constrast_loss: 4.45349| div_loss: 0.63952| %_mask_idx: 0.40617| ppl: 230.71024| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.97422 | loss: 1.12757| constrast_loss: 4.44748| div_loss: 0.62784| %_mask_idx: 0.37312| ppl: 238.18356| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.97421 | loss: 1.14853| constrast_loss: 4.53281| div_loss: 0.61304| %_mask_idx: 0.43531| ppl: 247.65604| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.97421 | loss: 1.14359| constrast_loss: 4.51435| div_loss: 0.60014| %_mask_idx: 0.44267| ppl: 255.90805| %_neg_is_pos: 0.0018| lr: 0.0| temp: 1.97419 | loss: 1.12584| constrast_loss: 4.44031| div_loss: 0.6305| %_mask_idx: 0.40069| ppl: 236.48285| %_neg_is_pos: 0.00438| lr: 0.0| temp: 1.97419 | loss: 1.13267| constrast_loss: 4.46805| div_loss: 0.62636| %_mask_idx: 0.3833| ppl: 239.12656| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.97418 | loss: 1.13265| constrast_loss: 4.46803| div_loss: 0.62575| %_mask_idx: 0.42779| ppl: 239.52042| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97418 | loss: 1.13238| constrast_loss: 4.46172| div_loss: 0.67783| %_mask_idx: 0.31156| ppl: 206.19177| %_neg_is_pos: 0.00703| lr: 0.0| temp: 1.97417 | loss: 1.12792| constrast_loss: 4.44749| div_loss: 0.64182| %_mask_idx: 
0.38033| ppl: 229.23364| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.97417 | loss: 1.12576| constrast_loss: 4.43665| div_loss: 0.66373| %_mask_idx: 0.38095| ppl: 215.2132| %_neg_is_pos: 0.00767| lr: 0.0| temp: 1.97416 | loss: 1.13754| constrast_loss: 4.48863| div_loss: 0.61546| %_mask_idx: 0.42074| ppl: 246.10849| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.97416 | loss: 1.12611| constrast_loss: 4.44131| div_loss: 0.63138| %_mask_idx: 0.34461| ppl: 235.91602| %_neg_is_pos: 0.00742| lr: 0.0| temp: 1.97414 | loss: 1.13216| constrast_loss: 4.46609| div_loss: 0.62546| %_mask_idx: 0.43217| ppl: 239.70317| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.97414 | loss: 1.14049| constrast_loss: 4.49831| div_loss: 0.63631| %_mask_idx: 0.3692| ppl: 232.76378| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.97413 | loss: 1.12555| constrast_loss: 4.439| div_loss: 0.63201| %_mask_idx: 0.36043| ppl: 235.5152| %_neg_is_pos: 0.00523| lr: 0.0| temp: 1.97413 | loss: 1.1296| constrast_loss: 4.45476| div_loss: 0.63646| %_mask_idx: 0.43578| ppl: 232.66418| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.97412 | loss: 1.13589| constrast_loss: 4.48074| div_loss: 0.62802| %_mask_idx: 0.38675| ppl: 238.06903| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.97412 | loss: 1.12803| constrast_loss: 4.45025| div_loss: 0.61877| %_mask_idx: 0.34398| ppl: 243.98724| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.97411 | loss: 1.1418| constrast_loss: 4.50383| div_loss: 0.63355| %_mask_idx: 0.43139| ppl: 234.52844| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.97411 | loss: 1.13005| constrast_loss: 4.45699| div_loss: 0.63213| %_mask_idx: 0.40946| ppl: 235.44002| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.97409 | loss: 1.12364| constrast_loss: 4.43007| div_loss: 0.64481| %_mask_idx: 0.35056| ppl: 227.32253| %_neg_is_pos: 0.00738| lr: 0.0| temp: 1.97409 | loss: 1.1295| constrast_loss: 4.45625| div_loss: 0.61738| %_mask_idx: 0.39004| ppl: 244.87691| %_neg_is_pos: 0.00861| lr: 0.0| temp: 1.97408 | loss: 1.13018| constrast_loss: 4.45365| div_loss: 0.67056| 
%_mask_idx: 0.38268| ppl: 210.84171| %_neg_is_pos: 0.005| lr: 0.0| temp: 1.97408 | loss: 1.1369| constrast_loss: 4.48523| div_loss: 0.62387| %_mask_idx: 0.41761| ppl: 240.72408| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.97406 | loss: 1.13956| constrast_loss: 4.49598| div_loss: 0.6228| %_mask_idx: 0.41259| ppl: 241.40984| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.97406 | loss: 1.12764| constrast_loss: 4.44513| div_loss: 0.65428| %_mask_idx: 0.35667| ppl: 221.25771| %_neg_is_pos: 0.00412| lr: 0.0| temp: 1.97405 | loss: 1.12253| constrast_loss: 4.4253| div_loss: 0.64832| %_mask_idx: 0.33929| ppl: 225.07718| %_neg_is_pos: 0.00441| lr: 0.0| temp: 1.97405 | loss: 1.13462| constrast_loss: 4.47446| div_loss: 0.64009| %_mask_idx: 0.3927| ppl: 230.33968| %_neg_is_pos: 0.00525| lr: 0.0| temp: 1.97404 | loss: 1.13003| constrast_loss: 4.45803| div_loss: 0.62085| %_mask_idx: 0.39771| ppl: 242.65611| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.97404 | loss: 1.13592| constrast_loss: 4.48109| div_loss: 0.62586| %_mask_idx: 0.40257| ppl: 239.44846| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97403 | loss: 1.13713| constrast_loss: 4.48445| div_loss: 0.64062| %_mask_idx: 0.40695| ppl: 230.00067| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.97403 | loss: 1.12109| constrast_loss: 4.41858| div_loss: 0.65783| %_mask_idx: 0.36263| ppl: 218.98639| %_neg_is_pos: 0.00791| lr: 0.0| temp: 1.97401 | loss: 1.14269| constrast_loss: 4.50935| div_loss: 0.61401| %_mask_idx: 0.38565| ppl: 247.03072| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.97401 | loss: 1.13463| constrast_loss: 4.47409| div_loss: 0.6442| %_mask_idx: 0.36294| ppl: 227.71242| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.974 | loss: 1.13064| constrast_loss: 4.45976| div_loss: 0.62786| %_mask_idx: 0.34884| ppl: 238.16913| %_neg_is_pos: 0.00821| lr: 0.0| temp: 1.974 | loss: 1.1366| constrast_loss: 4.48278| div_loss: 0.63607| %_mask_idx: 0.38236| ppl: 232.91777| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.97399 | loss: 1.13708| constrast_loss: 4.48419| div_loss: 
0.6414| %_mask_idx: 0.40382| ppl: 229.50612| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.97399 | loss: 1.1484| constrast_loss: 4.53119| div_loss: 0.62425| %_mask_idx: 0.39254| ppl: 240.47842| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97398 | loss: 1.13035| constrast_loss: 4.45696| div_loss: 0.64452| %_mask_idx: 0.38863| ppl: 227.50787| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97398 | loss: 1.13461| constrast_loss: 4.47709| div_loss: 0.61353| %_mask_idx: 0.38315| ppl: 247.34341| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97396 | loss: 1.13768| constrast_loss: 4.48914| div_loss: 0.61557| %_mask_idx: 0.42935| ppl: 246.03363| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.97396 | loss: 1.13472| constrast_loss: 4.47743| div_loss: 0.61441| %_mask_idx: 0.39662| ppl: 246.77496| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.97395 | loss: 1.12783| constrast_loss: 4.44537| div_loss: 0.65964| %_mask_idx: 0.40429| ppl: 217.83246| %_neg_is_pos: 0.00662| lr: 0.0| temp: 1.97395 | loss: 1.11653| constrast_loss: 4.40092| div_loss: 0.65185| %_mask_idx: 0.3761| ppl: 222.81296| %_neg_is_pos: 0.00633| lr: 0.0| temp: 1.97394 | loss: 1.13387| constrast_loss: 4.47437| div_loss: 0.61121| %_mask_idx: 0.41745| ppl: 248.82324| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.97394 | loss: 1.1315| constrast_loss: 4.46175| div_loss: 0.64248| %_mask_idx: 0.35088| ppl: 228.81329| %_neg_is_pos: 0.00536| lr: 0.0| temp: 1.97393 | loss: 1.12186| constrast_loss: 4.42364| div_loss: 0.63786| %_mask_idx: 0.42763| ppl: 231.76978| %_neg_is_pos: 0.00555| lr: 0.0| temp: 1.97393 | loss: 1.13329| constrast_loss: 4.46983| div_loss: 0.63316| %_mask_idx: 0.44032| ppl: 234.77908| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.97391 | loss: 1.13092| constrast_loss: 4.46102| div_loss: 0.6267| %_mask_idx: 0.36748| ppl: 238.9115| %_neg_is_pos: 0.00502| lr: 0.0| temp: 1.97391 | loss: 1.14924| constrast_loss: 4.53235| div_loss: 0.64598| %_mask_idx: 0.40946| ppl: 226.57378| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.9739 | loss: 1.11927| constrast_loss: 4.41239| 
div_loss: 0.64692| %_mask_idx: 0.35135| ppl: 225.96817| %_neg_is_pos: 0.00854| lr: 0.0| temp: 1.9739 | loss: 1.13725| constrast_loss: 4.48784| div_loss: 0.61169| %_mask_idx: 0.41165| ppl: 248.51767| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.97388 | loss: 1.13973| constrast_loss: 4.49648| div_loss: 0.62459| %_mask_idx: 0.38925| ppl: 240.26408| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.97388 | loss: 1.1413| constrast_loss: 4.5033| div_loss: 0.61905| %_mask_idx: 0.40461| ppl: 243.80594| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.97387 | loss: 1.11448| constrast_loss: 4.38918| div_loss: 0.68751| %_mask_idx: 0.33349| ppl: 199.99451| %_neg_is_pos: 0.00885| lr: 0.0| temp: 1.97387 | loss: 1.13528| constrast_loss: 4.48092| div_loss: 0.60195| %_mask_idx: 0.40226| ppl: 254.754| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.97386 | loss: 1.12188| constrast_loss: 4.4223| div_loss: 0.65224| %_mask_idx: 0.39035| ppl: 222.56366| %_neg_is_pos: 0.00634| lr: 0.0| temp: 1.97386 | loss: 1.12889| constrast_loss: 4.44878| div_loss: 0.66765| %_mask_idx: 0.38111| ppl: 212.70148| %_neg_is_pos: 0.00506| lr: 0.0| temp: 1.97385 | loss: 1.14427| constrast_loss: 4.51385| div_loss: 0.63221| %_mask_idx: 0.36873| ppl: 235.3826| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.97385 | loss: 1.12377| constrast_loss: 4.43228| div_loss: 0.6282| %_mask_idx: 0.38001| ppl: 237.95435| %_neg_is_pos: 0.00572| lr: 0.0| temp: 1.97383 | loss: 1.1334| constrast_loss: 4.47158| div_loss: 0.62004| %_mask_idx: 0.38816| ppl: 243.1759| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.97383 | loss: 1.12053| constrast_loss: 4.41576| div_loss: 0.66352| %_mask_idx: 0.32989| ppl: 215.34671| %_neg_is_pos: 0.00536| lr: 0.0| temp: 1.97382 | loss: 1.1282| constrast_loss: 4.4507| div_loss: 0.62104| %_mask_idx: 0.37892| ppl: 242.5327| %_neg_is_pos: 0.00544| lr: 0.0| temp: 1.97382 | loss: 1.1252| constrast_loss: 4.43552| div_loss: 0.65297| %_mask_idx: 0.35119| ppl: 222.09747| %_neg_is_pos: 0.00576| lr: 0.0| temp: 1.97381 | loss: 1.13095| constrast_loss: 
4.46051| div_loss: 0.63269| %_mask_idx: 0.40445| ppl: 235.0755| %_neg_is_pos: 0.00487| lr: 0.0| temp: 1.97381 | loss: 1.12986| constrast_loss: 4.45576| div_loss: 0.63692| %_mask_idx: 0.38659| ppl: 232.36975| %_neg_is_pos: 0.00449| lr: 0.0| temp: 1.9738 | loss: 1.12827| constrast_loss: 4.45116| div_loss: 0.61907| %_mask_idx: 0.36482| ppl: 243.79704| %_neg_is_pos: 0.00847| lr: 0.0| temp: 1.9738 | loss: 1.12107| constrast_loss: 4.41871| div_loss: 0.65584| %_mask_idx: 0.35135| ppl: 220.26013| %_neg_is_pos: 0.00871| lr: 0.0| temp: 1.97378 | loss: 1.13277| constrast_loss: 4.46707| div_loss: 0.64016| %_mask_idx: 0.3172| ppl: 230.29758| %_neg_is_pos: 0.00638| lr: 0.0| temp: 1.97378 | loss: 1.13391| constrast_loss: 4.47213| div_loss: 0.63501| %_mask_idx: 0.35683| ppl: 233.59427| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.97377 | loss: 1.13706| constrast_loss: 4.4849| div_loss: 0.6334| %_mask_idx: 0.4328| ppl: 234.62608| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.97377 | loss: 1.12724| constrast_loss: 4.4461| div_loss: 0.6285| %_mask_idx: 0.33647| ppl: 237.75908| %_neg_is_pos: 0.00588| lr: 0.0| temp: 1.97376 | loss: 1.1362| constrast_loss: 4.48218| div_loss: 0.62608| %_mask_idx: 0.40993| ppl: 239.3114| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.97376 | loss: 1.13277| constrast_loss: 4.46728| div_loss: 0.63789| %_mask_idx: 0.37046| ppl: 231.7495| %_neg_is_pos: 0.00568| lr: 0.0| temp: 1.97375 | loss: 1.12394| constrast_loss: 4.43147| div_loss: 0.64288| %_mask_idx: 0.38534| ppl: 228.55634| %_neg_is_pos: 0.00836| lr: 0.0| temp: 1.97375 | loss: 1.13908| constrast_loss: 4.4936| div_loss: 0.62723| %_mask_idx: 0.33239| ppl: 238.57071| %_neg_is_pos: 0.00606| lr: 0.0| temp: 1.97373 | loss: 1.13772| constrast_loss: 4.48943| div_loss: 0.61443| %_mask_idx: 0.39442| ppl: 246.76315| %_neg_is_pos: 0.00669| lr: 0.0| temp: 1.97373 | loss: 1.13004| constrast_loss: 4.45434| div_loss: 0.65838| %_mask_idx: 0.41103| ppl: 218.6369| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.97372 | loss: 1.13211| 
constrast_loss: 4.4662| div_loss: 0.62239| %_mask_idx: 0.37453| ppl: 241.67285| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.97372 [2021-09-02 02:57:25,406] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 02:57:25,406] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.1272| constrast_loss: 4.44529| div_loss: 0.63495| %_mask_idx: 0.33929| ppl: 233.63412| %_neg_is_pos: 0.00736| lr: 0.0| temp: 1.9737 | loss: 1.13053| constrast_loss: 4.45834| div_loss: 0.63793| %_mask_idx: 0.39521| ppl: 231.72412| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.9737 | loss: 1.12796| constrast_loss: 4.4489| div_loss: 0.62928| %_mask_idx: 0.3515| ppl: 237.25998| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.97369 | loss: 1.14281| constrast_loss: 4.5092| div_loss: 0.62053| %_mask_idx: 0.36873| ppl: 242.86359| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.97369 | loss: 1.12705| constrast_loss: 4.44405| div_loss: 0.64156| %_mask_idx: 0.42184| ppl: 229.40311| %_neg_is_pos: 0.00329| lr: 0.0| temp: 1.97368 | loss: 1.14537| constrast_loss: 4.51865| div_loss: 0.62825| %_mask_idx: 0.4411| ppl: 237.91704| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.97368 | loss: 1.13819| constrast_loss: 4.48926| div_loss: 0.63503| %_mask_idx: 0.39364| ppl: 233.57787| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.97367 | loss: 1.13348| constrast_loss: 4.47132| div_loss: 0.62619| %_mask_idx: 0.39803| ppl: 239.23627| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97367 | loss: 1.13608| constrast_loss: 4.47991| div_loss: 0.64417| %_mask_idx: 0.37202| ppl: 227.72945| %_neg_is_pos: 0.00665| lr: 0.0| temp: 1.97365 | loss: 1.13392| constrast_loss: 4.47393| div_loss: 0.61751| %_mask_idx: 0.38362| ppl: 244.79163| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.97365 | loss: 1.1317| constrast_loss: 4.46112| div_loss: 0.65674| %_mask_idx: 0.34602| ppl: 219.68884| %_neg_is_pos: 
0.00764| lr: 0.0| temp: 1.97365 | loss: 1.11845| constrast_loss: 4.41066| div_loss: 0.63136| %_mask_idx: 0.34853| ppl: 235.92789| %_neg_is_pos: 0.01002| lr: 0.0| temp: 1.97365 | loss: 1.11954| constrast_loss: 4.41427| div_loss: 0.63908| %_mask_idx: 0.36106| ppl: 230.99176| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.97364 | loss: 1.12066| constrast_loss: 4.41731| div_loss: 0.6533| %_mask_idx: 0.35808| ppl: 221.88879| %_neg_is_pos: 0.00562| lr: 0.0| temp: 1.97364 | loss: 1.13235| constrast_loss: 4.46427| div_loss: 0.65122| %_mask_idx: 0.43703| ppl: 223.21866| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.97363 | loss: 1.13431| constrast_loss: 4.475| div_loss: 0.62223| %_mask_idx: 0.36764| ppl: 241.77208| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97363 | loss: 1.12275| constrast_loss: 4.42561| div_loss: 0.65408| %_mask_idx: 0.41291| ppl: 221.3869| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.97361 | loss: 1.12668| constrast_loss: 4.44202| div_loss: 0.64703| %_mask_idx: 0.39348| ppl: 225.8978| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.97361 | loss: 1.13544| constrast_loss: 4.47894| div_loss: 0.62827| %_mask_idx: 0.37892| ppl: 237.90964| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.9736 | loss: 1.13448| constrast_loss: 4.4753| div_loss: 0.62619| %_mask_idx: 0.33615| ppl: 239.24152| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.9736 | loss: 1.13302| constrast_loss: 4.47092| div_loss: 0.61176| %_mask_idx: 0.41792| ppl: 248.47523| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.97359 | loss: 1.13105| constrast_loss: 4.45794| div_loss: 0.66266| %_mask_idx: 0.39677| ppl: 215.89934| %_neg_is_pos: 0.00409| lr: 0.0| temp: 1.97359 | loss: 1.13696| constrast_loss: 4.48368| div_loss: 0.6417| %_mask_idx: 0.38878| ppl: 229.31154| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.97358 | loss: 1.12617| constrast_loss: 4.43963| div_loss: 0.65031| %_mask_idx: 0.34837| ppl: 223.80357| %_neg_is_pos: 0.00397| lr: 0.0| temp: 1.97358 | loss: 1.13314| constrast_loss: 4.46916| div_loss: 0.63393| %_mask_idx: 0.42058| ppl: 234.28488| 
%_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97356 | loss: 1.12972| constrast_loss: 4.4569| div_loss: 0.61979| %_mask_idx: 0.39724| ppl: 243.3333| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.97356 | loss: 1.12448| constrast_loss: 4.43258| div_loss: 0.6534| %_mask_idx: 0.44283| ppl: 221.82356| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.97355 | loss: 1.13055| constrast_loss: 4.45778| div_loss: 0.64419| %_mask_idx: 0.37986| ppl: 227.71935| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97355 | loss: 1.12095| constrast_loss: 4.41622| div_loss: 0.6756| %_mask_idx: 0.36513| ppl: 207.61368| %_neg_is_pos: 0.00469| lr: 0.0| temp: 1.97353 | loss: 1.14079| constrast_loss: 4.50102| div_loss: 0.62129| %_mask_idx: 0.41604| ppl: 242.37137| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.97353 | loss: 1.13666| constrast_loss: 4.48224| div_loss: 0.64385| %_mask_idx: 0.40226| ppl: 227.93817| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.97352 | loss: 1.13585| constrast_loss: 4.48065| div_loss: 0.62754| %_mask_idx: 0.32722| ppl: 238.37311| %_neg_is_pos: 0.00563| lr: 0.0| temp: 1.97352 | loss: 1.13317| constrast_loss: 4.47037| div_loss: 0.62317| %_mask_idx: 0.39129| ppl: 241.17334| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97351 | loss: 1.12625| constrast_loss: 4.44087| div_loss: 0.64117| %_mask_idx: 0.41902| ppl: 229.65388| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.97351 | loss: 1.12747| constrast_loss: 4.44712| div_loss: 0.62763| %_mask_idx: 0.36858| ppl: 238.31459| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.9735 | loss: 1.1364| constrast_loss: 4.48322| div_loss: 0.62374| %_mask_idx: 0.38549| ppl: 240.80705| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.9735 | loss: 1.13953| constrast_loss: 4.49611| div_loss: 0.62027| %_mask_idx: 0.4281| ppl: 243.02435| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97348 | loss: 1.13659| constrast_loss: 4.48404| div_loss: 0.62313| %_mask_idx: 0.4187| ppl: 241.19675| %_neg_is_pos: 0.00241| lr: 0.0| temp: 1.97348 | loss: 1.12291| constrast_loss: 4.42496| div_loss: 0.66666| %_mask_idx: 0.35573| ppl: 
213.33994| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.97347 | loss: 1.11863| constrast_loss: 4.40921| div_loss: 0.65292| %_mask_idx: 0.4563| ppl: 222.13422| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.97347 | loss: 1.12866| constrast_loss: 4.44916| div_loss: 0.65493| %_mask_idx: 0.38831| ppl: 220.84357| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.97346 | loss: 1.13377| constrast_loss: 4.47057| div_loss: 0.64496| %_mask_idx: 0.40022| ppl: 227.22577| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.97346 | loss: 1.12128| constrast_loss: 4.42115| div_loss: 0.63966| %_mask_idx: 0.4032| ppl: 230.61763| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.97345 | loss: 1.12988| constrast_loss: 4.45502| div_loss: 0.64507| %_mask_idx: 0.41087| ppl: 227.15486| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.97345 | loss: 1.12859| constrast_loss: 4.45024| div_loss: 0.64101| %_mask_idx: 0.41385| ppl: 229.75114| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.97343 | loss: 1.13536| constrast_loss: 4.48131| div_loss: 0.60144| %_mask_idx: 0.37829| ppl: 255.0782| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.97343 | loss: 1.14012| constrast_loss: 4.49743| div_loss: 0.63035| %_mask_idx: 0.37093| ppl: 236.57596| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.97342 | loss: 1.13207| constrast_loss: 4.46604| div_loss: 0.62254| %_mask_idx: 0.3703| ppl: 241.57175| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.97342 | loss: 1.13026| constrast_loss: 4.45748| div_loss: 0.63552| %_mask_idx: 0.40445| ppl: 233.26733| %_neg_is_pos: 0.00337| lr: 0.0| temp: 1.97341 | loss: 1.12535| constrast_loss: 4.43604| div_loss: 0.65349| %_mask_idx: 0.39098| ppl: 221.7644| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.97341 | loss: 1.1237| constrast_loss: 4.42946| div_loss: 0.6536| %_mask_idx: 0.37876| ppl: 221.69519| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.9734 | loss: 1.1268| constrast_loss: 4.44447| div_loss: 0.62746| %_mask_idx: 0.41886| ppl: 238.42537| %_neg_is_pos: 0.0034| lr: 0.0| temp: 1.9734 | loss: 1.13116| constrast_loss: 4.46137| div_loss: 0.63275| %_mask_idx: 0.41996| 
ppl: 235.03915| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.97338 | loss: 1.12913| constrast_loss: 4.45086| div_loss: 0.65666| %_mask_idx: 0.30827| ppl: 219.73959| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.97338 | loss: 1.13597| constrast_loss: 4.48149| div_loss: 0.62408| %_mask_idx: 0.37422| ppl: 240.58914| %_neg_is_pos: 0.00419| lr: 0.0| temp: 1.97337 | loss: 1.1374| constrast_loss: 4.48862| div_loss: 0.61002| %_mask_idx: 0.42654| ppl: 249.58719| %_neg_is_pos: 0.00145| lr: 0.0| temp: 1.97337 | loss: 1.12779| constrast_loss: 4.44786| div_loss: 0.63307| %_mask_idx: 0.35761| ppl: 234.83723| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.97335 | loss: 1.13106| constrast_loss: 4.46118| div_loss: 0.63054| %_mask_idx: 0.41933| ppl: 236.455| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.97335 | loss: 1.14094| constrast_loss: 4.50174| div_loss: 0.62006| %_mask_idx: 0.38111| ppl: 243.16055| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.97334 | loss: 1.13121| constrast_loss: 4.46078| div_loss: 0.64071| %_mask_idx: 0.33412| ppl: 229.94778| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.97334 | loss: 1.13431| constrast_loss: 4.47321| div_loss: 0.64029| %_mask_idx: 0.39677| ppl: 230.21634| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.97333 | loss: 1.13174| constrast_loss: 4.46268| div_loss: 0.64263| %_mask_idx: 0.3739| ppl: 228.71887| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97333 | loss: 1.13911| constrast_loss: 4.49309| div_loss: 0.63327| %_mask_idx: 0.44612| ppl: 234.70821| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97332 | loss: 1.12625| constrast_loss: 4.43986| div_loss: 0.65131| %_mask_idx: 0.35323| ppl: 223.16202| %_neg_is_pos: 0.00596| lr: 0.0| temp: 1.97332 | loss: 1.13079| constrast_loss: 4.45972| div_loss: 0.63456| %_mask_idx: 0.40476| ppl: 233.87965| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.9733 | loss: 1.13372| constrast_loss: 4.47187| div_loss: 0.63026| %_mask_idx: 0.43139| ppl: 236.63307| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.9733 | loss: 1.12987| constrast_loss: 4.45732| div_loss: 0.6217| %_mask_idx: 
0.39991| ppl: 242.11209| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.97329 | loss: 1.1339| constrast_loss: 4.47282| div_loss: 0.62789| %_mask_idx: 0.42513| ppl: 238.14822| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.97329 | loss: 1.12257| constrast_loss: 4.42718| div_loss: 0.63103| %_mask_idx: 0.35714| ppl: 236.14099| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97328 | loss: 1.13009| constrast_loss: 4.45689| div_loss: 0.63458| %_mask_idx: 0.3584| ppl: 233.87019| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.97328 | loss: 1.12591| constrast_loss: 4.43881| div_loss: 0.64826| %_mask_idx: 0.33271| ppl: 225.11502| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.97327 | loss: 1.13612| constrast_loss: 4.48346| div_loss: 0.61041| %_mask_idx: 0.34273| ppl: 249.33636| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.97327 | loss: 1.12868| constrast_loss: 4.45094| div_loss: 0.63772| %_mask_idx: 0.4245| ppl: 231.85941| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.97325 | loss: 1.13185| constrast_loss: 4.46459| div_loss: 0.62823| %_mask_idx: 0.36012| ppl: 237.93044| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.97325 | loss: 1.13649| constrast_loss: 4.48293| div_loss: 0.63044| %_mask_idx: 0.42215| ppl: 236.51657| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97324 | loss: 1.13979| constrast_loss: 4.4989| div_loss: 0.60252| %_mask_idx: 0.4505| ppl: 254.38586| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.97324 | loss: 1.1266| constrast_loss: 4.44006| div_loss: 0.6636| %_mask_idx: 0.3844| ppl: 215.29578| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.97323 | loss: 1.13494| constrast_loss: 4.47791| div_loss: 0.61864| %_mask_idx: 0.42058| ppl: 244.06847| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97323 | loss: 1.12716| constrast_loss: 4.44392| div_loss: 0.64735| %_mask_idx: 0.38863| ppl: 225.69382| %_neg_is_pos: 0.00374| lr: 0.0| temp: 1.97322 | loss: 1.1178| constrast_loss: 4.40637| div_loss: 0.64818| %_mask_idx: 0.33772| ppl: 225.16245| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.97322 | loss: 1.12853| constrast_loss: 4.44915| div_loss: 0.64969| 
%_mask_idx: 0.38957| ppl: 224.19647| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.9732 | loss: 1.13313| constrast_loss: 4.46862| div_loss: 0.63908| %_mask_idx: 0.38863| ppl: 230.98636| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.9732 | loss: 1.12625| constrast_loss: 4.44054| div_loss: 0.64451| %_mask_idx: 0.35495| ppl: 227.51057| %_neg_is_pos: 0.00538| lr: 0.0| temp: 1.97319 | loss: 1.13542| constrast_loss: 4.47807| div_loss: 0.63601| %_mask_idx: 0.39881| ppl: 232.95619| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.97319 | loss: 1.14041| constrast_loss: 4.49772| div_loss: 0.63905| %_mask_idx: 0.4032| ppl: 231.00905| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.97317 | loss: 1.12229| constrast_loss: 4.42442| div_loss: 0.64752| %_mask_idx: 0.3584| ppl: 225.58667| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.97317 | loss: 1.12858| constrast_loss: 4.45119| div_loss: 0.63119| %_mask_idx: 0.3963| ppl: 236.03778| %_neg_is_pos: 0.00551| lr: 0.0| temp: 1.97316 | loss: 1.12514| constrast_loss: 4.43714| div_loss: 0.63405| %_mask_idx: 0.3515| ppl: 234.20589| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.97316 | loss: 1.13143| constrast_loss: 4.46328| div_loss: 0.62444| %_mask_idx: 0.40053| ppl: 240.35892| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.97315 | loss: 1.13189| constrast_loss: 4.46393| div_loss: 0.63633| %_mask_idx: 0.40053| ppl: 232.74786| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.97315 | loss: 1.13287| constrast_loss: 4.46809| div_loss: 0.63411| %_mask_idx: 0.40132| ppl: 234.16641| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.97314 | loss: 1.11278| constrast_loss: 4.38433| div_loss: 0.66776| %_mask_idx: 0.38487| ppl: 212.63087| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97314 | loss: 1.13679| constrast_loss: 4.48291| div_loss: 0.64261| %_mask_idx: 0.401| ppl: 228.73248| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.97312 | loss: 1.13301| constrast_loss: 4.46927| div_loss: 0.62753| %_mask_idx: 0.3562| ppl: 238.38075| %_neg_is_pos: 0.00435| lr: 0.0| temp: 1.97312 | loss: 1.135| constrast_loss: 4.47707| div_loss: 
0.62918| %_mask_idx: 0.41259| ppl: 237.32712| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.97311 | loss: 1.13906| constrast_loss: 4.49428| div_loss: 0.61964| %_mask_idx: 0.37343| ppl: 243.42996| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.97311 | loss: 1.12645| constrast_loss: 4.44246| div_loss: 0.63363| %_mask_idx: 0.37281| ppl: 234.47818| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.9731 | loss: 1.13244| constrast_loss: 4.46725| div_loss: 0.62522| %_mask_idx: 0.38784| ppl: 239.86154| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.9731 | loss: 1.13134| constrast_loss: 4.4628| div_loss: 0.62579| %_mask_idx: 0.42794| ppl: 239.49385| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.97309 | loss: 1.14329| constrast_loss: 4.51106| div_loss: 0.62116| %_mask_idx: 0.40962| ppl: 242.45575| %_neg_is_pos: 0.00123| lr: 0.0| temp: 1.97309 | loss: 1.13548| constrast_loss: 4.47986| div_loss: 0.62066| %_mask_idx: 0.36106| ppl: 242.7771| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.97307 | loss: 1.14392| constrast_loss: 4.5131| div_loss: 0.626| %_mask_idx: 0.45896| ppl: 239.35789| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.97307 | loss: 1.1415| constrast_loss: 4.5044| div_loss: 0.6161| %_mask_idx: 0.45175| ppl: 245.69687| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.97306 | loss: 1.12405| constrast_loss: 4.43094| div_loss: 0.65273| %_mask_idx: 0.38409| ppl: 222.2525| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.97306 | loss: 1.14254| constrast_loss: 4.50742| div_loss: 0.62741| %_mask_idx: 0.43092| ppl: 238.45956| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.97305 | loss: 1.13228| constrast_loss: 4.46506| div_loss: 0.64041| %_mask_idx: 0.36779| ppl: 230.14032| %_neg_is_pos: 0.00435| lr: 0.0| temp: 1.97305 | loss: 1.13077| constrast_loss: 4.45999| div_loss: 0.63092| %_mask_idx: 0.38346| ppl: 236.2088| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.97304 | loss: 1.14243| constrast_loss: 4.5099| div_loss: 0.59829| %_mask_idx: 0.40053| ppl: 257.09634| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97304 | loss: 1.13122| constrast_loss: 4.46307| 
div_loss: 0.61814| %_mask_idx: 0.38659| ppl: 244.39247| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.97302 | loss: 1.14357| constrast_loss: 4.51308| div_loss: 0.6122| %_mask_idx: 0.44721| ppl: 248.19484| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.97302 | loss: 1.13227| constrast_loss: 4.46494| div_loss: 0.64153| %_mask_idx: 0.40852| ppl: 229.42252| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.97301 | loss: 1.13114| constrast_loss: 4.46133| div_loss: 0.63224| %_mask_idx: 0.3631| ppl: 235.3667| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.97301 [2021-09-02 03:06:39,588] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:06:39,588] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12826| constrast_loss: 4.44983| div_loss: 0.63201| %_mask_idx: 0.42403| ppl: 235.51108| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.97299 | loss: 1.1253| constrast_loss: 4.43567| div_loss: 0.65529| %_mask_idx: 0.38393| ppl: 220.61209| %_neg_is_pos: 0.00511| lr: 0.0| temp: 1.97299 | loss: 1.13377| constrast_loss: 4.47322| div_loss: 0.61845| %_mask_idx: 0.43045| ppl: 244.19196| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97298 | loss: 1.12153| constrast_loss: 4.42108| div_loss: 0.65031| %_mask_idx: 0.38737| ppl: 223.79901| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.97298 | loss: 1.12471| constrast_loss: 4.43247| div_loss: 0.66357| %_mask_idx: 0.33866| ppl: 215.31531| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.97297 | loss: 1.12842| constrast_loss: 4.44833| div_loss: 0.65326| %_mask_idx: 0.41761| ppl: 221.91074| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.97297 | loss: 1.12826| constrast_loss: 4.44728| div_loss: 0.65757| %_mask_idx: 0.45865| ppl: 219.15405| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.97296 | loss: 1.14065| constrast_loss: 4.50037| div_loss: 0.62225| %_mask_idx: 0.3797| ppl: 241.76205| %_neg_is_pos: 0.00344| lr: 0.0| temp: 
1.97296 | loss: 1.13554| constrast_loss: 4.47945| div_loss: 0.62715| %_mask_idx: 0.43546| ppl: 238.62141| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.97294 | loss: 1.11361| constrast_loss: 4.38798| div_loss: 0.66467| %_mask_idx: 0.29903| ppl: 214.61404| %_neg_is_pos: 0.00552| lr: 0.0| temp: 1.97294 | loss: 1.11661| constrast_loss: 4.40085| div_loss: 0.65586| %_mask_idx: 0.38581| ppl: 220.24893| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.97293 | loss: 1.14277| constrast_loss: 4.50935| div_loss: 0.61752| %_mask_idx: 0.42685| ppl: 244.78702| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.97293 | loss: 1.13397| constrast_loss: 4.47298| div_loss: 0.62919| %_mask_idx: 0.37328| ppl: 237.3208| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.97292 | loss: 1.1199| constrast_loss: 4.41432| div_loss: 0.65303| %_mask_idx: 0.37234| ppl: 222.06326| %_neg_is_pos: 0.0047| lr: 0.0| temp: 1.97292 | loss: 1.13755| constrast_loss: 4.48856| div_loss: 0.61649| %_mask_idx: 0.3891| ppl: 245.44711| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97292 | loss: 1.14459| constrast_loss: 4.51501| div_loss: 0.63355| %_mask_idx: 0.38769| ppl: 234.5275| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.97292 | loss: 1.13065| constrast_loss: 4.45916| div_loss: 0.63431| %_mask_idx: 0.38142| ppl: 234.03905| %_neg_is_pos: 0.00565| lr: 0.0| temp: 1.9729| loss: 1.12606| constrast_loss: 4.44006| div_loss: 0.6419| %_mask_idx: 0.38111| ppl: 229.18713| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.9729 | loss: 1.13221| constrast_loss: 4.46567| div_loss: 0.6316| %_mask_idx: 0.40805| ppl: 235.77786| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.97289 | loss: 1.13525| constrast_loss: 4.47845| div_loss: 0.62536| %_mask_idx: 0.39051| ppl: 239.767| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.97289 | loss: 1.13293| constrast_loss: 4.46827| div_loss: 0.63457| %_mask_idx: 0.38894| ppl: 233.87619| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.97288 | loss: 1.1353| constrast_loss: 4.47637| div_loss: 0.64827| %_mask_idx: 0.34868| ppl: 225.10568| %_neg_is_pos: 0.00531| lr: 0.0| 
temp: 1.97288 | loss: 1.13579| constrast_loss: 4.47984| div_loss: 0.63303| %_mask_idx: 0.36169| ppl: 234.85953| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.97287 | loss: 1.12754| constrast_loss: 4.44771| div_loss: 0.62446| %_mask_idx: 0.42105| ppl: 240.34773| %_neg_is_pos: 0.00329| lr: 0.0| temp: 1.97287 | loss: 1.12916| constrast_loss: 4.45284| div_loss: 0.63782| %_mask_idx: 0.41588| ppl: 231.79236| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.97285| loss: 1.1332| constrast_loss: 4.4694| div_loss: 0.63418| %_mask_idx: 0.38988| ppl: 234.12292| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.97285 | loss: 1.13063| constrast_loss: 4.45888| div_loss: 0.63626| %_mask_idx: 0.33929| ppl: 232.79285| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.97284 | loss: 1.13105| constrast_loss: 4.45999| div_loss: 0.642| %_mask_idx: 0.38127| ppl: 229.12036| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.97284 | loss: 1.12578| constrast_loss: 4.43866| div_loss: 0.64464| %_mask_idx: 0.34179| ppl: 227.4319| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.97282 | loss: 1.14042| constrast_loss: 4.50042| div_loss: 0.61243| %_mask_idx: 0.35338| ppl: 248.04337| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.97282 | loss: 1.1345| constrast_loss: 4.47544| div_loss: 0.6256| %_mask_idx: 0.39489| ppl: 239.61554| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97281 | loss: 1.12759| constrast_loss: 4.44684| div_loss: 0.63531| %_mask_idx: 0.42747| ppl: 233.4003| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97281 | loss: 1.12944| constrast_loss: 4.45488| div_loss: 0.62878| %_mask_idx: 0.43233| ppl: 237.5791| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.9728 | loss: 1.12449| constrast_loss: 4.43316| div_loss: 0.64804| %_mask_idx: 0.32628| ppl: 225.25192| %_neg_is_pos: 0.00626| lr: 0.0| temp: 1.9728 | loss: 1.14774| constrast_loss: 4.52934| div_loss: 0.61638| %_mask_idx: 0.39004| ppl: 245.51459| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.97279 | loss: 1.13349| constrast_loss: 4.47107| div_loss: 0.62886| %_mask_idx: 0.35949| ppl: 237.52885| %_neg_is_pos: 0.00349| lr: 
0.0| temp: 1.97279 | loss: 1.13893| constrast_loss: 4.49177| div_loss: 0.63935| %_mask_idx: 0.41541| ppl: 230.8187| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.97277 | loss: 1.1412| constrast_loss: 4.50351| div_loss: 0.61302| %_mask_idx: 0.43014| ppl: 247.66959| %_neg_is_pos: 0.00198| lr: 0.0| temp: 1.97277 | loss: 1.13422| constrast_loss: 4.47389| div_loss: 0.62982| %_mask_idx: 0.40758| ppl: 236.91663| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97276 | loss: 1.12043| constrast_loss: 4.41687| div_loss: 0.6485| %_mask_idx: 0.34477| ppl: 224.96054| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.97276 | loss: 1.13138| constrast_loss: 4.46102| div_loss: 0.64493| %_mask_idx: 0.37594| ppl: 227.24297| %_neg_is_pos: 0.00446| lr: 0.0| temp: 1.97275 | loss: 1.12782| constrast_loss: 4.44845| div_loss: 0.62823| %_mask_idx: 0.4104| ppl: 237.93208| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.97275 | loss: 1.12442| constrast_loss: 4.43142| div_loss: 0.66245| %_mask_idx: 0.32284| ppl: 216.02963| %_neg_is_pos: 0.00508| lr: 0.0| temp: 1.97274 | loss: 1.14571| constrast_loss: 4.5219| div_loss: 0.60963| %_mask_idx: 0.46413| ppl: 249.83762| %_neg_is_pos: 0.00133| lr: 0.0| temp: 1.97274 | loss: 1.13683| constrast_loss: 4.48473| div_loss: 0.62585| %_mask_idx: 0.4115| ppl: 239.45709| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97272 | loss: 1.12961| constrast_loss: 4.45182| div_loss: 0.66629| %_mask_idx: 0.37265| ppl: 213.57205| %_neg_is_pos: 0.0072| lr: 0.0| temp: 1.97272 | loss: 1.12243| constrast_loss: 4.42573| div_loss: 0.63992| %_mask_idx: 0.41103| ppl: 230.45255| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.97271 | loss: 1.1251| constrast_loss: 4.43522| div_loss: 0.65199| %_mask_idx: 0.31751| ppl: 222.7291| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97271 | loss: 1.13009| constrast_loss: 4.45867| div_loss: 0.61697| %_mask_idx: 0.38158| ppl: 245.13956| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.9727 | loss: 1.13859| constrast_loss: 4.48968| div_loss: 0.6468| %_mask_idx: 0.40758| ppl: 226.04938| %_neg_is_pos: 0.0033| 
lr: 0.0| temp: 1.9727 | loss: 1.12647| constrast_loss: 4.44026| div_loss: 0.65606| %_mask_idx: 0.39599| ppl: 220.12421| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.97269 | loss: 1.12869| constrast_loss: 4.44985| div_loss: 0.64909| %_mask_idx: 0.38941| ppl: 224.5854| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.97269 | loss: 1.13482| constrast_loss: 4.4771| div_loss: 0.62191| %_mask_idx: 0.43092| ppl: 241.97726| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.97267 | loss: 1.12527| constrast_loss: 4.4381| div_loss: 0.62996| %_mask_idx: 0.3479| ppl: 236.82278| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.97267 | loss: 1.1464| constrast_loss: 4.52434| div_loss: 0.61276| %_mask_idx: 0.39317| ppl: 247.83398| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.97266 | loss: 1.13703| constrast_loss: 4.4868| div_loss: 0.61335| %_mask_idx: 0.47039| ppl: 247.45657| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.97266 | loss: 1.13033| constrast_loss: 4.45753| div_loss: 0.63793| %_mask_idx: 0.36231| ppl: 231.72394| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.97264 | loss: 1.13297| constrast_loss: 4.46607| div_loss: 0.65806| %_mask_idx: 0.36341| ppl: 218.84305| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.97264 | loss: 1.13809| constrast_loss: 4.48785| div_loss: 0.64492| %_mask_idx: 0.36873| ppl: 227.25226| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.97263 | loss: 1.13229| constrast_loss: 4.46541| div_loss: 0.63754| %_mask_idx: 0.38581| ppl: 231.97414| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97263 | loss: 1.13244| constrast_loss: 4.46695| div_loss: 0.62808| %_mask_idx: 0.33803| ppl: 238.03018| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.97262 | loss: 1.13076| constrast_loss: 4.45762| div_loss: 0.65399| %_mask_idx: 0.37704| ppl: 221.44803| %_neg_is_pos: 0.00772| lr: 0.0| temp: 1.97262 | loss: 1.12601| constrast_loss: 4.44037| div_loss: 0.63654| %_mask_idx: 0.37845| ppl: 232.61179| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.97261 | loss: 1.13222| constrast_loss: 4.46376| div_loss: 0.65132| %_mask_idx: 0.39348| ppl: 223.15237| %_neg_is_pos: 
0.00495| lr: 0.0| temp: 1.97261 | loss: 1.13378| constrast_loss: 4.47216| div_loss: 0.62975| %_mask_idx: 0.37249| ppl: 236.95766| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.97259 | loss: 1.12187| constrast_loss: 4.42296| div_loss: 0.64502| %_mask_idx: 0.32487| ppl: 227.18759| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.97259 | loss: 1.13169| constrast_loss: 4.46418| div_loss: 0.62596| %_mask_idx: 0.37688| ppl: 239.38733| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.97258 | loss: 1.12319| constrast_loss: 4.42895| div_loss: 0.63806| %_mask_idx: 0.39975| ppl: 231.6423| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.97258 | loss: 1.13407| constrast_loss: 4.47258| div_loss: 0.63714| %_mask_idx: 0.43719| ppl: 232.2321| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.97257 | loss: 1.13385| constrast_loss: 4.47205| div_loss: 0.63364| %_mask_idx: 0.38831| ppl: 234.47272| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.97257 | loss: 1.12811| constrast_loss: 4.44973| div_loss: 0.62695| %_mask_idx: 0.36607| ppl: 238.74907| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.97256 | loss: 1.13158| constrast_loss: 4.46014| div_loss: 0.66185| %_mask_idx: 0.37923| ppl: 216.41331| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.97256 | loss: 1.13517| constrast_loss: 4.47507| div_loss: 0.656| %_mask_idx: 0.38189| ppl: 220.15744| %_neg_is_pos: 0.00635| lr: 0.0| temp: 1.97254 | loss: 1.13484| constrast_loss: 4.4754| div_loss: 0.63962| %_mask_idx: 0.34555| ppl: 230.64127| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.97254 | loss: 1.11806| constrast_loss: 4.40728| div_loss: 0.64947| %_mask_idx: 0.36952| ppl: 224.34027| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.97253 | loss: 1.13081| constrast_loss: 4.45943| div_loss: 0.63797| %_mask_idx: 0.42826| ppl: 231.69872| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.97253 | loss: 1.13918| constrast_loss: 4.49437| div_loss: 0.62359| %_mask_idx: 0.45363| ppl: 240.90419| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.97252 | loss: 1.13335| constrast_loss: 4.46984| div_loss: 0.63539| %_mask_idx: 0.40789| ppl: 233.35269| 
%_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97252 | loss: 1.12875| constrast_loss: 4.45154| div_loss: 0.63473| %_mask_idx: 0.36216| ppl: 233.77414| %_neg_is_pos: 0.00446| lr: 0.0| temp: 1.97251 | loss: 1.1219| constrast_loss: 4.42335| div_loss: 0.64254| %_mask_idx: 0.39756| ppl: 228.77332| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.97251 | loss: 1.1379| constrast_loss: 4.4908| div_loss: 0.60785| %_mask_idx: 0.40977| ppl: 250.97284| %_neg_is_pos: 0.0019| lr: 0.0| temp: 1.97249 | loss: 1.12032| constrast_loss: 4.41587| div_loss: 0.65427| %_mask_idx: 0.34211| ppl: 221.26447| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.97249 | loss: 1.12597| constrast_loss: 4.44078| div_loss: 0.63082| %_mask_idx: 0.33506| ppl: 236.27235| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.97248 | loss: 1.12747| constrast_loss: 4.44437| div_loss: 0.65507| %_mask_idx: 0.35448| ppl: 220.75288| %_neg_is_pos: 0.00547| lr: 0.0| temp: 1.97248 | loss: 1.13665| constrast_loss: 4.4835| div_loss: 0.63097| %_mask_idx: 0.34978| ppl: 236.17969| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97246 | loss: 1.12955| constrast_loss: 4.45407| div_loss: 0.64121| %_mask_idx: 0.3797| ppl: 229.62634| %_neg_is_pos: 0.00303| lr: 0.0| temp: 1.97246 | loss: 1.13116| constrast_loss: 4.46007| div_loss: 0.64566| %_mask_idx: 0.42654| ppl: 226.7767| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.97245 | loss: 1.12542| constrast_loss: 4.4377| div_loss: 0.63981| %_mask_idx: 0.38142| ppl: 230.51913| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.97245 | loss: 1.12754| constrast_loss: 4.44637| div_loss: 0.63787| %_mask_idx: 0.39474| ppl: 231.76172| %_neg_is_pos: 0.00438| lr: 0.0| temp: 1.97244 | loss: 1.12992| constrast_loss: 4.45521| div_loss: 0.64471| %_mask_idx: 0.37798| ppl: 227.38353| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.97244 | loss: 1.12387| constrast_loss: 4.42986| div_loss: 0.65637| %_mask_idx: 0.3985| ppl: 219.92621| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.97243 | loss: 1.14992| constrast_loss: 4.5375| div_loss: 0.6217| %_mask_idx: 0.38471| ppl: 
242.11493| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.97243 | loss: 1.12467| constrast_loss: 4.4356| div_loss: 0.63093| %_mask_idx: 0.36779| ppl: 236.20724| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.97241 | loss: 1.12556| constrast_loss: 4.43738| div_loss: 0.64843| %_mask_idx: 0.36341| ppl: 225.00668| %_neg_is_pos: 0.00768| lr: 0.0| temp: 1.97241 | loss: 1.12944| constrast_loss: 4.45403| div_loss: 0.63749| %_mask_idx: 0.45395| ppl: 232.00787| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.9724 | loss: 1.12531| constrast_loss: 4.4366| div_loss: 0.64646| %_mask_idx: 0.3562| ppl: 226.26537| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.9724 | loss: 1.1332| constrast_loss: 4.46919| div_loss: 0.63594| %_mask_idx: 0.40147| ppl: 233.00101| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.97239 | loss: 1.12774| constrast_loss: 4.4472| div_loss: 0.63764| %_mask_idx: 0.41165| ppl: 231.90971| %_neg_is_pos: 0.00646| lr: 0.0| temp: 1.97239 | loss: 1.13945| constrast_loss: 4.49607| div_loss: 0.61709| %_mask_idx: 0.35714| ppl: 245.06052| %_neg_is_pos: 0.00484| lr: 0.0| temp: 1.97238 | loss: 1.1474| constrast_loss: 4.52779| div_loss: 0.61817| %_mask_idx: 0.36999| ppl: 244.36862| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.97238 | loss: 1.12735| constrast_loss: 4.44263| div_loss: 0.66755| %_mask_idx: 0.37782| ppl: 212.77069| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.97236 | loss: 1.1393| constrast_loss: 4.4949| div_loss: 0.62307| %_mask_idx: 0.39536| ppl: 241.23792| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.97236 | loss: 1.13092| constrast_loss: 4.45961| div_loss: 0.64054| %_mask_idx: 0.35996| ppl: 230.05682| %_neg_is_pos: 0.0056| lr: 0.0| temp: 1.97235 | loss: 1.13657| constrast_loss: 4.48417| div_loss: 0.62124| %_mask_idx: 0.41181| ppl: 242.40581| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.97235 | loss: 1.13201| constrast_loss: 4.46506| div_loss: 0.62965| %_mask_idx: 0.37484| ppl: 237.02625| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.97234 | loss: 1.1281| constrast_loss: 4.44872| div_loss: 0.63661| %_mask_idx: 0.39583| 
ppl: 232.5675| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.97234 | loss: 1.13361| constrast_loss: 4.47206| div_loss: 0.6237| %_mask_idx: 0.40053| ppl: 240.83344| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.97233 | loss: 1.12752| constrast_loss: 4.44609| div_loss: 0.63993| %_mask_idx: 0.41933| ppl: 230.44632| %_neg_is_pos: 0.00413| lr: 0.0| temp: 1.97233 | loss: 1.12563| constrast_loss: 4.44031| div_loss: 0.62203| %_mask_idx: 0.34038| ppl: 241.89862| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.97231 | loss: 1.1282| constrast_loss: 4.45033| div_loss: 0.62484| %_mask_idx: 0.41338| ppl: 240.09949| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.97231 | loss: 1.13145| constrast_loss: 4.46172| div_loss: 0.64069| %_mask_idx: 0.40461| ppl: 229.95607| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.9723 | loss: 1.14111| constrast_loss: 4.50073| div_loss: 0.63714| %_mask_idx: 0.42309| ppl: 232.23257| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.9723 [2021-09-02 03:15:53,325] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:15:53,326] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.12705| constrast_loss: 4.44308| div_loss: 0.65123| %_mask_idx: 0.42246| ppl: 223.21198| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.97228 | loss: 1.12948| constrast_loss: 4.45442| div_loss: 0.63485| %_mask_idx: 0.33913| ppl: 233.69693| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.97228 | loss: 1.12827| constrast_loss: 4.4472| div_loss: 0.65883| %_mask_idx: 0.39803| ppl: 218.35184| %_neg_is_pos: 0.00481| lr: 0.0| temp: 1.97227 | loss: 1.12405| constrast_loss: 4.43084| div_loss: 0.6535| %_mask_idx: 0.39301| ppl: 221.76305| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.97227 | loss: 1.13963| constrast_loss: 4.49364| div_loss: 0.64897| %_mask_idx: 0.39991| ppl: 224.65771| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.97226 | loss: 1.12266| constrast_loss: 4.42506| div_loss: 0.65588| %_mask_idx: 0.35307| ppl: 220.23651| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.97226 | loss: 1.13495| constrast_loss: 4.47779| div_loss: 0.62014| %_mask_idx: 0.38518| ppl: 243.11191| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.97225 | loss: 1.13517| constrast_loss: 4.47845| div_loss: 0.62246| %_mask_idx: 0.40946| ppl: 241.62497| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.97225 | loss: 1.12775| constrast_loss: 4.44673| div_loss: 0.64276| %_mask_idx: 0.39051| ppl: 228.63589| %_neg_is_pos: 0.00521| lr: 0.0| temp: 1.97223 | loss: 1.13654| constrast_loss: 4.48262| div_loss: 0.63539| %_mask_idx: 0.43781| ppl: 233.35016| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.97223 | loss: 1.12951| constrast_loss: 4.45401| div_loss: 0.64049| %_mask_idx: 0.39207| ppl: 230.08545| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.97222 | loss: 1.12798| constrast_loss: 4.44894| div_loss: 0.62984| %_mask_idx: 0.39442| ppl: 236.90004| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.97222 | loss: 1.12624| constrast_loss: 4.44172| div_loss: 0.63246| %_mask_idx: 0.37469| ppl: 235.22609| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.97221 | loss: 1.11742| constrast_loss: 4.4042| div_loss: 0.65465| %_mask_idx: 0.33161| ppl: 
221.0264| %_neg_is_pos: 0.00496| lr: 0.0| temp: 1.97221 | loss: 1.12786| constrast_loss: 4.44727| div_loss: 0.64184| %_mask_idx: 0.41541| ppl: 229.22086| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.9722 | loss: 1.12923| constrast_loss: 4.45463| div_loss: 0.62301| %_mask_idx: 0.39317| ppl: 241.27135| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.9722 | loss: 1.13066| constrast_loss: 4.45718| div_loss: 0.65461| %_mask_idx: 0.42043| ppl: 221.04825| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.97219 | loss: 1.1295| constrast_loss: 4.45392| div_loss: 0.64093| %_mask_idx: 0.38393| ppl: 229.80392| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.97219 | loss: 1.14712| constrast_loss: 4.52654| div_loss: 0.61927| %_mask_idx: 0.40555| ppl: 243.66829| %_neg_is_pos: 0.00476| lr: 0.0| temp: 1.97218 | loss: 1.12672| constrast_loss: 4.44192| div_loss: 0.64969| %_mask_idx: 0.38659| ppl: 224.19556| %_neg_is_pos: 0.00581| lr: 0.0| temp: 1.97218 | loss: 1.13347| constrast_loss: 4.46845| div_loss: 0.65444| %_mask_idx: 0.38847| ppl: 221.1606| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.97217 | loss: 1.12617| constrast_loss: 4.43966| div_loss: 0.65012| %_mask_idx: 0.36967| ppl: 223.92078| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.97217 | loss: 1.12651| constrast_loss: 4.44301| div_loss: 0.63047| %_mask_idx: 0.40022| ppl: 236.49963| %_neg_is_pos: 0.00622| lr: 0.0| temp: 1.97216 | loss: 1.12139| constrast_loss: 4.42124| div_loss: 0.64329| %_mask_idx: 0.39536| ppl: 228.29666| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.97216 | loss: 1.12945| constrast_loss: 4.45471| div_loss: 0.63111| %_mask_idx: 0.38831| ppl: 236.09193| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.97214 | loss: 1.13868| constrast_loss: 4.49143| div_loss: 0.6328| %_mask_idx: 0.38471| ppl: 235.00659| %_neg_is_pos: 0.00329| lr: 0.0| temp: 1.97214 | loss: 1.12612| constrast_loss: 4.43988| div_loss: 0.64612| %_mask_idx: 0.40445| ppl: 226.48326| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.97213 | loss: 1.12645| constrast_loss: 4.44175| div_loss: 0.64055| %_mask_idx: 
0.37578| ppl: 230.04495| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.97213 | loss: 1.1305| constrast_loss: 4.45768| div_loss: 0.64332| %_mask_idx: 0.39364| ppl: 228.27231| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.97211 | loss: 1.12365| constrast_loss: 4.42809| div_loss: 0.66501| %_mask_idx: 0.35699| ppl: 214.39325| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.97211 | loss: 1.13008| constrast_loss: 4.45775| div_loss: 0.62566| %_mask_idx: 0.33788| ppl: 239.57681| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.9721 | loss: 1.13321| constrast_loss: 4.47007| div_loss: 0.62774| %_mask_idx: 0.40946| ppl: 238.24326| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.9721 | loss: 1.13564| constrast_loss: 4.47928| div_loss: 0.63277| %_mask_idx: 0.41917| ppl: 235.02731| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.97209 | loss: 1.13525| constrast_loss: 4.47691| div_loss: 0.641| %_mask_idx: 0.38863| ppl: 229.75836| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.97209 | loss: 1.1349| constrast_loss: 4.47362| div_loss: 0.65999| %_mask_idx: 0.37829| ppl: 217.60342| %_neg_is_pos: 0.00709| lr: 0.0| temp: 1.97208 | loss: 1.12502| constrast_loss: 4.43632| div_loss: 0.63763| %_mask_idx: 0.42998| ppl: 231.9184| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.97208 | loss: 1.13599| constrast_loss: 4.47839| div_loss: 0.65575| %_mask_idx: 0.35605| ppl: 220.32098| %_neg_is_pos: 0.00783| lr: 0.0| temp: 1.97206 | loss: 1.12812| constrast_loss: 4.4482| div_loss: 0.64298| %_mask_idx: 0.40194| ppl: 228.49458| %_neg_is_pos: 0.00446| lr: 0.0| temp: 1.97206 | loss: 1.13088| constrast_loss: 4.45943| div_loss: 0.64101| %_mask_idx: 0.39646| ppl: 229.75137| %_neg_is_pos: 0.00419| lr: 0.0| temp: 1.97205 | loss: 1.13689| constrast_loss: 4.48422| div_loss: 0.6334| %_mask_idx: 0.3808| ppl: 234.62125| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.97205 | loss: 1.12605| constrast_loss: 4.44057| div_loss: 0.63617| %_mask_idx: 0.40241| ppl: 232.85004| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.97204 | loss: 1.1366| constrast_loss: 4.48374| div_loss: 0.62677| 
%_mask_idx: 0.38737| ppl: 238.8645| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.97204 | loss: 1.13713| constrast_loss: 4.48455| div_loss: 0.63961| %_mask_idx: 0.41056| ppl: 230.6478| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.97203 | loss: 1.12248| constrast_loss: 4.42504| div_loss: 0.64872| %_mask_idx: 0.40367| ppl: 224.8214| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.97203 | loss: 1.12169| constrast_loss: 4.42188| div_loss: 0.64896| %_mask_idx: 0.37516| ppl: 224.66562| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.97201 | loss: 1.12866| constrast_loss: 4.45213| div_loss: 0.62514| %_mask_idx: 0.32018| ppl: 239.91031| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.97201 | loss: 1.12485| constrast_loss: 4.43498| div_loss: 0.6443| %_mask_idx: 0.38706| ppl: 227.64807| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.972 | loss: 1.13171| constrast_loss: 4.46212| div_loss: 0.6471| %_mask_idx: 0.32816| ppl: 225.85852| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.972 | loss: 1.12839| constrast_loss: 4.44901| div_loss: 0.64548| %_mask_idx: 0.40523| ppl: 226.89294| %_neg_is_pos: 0.00521| lr: 0.0| temp: 1.97199 | loss: 1.1273| constrast_loss: 4.44488| div_loss: 0.64324| %_mask_idx: 0.42419| ppl: 228.32492| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.97199 | loss: 1.13561| constrast_loss: 4.47982| div_loss: 0.62609| %_mask_idx: 0.41776| ppl: 239.30042| %_neg_is_pos: 0.00174| lr: 0.0| temp: 1.97198 | loss: 1.13532| constrast_loss: 4.47881| div_loss: 0.62486| %_mask_idx: 0.38456| ppl: 240.08722| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.97198 | loss: 1.12492| constrast_loss: 4.43159| div_loss: 0.68094| %_mask_idx: 0.31751| ppl: 204.19879| %_neg_is_pos: 0.00734| lr: 0.0| temp: 1.97196 | loss: 1.12804| constrast_loss: 4.4466| div_loss: 0.65546| %_mask_idx: 0.35244| ppl: 220.50662| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.97196 | loss: 1.132| constrast_loss: 4.46373| div_loss: 0.64288| %_mask_idx: 0.39348| ppl: 228.55432| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.97195 | loss: 1.13484| constrast_loss: 4.47663| div_loss: 
0.62733| %_mask_idx: 0.39411| ppl: 238.51035| %_neg_is_pos: 0.00301| lr: 0.0| temp: 1.97195 | loss: 1.12941| constrast_loss: 4.45404| div_loss: 0.63617| %_mask_idx: 0.37171| ppl: 232.85274| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.97193 | loss: 1.13306| constrast_loss: 4.46906| div_loss: 0.63192| %_mask_idx: 0.40727| ppl: 235.56862| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.97193 | loss: 1.12322| constrast_loss: 4.42871| div_loss: 0.64157| %_mask_idx: 0.41244| ppl: 229.39474| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.97192 | loss: 1.13814| constrast_loss: 4.48678| div_loss: 0.65795| %_mask_idx: 0.34712| ppl: 218.91452| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.97192 | loss: 1.13629| constrast_loss: 4.47966| div_loss: 0.65484| %_mask_idx: 0.37296| ppl: 220.90268| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.97191 | loss: 1.1351| constrast_loss: 4.47743| div_loss: 0.62979| %_mask_idx: 0.39489| ppl: 236.93732| %_neg_is_pos: 0.00301| lr: 0.0| temp: 1.97191 | loss: 1.13249| constrast_loss: 4.46658| div_loss: 0.63369| %_mask_idx: 0.3963| ppl: 234.43874| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.9719 | loss: 1.14186| constrast_loss: 4.50675| div_loss: 0.60698| %_mask_idx: 0.40977| ppl: 251.53093| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.9719 | loss: 1.12175| constrast_loss: 4.42144| div_loss: 0.65544| %_mask_idx: 0.37343| ppl: 220.52039| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.97188 | loss: 1.13077| constrast_loss: 4.46024| div_loss: 0.62847| %_mask_idx: 0.38048| ppl: 237.77663| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97188 | loss: 1.13167| constrast_loss: 4.46368| div_loss: 0.63013| %_mask_idx: 0.38957| ppl: 236.71365| %_neg_is_pos: 0.00436| lr: 0.0| temp: 1.97187 | loss: 1.13833| constrast_loss: 4.48922| div_loss: 0.64118| %_mask_idx: 0.39411| ppl: 229.64319| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.97187 | loss: 1.13444| constrast_loss: 4.47406| div_loss: 0.63693| %_mask_idx: 0.40648| ppl: 232.36758| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.97186 | loss: 1.1208| constrast_loss: 
4.41767| div_loss: 0.65543| %_mask_idx: 0.35025| ppl: 220.52679| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.97186 | loss: 1.12571| constrast_loss: 4.43843| div_loss: 0.64395| %_mask_idx: 0.31579| ppl: 227.87427| %_neg_is_pos: 0.00444| lr: 0.0| temp: 1.97185 | loss: 1.12876| constrast_loss: 4.45181| div_loss: 0.63235| %_mask_idx: 0.39803| ppl: 235.29634| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.97185 | loss: 1.12775| constrast_loss: 4.44775| div_loss: 0.63236| %_mask_idx: 0.37954| ppl: 235.29008| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.97183 | loss: 1.1305| constrast_loss: 4.45917| div_loss: 0.62851| %_mask_idx: 0.38863| ppl: 237.75052| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.97183 | loss: 1.12498| constrast_loss: 4.4356| div_loss: 0.6433| %_mask_idx: 0.34947| ppl: 228.28603| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.97182 | loss: 1.12907| constrast_loss: 4.45169| div_loss: 0.64592| %_mask_idx: 0.34915| ppl: 226.60912| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.97182 | loss: 1.13934| constrast_loss: 4.49526| div_loss: 0.62103| %_mask_idx: 0.40257| ppl: 242.54202| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.97181 | loss: 1.13983| constrast_loss: 4.49543| div_loss: 0.63898| %_mask_idx: 0.36544| ppl: 231.05504| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.97181 | loss: 1.13253| constrast_loss: 4.46578| div_loss: 0.64348| %_mask_idx: 0.40539| ppl: 228.17451| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.9718 | loss: 1.13793| constrast_loss: 4.4881| div_loss: 0.63633| %_mask_idx: 0.42544| ppl: 232.74763| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.9718 | loss: 1.12927| constrast_loss: 4.45419| div_loss: 0.62894| %_mask_idx: 0.36122| ppl: 237.48029| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.97178 | loss: 1.12877| constrast_loss: 4.45057| div_loss: 0.64505| %_mask_idx: 0.33756| ppl: 227.16806| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.97178 | loss: 1.1275| constrast_loss: 4.44523| div_loss: 0.64776| %_mask_idx: 0.34477| ppl: 225.43095| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.97177 | loss: 1.13993| 
constrast_loss: 4.49782| div_loss: 0.61897| %_mask_idx: 0.38001| ppl: 243.86072| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.97177 | loss: 1.13217| constrast_loss: 4.46526| div_loss: 0.63432| %_mask_idx: 0.35683| ppl: 234.03647| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.97175 | loss: 1.12746| constrast_loss: 4.44603| div_loss: 0.63816| %_mask_idx: 0.39286| ppl: 231.57558| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.97175 | loss: 1.12759| constrast_loss: 4.44614| div_loss: 0.64222| %_mask_idx: 0.37719| ppl: 228.98108| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.97174 | loss: 1.1264| constrast_loss: 4.44104| div_loss: 0.64547| %_mask_idx: 0.35056| ppl: 226.89651| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.97174 | loss: 1.12761| constrast_loss: 4.44721| div_loss: 0.63237| %_mask_idx: 0.40648| ppl: 235.28427| %_neg_is_pos: 0.00338| lr: 0.0| temp: 1.97173 | loss: 1.12718| constrast_loss: 4.44405| div_loss: 0.6467| %_mask_idx: 0.45066| ppl: 226.10907| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.97173 | loss: 1.11802| constrast_loss: 4.40491| div_loss: 0.67168| %_mask_idx: 0.39364| ppl: 210.12741| %_neg_is_pos: 0.00539| lr: 0.0| temp: 1.97172 | loss: 1.13749| constrast_loss: 4.48751| div_loss: 0.62459| %_mask_idx: 0.39709| ppl: 240.26094| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.97172 | loss: 1.13234| constrast_loss: 4.46539| div_loss: 0.63977| %_mask_idx: 0.36623| ppl: 230.545| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.9717 | loss: 1.1299| constrast_loss: 4.45295| div_loss: 0.66662| %_mask_idx: 0.41181| ppl: 213.36401| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.9717 | loss: 1.1406| constrast_loss: 4.49992| div_loss: 0.62486| %_mask_idx: 0.34492| ppl: 240.08801| %_neg_is_pos: 0.0017| lr: 0.0| temp: 1.97169 | loss: 1.13554| constrast_loss: 4.4777| div_loss: 0.64459| %_mask_idx: 0.40147| ppl: 227.46043| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.97169 | loss: 1.12746| constrast_loss: 4.44434| div_loss: 0.65496| %_mask_idx: 0.41432| ppl: 220.82285| %_neg_is_pos: 0.00528| lr: 0.0| temp: 1.97168 | loss: 
1.12925| constrast_loss: 4.45224| div_loss: 0.64768| %_mask_idx: 0.38612| ppl: 225.48749| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.97168 | loss: 1.13409| constrast_loss: 4.47386| div_loss: 0.62515| %_mask_idx: 0.40304| ppl: 239.90248| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.97167 | loss: 1.13567| constrast_loss: 4.48013| div_loss: 0.62535| %_mask_idx: 0.43249| ppl: 239.77844| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.97167 | loss: 1.13521| constrast_loss: 4.47775| div_loss: 0.63075| %_mask_idx: 0.45442| ppl: 236.32288| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.97165 | loss: 1.12511| constrast_loss: 4.43533| div_loss: 0.65101| %_mask_idx: 0.39176| ppl: 223.35558| %_neg_is_pos: 0.00241| lr: 0.0| temp: 1.97165 | loss: 1.13539| constrast_loss: 4.47644| div_loss: 0.65108| %_mask_idx: 0.36513| ppl: 223.30721| %_neg_is_pos: 0.00627| lr: 0.0| temp: 1.97164 | loss: 1.13476| constrast_loss: 4.47488| div_loss: 0.6416| %_mask_idx: 0.37516| ppl: 229.37311| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.97164 | loss: 1.13476| constrast_loss: 4.47625| div_loss: 0.62774| %_mask_idx: 0.41964| ppl: 238.2489| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97163 | loss: 1.12886| constrast_loss: 4.452| div_loss: 0.63435| %_mask_idx: 0.38769| ppl: 234.01904| %_neg_is_pos: 0.00478| lr: 0.0| temp: 1.97163 | loss: 1.12908| constrast_loss: 4.45264| div_loss: 0.63697| %_mask_idx: 0.40038| ppl: 232.33807| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.97162 | loss: 1.12941| constrast_loss: 4.45388| div_loss: 0.63764| %_mask_idx: 0.33944| ppl: 231.91324| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.97162 | loss: 1.12881| constrast_loss: 4.45166| div_loss: 0.6356| %_mask_idx: 0.36983| ppl: 233.21341| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.9716 | loss: 1.13111| constrast_loss: 4.46009| div_loss: 0.6434| %_mask_idx: 0.4328| ppl: 228.22672| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.9716 | loss: 1.13441| constrast_loss: 4.4723| div_loss: 0.65339| %_mask_idx: 0.39944| ppl: 221.82906| %_neg_is_pos: 0.00337| lr: 0.0| temp: 1.97159 | 
loss: 1.13951| constrast_loss: 4.49435| div_loss: 0.63703| %_mask_idx: 0.43499| ppl: 232.30319| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.97159 [2021-09-02 03:25:06,831] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:25:06,831] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14016| constrast_loss: 4.49701| div_loss: 0.63613| %_mask_idx: 0.37578| ppl: 232.87595| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.97157 | loss: 1.13951| constrast_loss: 4.49558| div_loss: 0.62445| %_mask_idx: 0.45238| ppl: 240.3519| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.97157 | loss: 1.14103| constrast_loss: 4.50089| div_loss: 0.63245| %_mask_idx: 0.37375| ppl: 235.22932| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.97156 | loss: 1.12636| constrast_loss: 4.44157| div_loss: 0.63867| %_mask_idx: 0.37766| ppl: 231.25439| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.97156 | loss: 1.13257| constrast_loss: 4.4661| div_loss: 0.64185| %_mask_idx: 0.36169| ppl: 229.21591| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97155 | loss: 1.1415| constrast_loss: 4.50168| div_loss: 0.64339| %_mask_idx: 0.38863| ppl: 228.23093| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.97155 | loss: 1.1356| constrast_loss: 4.47889| div_loss: 0.63497| %_mask_idx: 0.37876| ppl: 233.62021| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.97154 | loss: 1.13945| constrast_loss: 4.49438| div_loss: 0.63427| %_mask_idx: 0.40492| ppl: 234.06866| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97154 | loss: 1.12786| constrast_loss: 4.44573| div_loss: 0.65696| %_mask_idx: 0.35526| ppl: 219.54678| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.97152 | loss: 1.13652| constrast_loss: 4.4819| div_loss: 0.6416| %_mask_idx: 0.39458| ppl: 229.37396| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.97152 | loss: 1.13367| constrast_loss: 4.47127| div_loss: 0.63411| %_mask_idx: 0.40382| ppl: 234.16919| 
%_neg_is_pos: 0.00421| lr: 0.0| temp: 1.97151 | loss: 1.13457| constrast_loss: 4.47531| div_loss: 0.62948| %_mask_idx: 0.37202| ppl: 237.12976| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.97151 | loss: 1.13213| constrast_loss: 4.46542| div_loss: 0.63102| %_mask_idx: 0.38753| ppl: 236.14749| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.9715 | loss: 1.13691| constrast_loss: 4.48171| div_loss: 0.65945| %_mask_idx: 0.401| ppl: 217.94897| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.9715 | loss: 1.13261| constrast_loss: 4.46628| div_loss: 0.64152| %_mask_idx: 0.38158| ppl: 229.4269| %_neg_is_pos: 0.00459| lr: 0.0| temp: 1.9715 | loss: 1.12904| constrast_loss: 4.4518| div_loss: 0.64372| %_mask_idx: 0.42403| ppl: 228.02122| %_neg_is_pos: 0.00321| lr: 0.0| temp: 1.9715 | loss: 1.128| constrast_loss: 4.44757| div_loss: 0.64449| %_mask_idx: 0.39599| ppl: 227.52386| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.97148 | loss: 1.13432| constrast_loss: 4.47348| div_loss: 0.63789| %_mask_idx: 0.3927| ppl: 231.752| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.97148 | loss: 1.13453| constrast_loss: 4.47372| div_loss: 0.64408| %_mask_idx: 0.36607| ppl: 227.78778| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.97147 | loss: 1.13757| constrast_loss: 4.48708| div_loss: 0.63199| %_mask_idx: 0.41197| ppl: 235.52341| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.97147 | loss: 1.13911| constrast_loss: 4.49365| div_loss: 0.62795| %_mask_idx: 0.41134| ppl: 238.11044| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.97146 | loss: 1.12659| constrast_loss: 4.44073| div_loss: 0.65642| %_mask_idx: 0.41588| ppl: 219.88998| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.97146 | loss: 1.11803| constrast_loss: 4.4063| div_loss: 0.65813| %_mask_idx: 0.38847| ppl: 218.79626| %_neg_is_pos: 0.00967| lr: 0.0| temp: 1.97145 | loss: 1.132| constrast_loss: 4.46242| div_loss: 0.6556| %_mask_idx: 0.3656| ppl: 220.41737| %_neg_is_pos: 0.01106| lr: 0.0| temp: 1.97145 | loss: 1.1339| constrast_loss: 4.46962| div_loss: 0.65979| %_mask_idx: 0.3573| ppl: 217.73386| 
%_neg_is_pos: 0.00878| lr: 0.0| temp: 1.97143 | loss: 1.13027| constrast_loss: 4.45831| div_loss: 0.62754| %_mask_idx: 0.42027| ppl: 238.37512| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.97143 | loss: 1.13424| constrast_loss: 4.47274| div_loss: 0.64209| %_mask_idx: 0.41667| ppl: 229.0614| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.97142 | loss: 1.1378| constrast_loss: 4.48919| div_loss: 0.62017| %_mask_idx: 0.41197| ppl: 243.08981| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.97142 | loss: 1.13993| constrast_loss: 4.49706| div_loss: 0.62667| %_mask_idx: 0.37923| ppl: 238.93008| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.9714 | loss: 1.11954| constrast_loss: 4.41439| div_loss: 0.63788| %_mask_idx: 0.40993| ppl: 231.75851| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.9714 | loss: 1.13792| constrast_loss: 4.48662| div_loss: 0.65039| %_mask_idx: 0.38362| ppl: 223.75201| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.97139 | loss: 1.13589| constrast_loss: 4.4795| div_loss: 0.64063| %_mask_idx: 0.40946| ppl: 229.99777| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.97139 | loss: 1.13281| constrast_loss: 4.46765| div_loss: 0.63589| %_mask_idx: 0.3916| ppl: 233.03244| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.97138 | loss: 1.13876| constrast_loss: 4.48992| div_loss: 0.65119| %_mask_idx: 0.3974| ppl: 223.24051| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.97138 | loss: 1.12162| constrast_loss: 4.42033| div_loss: 0.66152| %_mask_idx: 0.36607| ppl: 216.62656| %_neg_is_pos: 0.0072| lr: 0.0| temp: 1.97137 | loss: 1.12888| constrast_loss: 4.4484| div_loss: 0.67117| %_mask_idx: 0.35041| ppl: 210.45413| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.97137 | loss: 1.1311| constrast_loss: 4.45941| div_loss: 0.64978| %_mask_idx: 0.39599| ppl: 224.14102| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.97135 | loss: 1.13011| constrast_loss: 4.45375| div_loss: 0.66687| %_mask_idx: 0.39051| ppl: 213.20352| %_neg_is_pos: 0.00524| lr: 0.0| temp: 1.97135 | loss: 1.14066| constrast_loss: 4.49972| div_loss: 0.62912| %_mask_idx: 0.41416| ppl: 
237.36185| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.97134 | loss: 1.13515| constrast_loss: 4.47629| div_loss: 0.64294| %_mask_idx: 0.37265| ppl: 228.51855| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.97134 | loss: 1.13982| constrast_loss: 4.49447| div_loss: 0.64818| %_mask_idx: 0.39035| ppl: 225.16516| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.97133 | loss: 1.12851| constrast_loss: 4.44854| div_loss: 0.65488| %_mask_idx: 0.38017| ppl: 220.8756| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.97133 | loss: 1.13204| constrast_loss: 4.46513| div_loss: 0.63046| %_mask_idx: 0.39458| ppl: 236.50662| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.97132 | loss: 1.132| constrast_loss: 4.46623| div_loss: 0.61756| %_mask_idx: 0.44392| ppl: 244.76413| %_neg_is_pos: 0.00173| lr: 0.0| temp: 1.97132 | loss: 1.12903| constrast_loss: 4.45184| div_loss: 0.64274| %_mask_idx: 0.3844| ppl: 228.64491| %_neg_is_pos: 0.00502| lr: 0.0| temp: 1.9713 | loss: 1.13563| constrast_loss: 4.47976| div_loss: 0.62761| %_mask_idx: 0.39865| ppl: 238.32948| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.9713 | loss: 1.13862| constrast_loss: 4.49337| div_loss: 0.61124| %_mask_idx: 0.39536| ppl: 248.80576| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.97129 | loss: 1.13712| constrast_loss: 4.48529| div_loss: 0.63181| %_mask_idx: 0.41479| ppl: 235.64021| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.97129 | loss: 1.12428| constrast_loss: 4.4322| div_loss: 0.64941| %_mask_idx: 0.35401| ppl: 224.3793| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.97128 | loss: 1.1264| constrast_loss: 4.4396| div_loss: 0.65982| %_mask_idx: 0.33835| ppl: 217.71732| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.97128 | loss: 1.13227| constrast_loss: 4.46618| div_loss: 0.62902| %_mask_idx: 0.43061| ppl: 237.42926| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97127 | loss: 1.14134| constrast_loss: 4.50149| div_loss: 0.63886| %_mask_idx: 0.38189| ppl: 231.12929| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.97127 | loss: 1.1358| constrast_loss: 4.47938| div_loss: 0.6384| %_mask_idx: 0.38549| 
ppl: 231.42252| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.97125 | loss: 1.13776| constrast_loss: 4.48844| div_loss: 0.62593| %_mask_idx: 0.4433| ppl: 239.4072| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.97125 | loss: 1.13998| constrast_loss: 4.49616| div_loss: 0.63755| %_mask_idx: 0.46382| ppl: 231.96974| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.97124 | loss: 1.13643| constrast_loss: 4.481| div_loss: 0.64735| %_mask_idx: 0.40163| ppl: 225.69304| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.97124 | loss: 1.13511| constrast_loss: 4.47451| div_loss: 0.65925| %_mask_idx: 0.36466| ppl: 218.08246| %_neg_is_pos: 0.00367| lr: 0.0| temp: 1.97122 | loss: 1.12986| constrast_loss: 4.45532| div_loss: 0.64134| %_mask_idx: 0.38017| ppl: 229.54559| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.97122 | loss: 1.12963| constrast_loss: 4.45412| div_loss: 0.64393| %_mask_idx: 0.34586| ppl: 227.88243| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97121 | loss: 1.14455| constrast_loss: 4.51504| div_loss: 0.6317| %_mask_idx: 0.35417| ppl: 235.7142| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.97121 | loss: 1.13261| constrast_loss: 4.4664| div_loss: 0.64045| %_mask_idx: 0.42387| ppl: 230.11084| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.9712 | loss: 1.12945| constrast_loss: 4.45396| div_loss: 0.63848| %_mask_idx: 0.41306| ppl: 231.37384| %_neg_is_pos: 0.00379| lr: 0.0| temp: 1.9712 | loss: 1.1345| constrast_loss: 4.4728| div_loss: 0.65197| %_mask_idx: 0.40883| ppl: 222.74054| %_neg_is_pos: 0.00639| lr: 0.0| temp: 1.97119 | loss: 1.1415| constrast_loss: 4.50417| div_loss: 0.61816| %_mask_idx: 0.40993| ppl: 244.3786| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.97119 | loss: 1.13045| constrast_loss: 4.4572| div_loss: 0.64585| %_mask_idx: 0.40977| ppl: 226.65524| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.97117 | loss: 1.14223| constrast_loss: 4.50686| div_loss: 0.62052| %_mask_idx: 0.37907| ppl: 242.86871| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.97117 | loss: 1.1366| constrast_loss: 4.48145| div_loss: 0.64958| %_mask_idx: 
0.36184| ppl: 224.26718| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.97116 | loss: 1.13446| constrast_loss: 4.47375| div_loss: 0.6408| %_mask_idx: 0.43922| ppl: 229.88867| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.97116 | loss: 1.12999| constrast_loss: 4.45321| div_loss: 0.66741| %_mask_idx: 0.37672| ppl: 212.85568| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.97115 | loss: 1.13134| constrast_loss: 4.45973| div_loss: 0.65627| %_mask_idx: 0.38048| ppl: 219.98849| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.97115 | loss: 1.12603| constrast_loss: 4.43938| div_loss: 0.64736| %_mask_idx: 0.36717| ppl: 225.69009| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.97114 | loss: 1.13192| constrast_loss: 4.46291| div_loss: 0.64776| %_mask_idx: 0.3562| ppl: 225.43069| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.97114 | loss: 1.13957| constrast_loss: 4.49566| div_loss: 0.62625| %_mask_idx: 0.40821| ppl: 239.19873| %_neg_is_pos: 0.0016| lr: 0.0| temp: 1.97112 | loss: 1.13663| constrast_loss: 4.48374| div_loss: 0.62757| %_mask_idx: 0.34414| ppl: 238.35251| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.97112 | loss: 1.12987| constrast_loss: 4.45452| div_loss: 0.64963| %_mask_idx: 0.37234| ppl: 224.23694| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97111 | loss: 1.12691| constrast_loss: 4.44372| div_loss: 0.63932| %_mask_idx: 0.36153| ppl: 230.83549| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.97111 | loss: 1.12933| constrast_loss: 4.45354| div_loss: 0.63794| %_mask_idx: 0.39364| ppl: 231.71689| %_neg_is_pos: 0.00552| lr: 0.0| temp: 1.9711 | loss: 1.13773| constrast_loss: 4.48553| div_loss: 0.65399| %_mask_idx: 0.36106| ppl: 221.44519| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.9711 | loss: 1.13115| constrast_loss: 4.45993| div_loss: 0.64679| %_mask_idx: 0.33318| ppl: 226.05293| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.97109 | loss: 1.13836| constrast_loss: 4.49104| div_loss: 0.62401| %_mask_idx: 0.44424| ppl: 240.63402| %_neg_is_pos: 0.00155| lr: 0.0| temp: 1.97109 | loss: 1.14251| constrast_loss: 4.50731| div_loss: 0.62716| 
%_mask_idx: 0.36294| ppl: 238.61853| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.97107 | loss: 1.13822| constrast_loss: 4.48974| div_loss: 0.63132| %_mask_idx: 0.38831| ppl: 235.95532| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.97107 | loss: 1.13272| constrast_loss: 4.46546| div_loss: 0.65402| %_mask_idx: 0.37108| ppl: 221.4257| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.97106 | loss: 1.12834| constrast_loss: 4.44959| div_loss: 0.6376| %_mask_idx: 0.43029| ppl: 231.93884| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.97106 | loss: 1.13963| constrast_loss: 4.4955| div_loss: 0.63028| %_mask_idx: 0.3844| ppl: 236.61877| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.97104 | loss: 1.11685| constrast_loss: 4.40117| div_loss: 0.66238| %_mask_idx: 0.33224| ppl: 216.0762| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97104 | loss: 1.12767| constrast_loss: 4.44762| div_loss: 0.63056| %_mask_idx: 0.38957| ppl: 236.44028| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.97103 | loss: 1.12559| constrast_loss: 4.43676| div_loss: 0.6558| %_mask_idx: 0.40179| ppl: 220.29041| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.97103 | loss: 1.13189| constrast_loss: 4.46279| div_loss: 0.6479| %_mask_idx: 0.37187| ppl: 225.34415| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.97102 | loss: 1.12646| constrast_loss: 4.43988| div_loss: 0.65979| %_mask_idx: 0.38487| ppl: 217.73627| %_neg_is_pos: 0.00511| lr: 0.0| temp: 1.97102 | loss: 1.1337| constrast_loss: 4.47255| div_loss: 0.62253| %_mask_idx: 0.39709| ppl: 241.57773| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.97101 | loss: 1.1349| constrast_loss: 4.4769| div_loss: 0.62711| %_mask_idx: 0.38471| ppl: 238.64792| %_neg_is_pos: 0.00137| lr: 0.0| temp: 1.97101 | loss: 1.13852| constrast_loss: 4.49018| div_loss: 0.63916| %_mask_idx: 0.44878| ppl: 230.93643| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.97099 | loss: 1.13802| constrast_loss: 4.4878| div_loss: 0.64277| %_mask_idx: 0.3869| ppl: 228.62717| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.97099 | loss: 1.12127| constrast_loss: 4.41675| div_loss: 
0.68338| %_mask_idx: 0.3808| ppl: 202.63545| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.97098 | loss: 1.13748| constrast_loss: 4.48671| div_loss: 0.63215| %_mask_idx: 0.39004| ppl: 235.42336| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.97098 | loss: 1.12921| constrast_loss: 4.45278| div_loss: 0.64062| %_mask_idx: 0.40946| ppl: 230.00037| %_neg_is_pos: 0.00521| lr: 0.0| temp: 1.97097 | loss: 1.11559| constrast_loss: 4.39506| div_loss: 0.67314| %_mask_idx: 0.28665| ppl: 209.18817| %_neg_is_pos: 0.00874| lr: 0.0| temp: 1.97097 | loss: 1.12532| constrast_loss: 4.43567| div_loss: 0.65604| %_mask_idx: 0.32331| ppl: 220.13589| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.97096 | loss: 1.1339| constrast_loss: 4.47147| div_loss: 0.64148| %_mask_idx: 0.32863| ppl: 229.45322| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.97096 | loss: 1.13064| constrast_loss: 4.45818| div_loss: 0.64381| %_mask_idx: 0.35526| ppl: 227.95993| %_neg_is_pos: 0.0047| lr: 0.0| temp: 1.97094 | loss: 1.13139| constrast_loss: 4.46073| div_loss: 0.64816| %_mask_idx: 0.3963| ppl: 225.17673| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.97094 | loss: 1.13998| constrast_loss: 4.49673| div_loss: 0.63179| %_mask_idx: 0.42105| ppl: 235.65631| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.97093 | loss: 1.13856| constrast_loss: 4.4915| div_loss: 0.62733| %_mask_idx: 0.42779| ppl: 238.50751| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.97093 | loss: 1.13437| constrast_loss: 4.47286| div_loss: 0.64627| %_mask_idx: 0.39333| ppl: 226.38988| %_neg_is_pos: 0.00413| lr: 0.0| temp: 1.97092 | loss: 1.1289| constrast_loss: 4.45153| div_loss: 0.64083| %_mask_idx: 0.39599| ppl: 229.8671| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.97092 | loss: 1.13648| constrast_loss: 4.48251| div_loss: 0.63409| %_mask_idx: 0.40069| ppl: 234.18192| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.97091 | loss: 1.11264| constrast_loss: 4.38381| div_loss: 0.66753| %_mask_idx: 0.37923| ppl: 212.78165| %_neg_is_pos: 0.00559| lr: 0.0| temp: 1.97091 | loss: 1.14637| constrast_loss: 4.52271| 
div_loss: 0.62755| %_mask_idx: 0.41635| ppl: 238.36694| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.97089 | loss: 1.12929| constrast_loss: 4.45361| div_loss: 0.63541| %_mask_idx: 0.39395| ppl: 233.33954| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.97089 | loss: 1.13102| constrast_loss: 4.45953| div_loss: 0.6454| %_mask_idx: 0.37281| ppl: 226.94708| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.97088 | loss: 1.13946| constrast_loss: 4.49499| div_loss: 0.62865| %_mask_idx: 0.40821| ppl: 237.662| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.97088 [2021-09-02 03:34:19,505] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:34:19,505] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13686| constrast_loss: 4.48135| div_loss: 0.66098| %_mask_idx: 0.36451| ppl: 216.97267| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.97086 | loss: 1.13499| constrast_loss: 4.4765| div_loss: 0.63478| %_mask_idx: 0.43045| ppl: 233.73911| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.97086 | loss: 1.13125| constrast_loss: 4.46004| div_loss: 0.64945| %_mask_idx: 0.42434| ppl: 224.35086| %_neg_is_pos: 0.00343| lr: 0.0| temp: 1.97085 | loss: 1.12157| constrast_loss: 4.42108| div_loss: 0.65187| %_mask_idx: 0.40226| ppl: 222.80025| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.97085 | loss: 1.13476| constrast_loss: 4.47483| div_loss: 0.64223| %_mask_idx: 0.37437| ppl: 228.9716| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.97084 | loss: 1.13167| constrast_loss: 4.46328| div_loss: 0.63409| %_mask_idx: 0.34461| ppl: 234.18155| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.97084 | loss: 1.13546| constrast_loss: 4.47794| div_loss: 0.6388| %_mask_idx: 0.40038| ppl: 231.16699| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.97083 | loss: 1.1323| constrast_loss: 4.46532| div_loss: 0.63878| %_mask_idx: 0.40523| ppl: 231.17865| %_neg_is_pos: 0.00228| lr: 0.0| temp: 
1.97083 | loss: 1.13729| constrast_loss: 4.48644| div_loss: 0.62727| %_mask_idx: 0.42888| ppl: 238.54993| %_neg_is_pos: 0.00157| lr: 0.0| temp: 1.97081 | loss: 1.13091| constrast_loss: 4.45957| div_loss: 0.64059| %_mask_idx: 0.43374| ppl: 230.02002| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.97081 | loss: 1.12361| constrast_loss: 4.43| div_loss: 0.64457| %_mask_idx: 0.39223| ppl: 227.47697| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.97081 | loss: 1.12994| constrast_loss: 4.45427| div_loss: 0.65499| %_mask_idx: 0.31375| ppl: 220.80881| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.97081 | loss: 1.13309| constrast_loss: 4.46812| div_loss: 0.64228| %_mask_idx: 0.35025| ppl: 228.9426| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.9708 | loss: 1.13763| constrast_loss: 4.48746| div_loss: 0.6307| %_mask_idx: 0.3739| ppl: 236.35132| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.9708 | loss: 1.13473| constrast_loss: 4.4734| div_loss: 0.65525| %_mask_idx: 0.34665| ppl: 220.64029| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.97079 | loss: 1.13562| constrast_loss: 4.47886| div_loss: 0.63616| %_mask_idx: 0.39944| ppl: 232.85965| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.97079 | loss: 1.13487| constrast_loss: 4.47593| div_loss: 0.63559| %_mask_idx: 0.46632| ppl: 233.22176| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.97077| loss: 1.13677| constrast_loss: 4.48261| div_loss: 0.64475| %_mask_idx: 0.36591| ppl: 227.35866| %_neg_is_pos: 0.0059| lr: 0.0| temp: 1.97077 | loss: 1.13311| constrast_loss: 4.4684| div_loss: 0.64046| %_mask_idx: 0.38565| ppl: 230.10724| %_neg_is_pos: 0.00846| lr: 0.0| temp: 1.97076 | loss: 1.13516| constrast_loss: 4.47541| div_loss: 0.65215| %_mask_idx: 0.37375| ppl: 222.62656| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.97076 | loss: 1.1416| constrast_loss: 4.50323| div_loss: 0.63168| %_mask_idx: 0.32033| ppl: 235.72314| %_neg_is_pos: 0.00563| lr: 0.0| temp: 1.97075 | loss: 1.13308| constrast_loss: 4.4687| div_loss: 0.63614| %_mask_idx: 0.41118| ppl: 232.86797| %_neg_is_pos: 0.00259| lr: 0.0| 
temp: 1.97075 | loss: 1.13543| constrast_loss: 4.47759| div_loss: 0.64132| %_mask_idx: 0.42497| ppl: 229.556| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.97074 | loss: 1.12602| constrast_loss: 4.43844| div_loss: 0.65619| %_mask_idx: 0.35385| ppl: 220.04007| %_neg_is_pos: 0.01454| lr: 0.0| temp: 1.97074 | loss: 1.11802| constrast_loss: 4.40357| div_loss: 0.68525| %_mask_idx: 0.36748| ppl: 201.44228| %_neg_is_pos: 0.01476| lr: 0.0| temp: 1.97072| loss: 1.13541| constrast_loss: 4.47728| div_loss: 0.64373| %_mask_idx: 0.40132| ppl: 228.01501| %_neg_is_pos: 0.00812| lr: 0.0| temp: 1.97072 | loss: 1.12194| constrast_loss: 4.42086| div_loss: 0.66923| %_mask_idx: 0.36137| ppl: 211.68968| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.97071 | loss: 1.13467| constrast_loss: 4.47368| div_loss: 0.64978| %_mask_idx: 0.44298| ppl: 224.13785| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.97071 | loss: 1.13349| constrast_loss: 4.46814| div_loss: 0.65836| %_mask_idx: 0.37155| ppl: 218.6519| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.97069 | loss: 1.12296| constrast_loss: 4.42602| div_loss: 0.6583| %_mask_idx: 0.38064| ppl: 218.68878| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.97069 | loss: 1.12517| constrast_loss: 4.43456| div_loss: 0.6613| %_mask_idx: 0.4151| ppl: 216.77028| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.97068 | loss: 1.14855| constrast_loss: 4.53054| div_loss: 0.63641| %_mask_idx: 0.40508| ppl: 232.69943| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.97068 | loss: 1.13177| constrast_loss: 4.45891| div_loss: 0.68175| %_mask_idx: 0.36122| ppl: 203.68027| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.97067 | loss: 1.12311| constrast_loss: 4.42623| div_loss: 0.66223| %_mask_idx: 0.40617| ppl: 216.17444| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.97067 | loss: 1.13366| constrast_loss: 4.46983| div_loss: 0.64816| %_mask_idx: 0.34038| ppl: 225.17941| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97066 | loss: 1.14129| constrast_loss: 4.5011| div_loss: 0.64073| %_mask_idx: 0.42231| ppl: 229.93517| %_neg_is_pos: 0.00297| 
lr: 0.0| temp: 1.97066 | loss: 1.13786| constrast_loss: 4.48607| div_loss: 0.65386| %_mask_idx: 0.38518| ppl: 221.53259| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.97064 | loss: 1.13129| constrast_loss: 4.45835| div_loss: 0.66822| %_mask_idx: 0.36341| ppl: 212.34016| %_neg_is_pos: 0.00701| lr: 0.0| temp: 1.97064 | loss: 1.14164| constrast_loss: 4.50354| div_loss: 0.63015| %_mask_idx: 0.37954| ppl: 236.70232| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.97063 | loss: 1.13187| constrast_loss: 4.46235| div_loss: 0.65113| %_mask_idx: 0.36028| ppl: 223.27444| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.97063 | loss: 1.13611| constrast_loss: 4.48112| div_loss: 0.63307| %_mask_idx: 0.44032| ppl: 234.83514| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.97062 | loss: 1.12736| constrast_loss: 4.44308| div_loss: 0.66362| %_mask_idx: 0.38377| ppl: 215.28377| %_neg_is_pos: 0.00368| lr: 0.0| temp: 1.97062 | loss: 1.13418| constrast_loss: 4.47187| div_loss: 0.64859| %_mask_idx: 0.36685| ppl: 224.90074| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.97061 | loss: 1.13254| constrast_loss: 4.46558| div_loss: 0.64577| %_mask_idx: 0.38487| ppl: 226.70944| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.97061 | loss: 1.13467| constrast_loss: 4.47536| div_loss: 0.63335| %_mask_idx: 0.4317| ppl: 234.65303| %_neg_is_pos: 0.00188| lr: 0.0| temp: 1.97059 | loss: 1.13247| constrast_loss: 4.46539| div_loss: 0.64507| %_mask_idx: 0.3761| ppl: 227.15599| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.97059 | loss: 1.14169| constrast_loss: 4.50251| div_loss: 0.64242| %_mask_idx: 0.35354| ppl: 228.85091| %_neg_is_pos: 0.00584| lr: 0.0| temp: 1.97058 | loss: 1.12623| constrast_loss: 4.43908| div_loss: 0.65861| %_mask_idx: 0.39051| ppl: 218.48825| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.97058 | loss: 1.14327| constrast_loss: 4.51107| div_loss: 0.62016| %_mask_idx: 0.41494| ppl: 243.09888| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.97057 | loss: 1.13275| constrast_loss: 4.46692| div_loss: 0.64079| %_mask_idx: 0.461| ppl: 229.89478| 
%_neg_is_pos: 0.00191| lr: 0.0| temp: 1.97057 | loss: 1.14096| constrast_loss: 4.49916| div_loss: 0.64661| %_mask_idx: 0.39192| ppl: 226.16803| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.97056 | loss: 1.13378| constrast_loss: 4.47064| div_loss: 0.64474| %_mask_idx: 0.40398| ppl: 227.36932| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.97056 | loss: 1.12837| constrast_loss: 4.44852| div_loss: 0.64975| %_mask_idx: 0.3938| ppl: 224.15883| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.97054 | loss: 1.12922| constrast_loss: 4.45042| div_loss: 0.66465| %_mask_idx: 0.35495| ppl: 214.62407| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.97054 | loss: 1.13347| constrast_loss: 4.46894| div_loss: 0.6494| %_mask_idx: 0.34555| ppl: 224.38126| %_neg_is_pos: 0.00532| lr: 0.0| temp: 1.97053 | loss: 1.13624| constrast_loss: 4.47952| div_loss: 0.65434| %_mask_idx: 0.40946| ppl: 221.21992| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.97053 | loss: 1.13408| constrast_loss: 4.47227| div_loss: 0.64043| %_mask_idx: 0.42466| ppl: 230.12486| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.97051 | loss: 1.12582| constrast_loss: 4.43797| div_loss: 0.6529| %_mask_idx: 0.39536| ppl: 222.14331| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.97051 | loss: 1.12304| constrast_loss: 4.42685| div_loss: 0.65315| %_mask_idx: 0.35605| ppl: 221.98412| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.9705 | loss: 1.13503| constrast_loss: 4.47513| div_loss: 0.64995| %_mask_idx: 0.38628| ppl: 224.0329| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.9705 | loss: 1.1332| constrast_loss: 4.46777| div_loss: 0.65025| %_mask_idx: 0.38549| ppl: 223.84268| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.97049 | loss: 1.12737| constrast_loss: 4.44072| div_loss: 0.68744| %_mask_idx: 0.3714| ppl: 200.03577| %_neg_is_pos: 0.00638| lr: 0.0| temp: 1.97049 | loss: 1.13433| constrast_loss: 4.47409| div_loss: 0.63229| %_mask_idx: 0.44032| ppl: 235.33173| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.97048 | loss: 1.14587| constrast_loss: 4.52012| div_loss: 0.63339| %_mask_idx: 0.3631| ppl: 
234.62784| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.97048 | loss: 1.12577| constrast_loss: 4.43777| div_loss: 0.6531| %_mask_idx: 0.3526| ppl: 222.01294| %_neg_is_pos: 0.0034| lr: 0.0| temp: 1.97046 | loss: 1.13428| constrast_loss: 4.47204| div_loss: 0.65067| %_mask_idx: 0.41761| ppl: 223.56982| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.97046 | loss: 1.12836| constrast_loss: 4.44663| div_loss: 0.66831| %_mask_idx: 0.40695| ppl: 212.27856| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.97045 | loss: 1.1261| constrast_loss: 4.43763| div_loss: 0.66756| %_mask_idx: 0.40194| ppl: 212.75928| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.97045 | loss: 1.13397| constrast_loss: 4.47264| div_loss: 0.63241| %_mask_idx: 0.33349| ppl: 235.2587| %_neg_is_pos: 0.00374| lr: 0.0| temp: 1.97044 | loss: 1.12766| constrast_loss: 4.44741| div_loss: 0.63231| %_mask_idx: 0.38878| ppl: 235.32419| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.97044 | loss: 1.12749| constrast_loss: 4.44474| div_loss: 0.6523| %_mask_idx: 0.36106| ppl: 222.52869| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.97043 | loss: 1.12744| constrast_loss: 4.44495| div_loss: 0.64793| %_mask_idx: 0.35056| ppl: 225.32367| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97043 | loss: 1.14285| constrast_loss: 4.50877| div_loss: 0.62643| %_mask_idx: 0.4187| ppl: 239.08292| %_neg_is_pos: 0.00143| lr: 0.0| temp: 1.97041 | loss: 1.12159| constrast_loss: 4.42103| div_loss: 0.65336| %_mask_idx: 0.37077| ppl: 221.849| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.97041 | loss: 1.13047| constrast_loss: 4.45515| div_loss: 0.66746| %_mask_idx: 0.3927| ppl: 212.82703| %_neg_is_pos: 0.00629| lr: 0.0| temp: 1.9704 | loss: 1.13561| constrast_loss: 4.47766| div_loss: 0.64793| %_mask_idx: 0.41776| ppl: 225.32559| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.9704 | loss: 1.12312| constrast_loss: 4.42612| div_loss: 0.66351| %_mask_idx: 0.41197| ppl: 215.35568| %_neg_is_pos: 0.00638| lr: 0.0| temp: 1.97039 | loss: 1.1376| constrast_loss: 4.48543| div_loss: 0.64971| %_mask_idx: 0.40742| 
ppl: 224.18292| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.97039 | loss: 1.12959| constrast_loss: 4.45229| div_loss: 0.66076| %_mask_idx: 0.36842| ppl: 217.11678| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.97038 | loss: 1.14061| constrast_loss: 4.49831| div_loss: 0.64116| %_mask_idx: 0.37061| ppl: 229.65677| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.97038 | loss: 1.1286| constrast_loss: 4.44958| div_loss: 0.64821| %_mask_idx: 0.40946| ppl: 225.14401| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.97036 | loss: 1.13243| constrast_loss: 4.46477| div_loss: 0.64952| %_mask_idx: 0.34696| ppl: 224.30765| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.97036 | loss: 1.13192| constrast_loss: 4.46304| div_loss: 0.64635| %_mask_idx: 0.38894| ppl: 226.33656| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.97035 | loss: 1.11506| constrast_loss: 4.39299| div_loss: 0.67243| %_mask_idx: 0.33067| ppl: 209.64389| %_neg_is_pos: 0.00668| lr: 0.0| temp: 1.97035 | loss: 1.12525| constrast_loss: 4.4354| div_loss: 0.65585| %_mask_idx: 0.40915| ppl: 220.25439| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.97033 | loss: 1.12372| constrast_loss: 4.42924| div_loss: 0.6562| %_mask_idx: 0.39176| ppl: 220.03009| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97033 | loss: 1.13862| constrast_loss: 4.48941| div_loss: 0.65057| %_mask_idx: 0.39865| ppl: 223.63313| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.97032 | loss: 1.12232| constrast_loss: 4.42428| div_loss: 0.64996| %_mask_idx: 0.35667| ppl: 224.02765| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.97032 | loss: 1.12684| constrast_loss: 4.43958| div_loss: 0.67772| %_mask_idx: 0.40367| ppl: 206.26068| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.97031 | loss: 1.13257| constrast_loss: 4.46754| div_loss: 0.62741| %_mask_idx: 0.36795| ppl: 238.45782| %_neg_is_pos: 0.00142| lr: 0.0| temp: 1.97031 | loss: 1.12666| constrast_loss: 4.44004| div_loss: 0.66615| %_mask_idx: 0.39662| ppl: 213.66663| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.9703 | loss: 1.12765| constrast_loss: 4.44455| div_loss: 0.66054| %_mask_idx: 
0.37155| ppl: 217.25507| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.9703 | loss: 1.14771| constrast_loss: 4.52456| div_loss: 0.66287| %_mask_idx: 0.38252| ppl: 215.76062| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.97028 | loss: 1.12734| constrast_loss: 4.44504| div_loss: 0.6431| %_mask_idx: 0.39051| ppl: 228.41469| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.97028 | loss: 1.11636| constrast_loss: 4.39936| div_loss: 0.66091| %_mask_idx: 0.35182| ppl: 217.01967| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.97027 | loss: 1.12842| constrast_loss: 4.44967| div_loss: 0.64027| %_mask_idx: 0.37093| ppl: 230.22932| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.97027 | loss: 1.13036| constrast_loss: 4.4565| div_loss: 0.64958| %_mask_idx: 0.39568| ppl: 224.26932| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.97026 | loss: 1.13422| constrast_loss: 4.47063| div_loss: 0.66238| %_mask_idx: 0.41197| ppl: 216.07907| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.97026 | loss: 1.13769| constrast_loss: 4.48695| div_loss: 0.63811| %_mask_idx: 0.40116| ppl: 231.61119| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.97025 | loss: 1.136| constrast_loss: 4.47809| div_loss: 0.65912| %_mask_idx: 0.38784| ppl: 218.16615| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.97025 | loss: 1.123| constrast_loss: 4.42466| div_loss: 0.67356| %_mask_idx: 0.39818| ppl: 208.91937| %_neg_is_pos: 0.00692| lr: 0.0| temp: 1.97023 | loss: 1.12444| constrast_loss: 4.43259| div_loss: 0.65182| %_mask_idx: 0.3927| ppl: 222.83409| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.97023 | loss: 1.13252| constrast_loss: 4.4663| div_loss: 0.63774| %_mask_idx: 0.41071| ppl: 231.84572| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.97022 | loss: 1.1257| constrast_loss: 4.4373| div_loss: 0.65509| %_mask_idx: 0.40147| ppl: 220.73964| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97022 | loss: 1.12803| constrast_loss: 4.44586| div_loss: 0.66254| %_mask_idx: 0.36075| ppl: 215.97752| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.97021 | loss: 1.13462| constrast_loss: 4.47516| div_loss: 0.63327| 
%_mask_idx: 0.375| ppl: 234.70526| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.97021 | loss: 1.12811| constrast_loss: 4.44868| div_loss: 0.63768| %_mask_idx: 0.36795| ppl: 231.88461| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.9702 | loss: 1.13605| constrast_loss: 4.48077| div_loss: 0.63439| %_mask_idx: 0.36388| ppl: 233.99271| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.9702 | loss: 1.13596| constrast_loss: 4.47796| div_loss: 0.65885| %_mask_idx: 0.36122| ppl: 218.33463| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.97018 | loss: 1.12829| constrast_loss: 4.44631| div_loss: 0.66868| %_mask_idx: 0.35652| ppl: 212.04263| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.97018 | loss: 1.12541| constrast_loss: 4.43486| div_loss: 0.66764| %_mask_idx: 0.31595| ppl: 212.70755| %_neg_is_pos: 0.00617| lr: 0.0| temp: 1.97017 | loss: 1.13258| constrast_loss: 4.46627| div_loss: 0.64042| %_mask_idx: 0.42528| ppl: 230.13156| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.97017 [2021-09-02 03:43:33,518] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:43:33,518] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.14052| constrast_loss: 4.4974| div_loss: 0.64683| %_mask_idx: 0.40085| ppl: 226.02737| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.97015 | loss: 1.13213| constrast_loss: 4.46304| div_loss: 0.65465| %_mask_idx: 0.34759| ppl: 221.02328| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97015 | loss: 1.12422| constrast_loss: 4.43164| div_loss: 0.6523| %_mask_idx: 0.40586| ppl: 222.5282| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.97014 | loss: 1.13334| constrast_loss: 4.4677| div_loss: 0.65669| %_mask_idx: 0.38831| ppl: 219.72098| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.97014 | loss: 1.13066| constrast_loss: 4.4571| div_loss: 0.65532| %_mask_idx: 0.37892| ppl: 220.59521| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.97014 | loss: 1.13085| constrast_loss: 4.45656| div_loss: 0.66852| %_mask_idx: 0.32002| ppl: 212.14456| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.97014 | loss: 1.13404| constrast_loss: 4.47163| div_loss: 0.64538| %_mask_idx: 0.38784| ppl: 226.95805| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.97013 | loss: 1.13307| constrast_loss: 4.46759| div_loss: 0.64684| %_mask_idx: 0.375| ppl: 226.0206| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.97013 | loss: 1.13442| constrast_loss: 4.47255| div_loss: 0.65128| %_mask_idx: 0.39881| ppl: 223.18054| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.97011 | loss: 1.13244| constrast_loss: 4.46332| div_loss: 0.66452| %_mask_idx: 0.37343| ppl: 214.7085| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.97011 | loss: 1.1419| constrast_loss: 4.50394| div_loss: 0.63653| %_mask_idx: 0.42356| ppl: 232.62054| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.9701 | loss: 1.12871| constrast_loss: 4.45044| div_loss: 0.64415| %_mask_idx: 0.39599| ppl: 227.742| %_neg_is_pos: 0.00412| lr: 0.0| temp: 1.9701 | loss: 1.13316| constrast_loss: 4.46845| div_loss: 0.64198| %_mask_idx: 0.38878| ppl: 229.13287| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.97009 | loss: 1.13955| constrast_loss: 4.49297| div_loss: 0.65239| %_mask_idx: 0.40304| ppl: 222.46884| 
%_neg_is_pos: 0.00331| lr: 0.0| temp: 1.97009 | loss: 1.12675| constrast_loss: 4.44238| div_loss: 0.64633| %_mask_idx: 0.37406| ppl: 226.35112| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.97008 | loss: 1.13484| constrast_loss: 4.47566| div_loss: 0.63714| %_mask_idx: 0.42607| ppl: 232.22943| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.97008 | loss: 1.13134| constrast_loss: 4.4614| div_loss: 0.63973| %_mask_idx: 0.38033| ppl: 230.57396| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.97006| loss: 1.13463| constrast_loss: 4.47556| div_loss: 0.62946| %_mask_idx: 0.40664| ppl: 237.14655| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.97006 | loss: 1.11751| constrast_loss: 4.40455| div_loss: 0.65466| %_mask_idx: 0.38549| ppl: 221.02007| %_neg_is_pos: 0.00658| lr: 0.0| temp: 1.97005 | loss: 1.128| constrast_loss: 4.44607| div_loss: 0.65935| %_mask_idx: 0.38456| ppl: 218.01773| %_neg_is_pos: 0.00546| lr: 0.0| temp: 1.97005 | loss: 1.11826| constrast_loss: 4.40572| div_loss: 0.67322| %_mask_idx: 0.37657| ppl: 209.14102| %_neg_is_pos: 0.01002| lr: 0.0| temp: 1.97004 | loss: 1.13369| constrast_loss: 4.46874| div_loss: 0.66027| %_mask_idx: 0.33991| ppl: 217.42749| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.97004 | loss: 1.13372| constrast_loss: 4.46998| div_loss: 0.64897| %_mask_idx: 0.4198| ppl: 224.65854| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.97003 | loss: 1.12024| constrast_loss: 4.41341| div_loss: 0.67545| %_mask_idx: 0.38972| ppl: 207.71436| %_neg_is_pos: 0.00546| lr: 0.0| temp: 1.97003 | loss: 1.13221| constrast_loss: 4.46571| div_loss: 0.63144| %_mask_idx: 0.35558| ppl: 235.879| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.97001 | loss: 1.13492| constrast_loss: 4.47475| div_loss: 0.64943| %_mask_idx: 0.3927| ppl: 224.36169| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.97001 | loss: 1.13958| constrast_loss: 4.49164| div_loss: 0.66681| %_mask_idx: 0.40821| ppl: 213.23854| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.97 | loss: 1.14197| constrast_loss: 4.50244| div_loss: 0.65448| %_mask_idx: 0.37704| ppl: 
221.13568| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.97 | loss: 1.13173| constrast_loss: 4.46197| div_loss: 0.64955| %_mask_idx: 0.42763| ppl: 224.28967| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96998 | loss: 1.13408| constrast_loss: 4.47093| div_loss: 0.65411| %_mask_idx: 0.38769| ppl: 221.36703| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96998 | loss: 1.13575| constrast_loss: 4.47766| div_loss: 0.65337| %_mask_idx: 0.40758| ppl: 221.84604| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.96997 | loss: 1.12299| constrast_loss: 4.42523| div_loss: 0.6672| %_mask_idx: 0.30545| ppl: 212.99432| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.96997 | loss: 1.13771| constrast_loss: 4.48699| div_loss: 0.63861| %_mask_idx: 0.37798| ppl: 231.29123| %_neg_is_pos: 0.00198| lr: 0.0| temp: 1.96996 | loss: 1.1381| constrast_loss: 4.4875| div_loss: 0.64885| %_mask_idx: 0.40257| ppl: 224.73822| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.96996 | loss: 1.14452| constrast_loss: 4.51473| div_loss: 0.6335| %_mask_idx: 0.4115| ppl: 234.56268| %_neg_is_pos: 0.00148| lr: 0.0| temp: 1.96995 | loss: 1.1211| constrast_loss: 4.41529| div_loss: 0.69098| %_mask_idx: 0.33741| ppl: 197.77304| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.96995 | loss: 1.12641| constrast_loss: 4.43952| div_loss: 0.66129| %_mask_idx: 0.40789| ppl: 216.77161| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.96993 | loss: 1.13934| constrast_loss: 4.49091| div_loss: 0.6643| %_mask_idx: 0.39724| ppl: 214.84529| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96993 | loss: 1.14287| constrast_loss: 4.50849| div_loss: 0.62988| %_mask_idx: 0.3562| ppl: 236.87605| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96992 | loss: 1.13736| constrast_loss: 4.48302| div_loss: 0.66428| %_mask_idx: 0.38127| ppl: 214.86328| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.96992 | loss: 1.13503| constrast_loss: 4.47451| div_loss: 0.65606| %_mask_idx: 0.36435| ppl: 220.12444| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.96991 | loss: 1.13593| constrast_loss: 4.48003| div_loss: 0.63671| %_mask_idx: 0.40602| 
ppl: 232.50607| %_neg_is_pos: 0.00161| lr: 0.0| temp: 1.96991 | loss: 1.12604| constrast_loss: 4.4373| div_loss: 0.66855| %_mask_idx: 0.34774| ppl: 212.12961| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.9699 | loss: 1.13311| constrast_loss: 4.46638| div_loss: 0.66074| %_mask_idx: 0.39411| ppl: 217.12708| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.9699 | loss: 1.13322| constrast_loss: 4.46959| div_loss: 0.63301| %_mask_idx: 0.38863| ppl: 234.87469| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96988 | loss: 1.14392| constrast_loss: 4.51161| div_loss: 0.64076| %_mask_idx: 0.43468| ppl: 229.91187| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.96988 | loss: 1.13198| constrast_loss: 4.4613| div_loss: 0.6662| %_mask_idx: 0.40868| ppl: 213.63464| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96987 | loss: 1.13734| constrast_loss: 4.48394| div_loss: 0.65423| %_mask_idx: 0.31939| ppl: 221.29092| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.96987 | loss: 1.14078| constrast_loss: 4.49732| div_loss: 0.65799| %_mask_idx: 0.39223| ppl: 218.88882| %_neg_is_pos: 0.00386| lr: 0.0| temp: 1.96986 | loss: 1.13792| constrast_loss: 4.48588| div_loss: 0.65812| %_mask_idx: 0.3927| ppl: 218.80569| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.96986 | loss: 1.13557| constrast_loss: 4.47612| div_loss: 0.66169| %_mask_idx: 0.37108| ppl: 216.51572| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.96985 | loss: 1.14532| constrast_loss: 4.51664| div_loss: 0.64623| %_mask_idx: 0.36764| ppl: 226.41162| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96985 | loss: 1.13122| constrast_loss: 4.45922| div_loss: 0.65667| %_mask_idx: 0.42513| ppl: 219.72847| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96983 | loss: 1.13925| constrast_loss: 4.49192| div_loss: 0.6508| %_mask_idx: 0.38549| ppl: 223.48564| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96983 | loss: 1.14534| constrast_loss: 4.51723| div_loss: 0.64117| %_mask_idx: 0.36811| ppl: 229.65402| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96982 | loss: 1.13744| constrast_loss: 4.48529| div_loss: 0.6446| %_mask_idx: 
0.42998| ppl: 227.45389| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96982 | loss: 1.13037| constrast_loss: 4.45634| div_loss: 0.6513| %_mask_idx: 0.41416| ppl: 223.16736| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.9698 | loss: 1.13506| constrast_loss: 4.4744| div_loss: 0.65838| %_mask_idx: 0.35135| ppl: 218.63834| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.9698 | loss: 1.13446| constrast_loss: 4.47274| div_loss: 0.65088| %_mask_idx: 0.35401| ppl: 223.43384| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96979 | loss: 1.13305| constrast_loss: 4.46686| div_loss: 0.6535| %_mask_idx: 0.40539| ppl: 221.76132| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.96979 | loss: 1.13955| constrast_loss: 4.49395| div_loss: 0.64252| %_mask_idx: 0.38753| ppl: 228.78596| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96978 | loss: 1.1369| constrast_loss: 4.48251| div_loss: 0.65095| %_mask_idx: 0.39427| ppl: 223.38936| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96978 | loss: 1.13537| constrast_loss: 4.47455| div_loss: 0.66945| %_mask_idx: 0.36623| ppl: 211.55016| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96977 | loss: 1.13879| constrast_loss: 4.49082| div_loss: 0.6432| %_mask_idx: 0.35479| ppl: 228.35495| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96977 | loss: 1.13413| constrast_loss: 4.47176| div_loss: 0.64761| %_mask_idx: 0.39881| ppl: 225.52786| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96975 | loss: 1.14211| constrast_loss: 4.50453| div_loss: 0.63921| %_mask_idx: 0.44173| ppl: 230.90283| %_neg_is_pos: 0.00188| lr: 0.0| temp: 1.96975 | loss: 1.13485| constrast_loss: 4.47381| div_loss: 0.65569| %_mask_idx: 0.42199| ppl: 220.35635| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.96974 | loss: 1.12815| constrast_loss: 4.4469| div_loss: 0.65699| %_mask_idx: 0.38236| ppl: 219.5239| %_neg_is_pos: 0.0043| lr: 0.0| temp: 1.96974 | loss: 1.13372| constrast_loss: 4.46846| div_loss: 0.66426| %_mask_idx: 0.34539| ppl: 214.8714| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96973 | loss: 1.13344| constrast_loss: 4.46784| div_loss: 0.65928| 
%_mask_idx: 0.34101| ppl: 218.06277| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.96973 | loss: 1.13253| constrast_loss: 4.46418| div_loss: 0.65927| %_mask_idx: 0.33255| ppl: 218.06435| %_neg_is_pos: 0.00409| lr: 0.0| temp: 1.96972 | loss: 1.1378| constrast_loss: 4.48486| div_loss: 0.66361| %_mask_idx: 0.45692| ppl: 215.29065| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.96972 | loss: 1.14451| constrast_loss: 4.51229| div_loss: 0.65768| %_mask_idx: 0.37735| ppl: 219.08649| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.9697 | loss: 1.13745| constrast_loss: 4.48408| div_loss: 0.65739| %_mask_idx: 0.35213| ppl: 219.27254| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.9697 | loss: 1.13373| constrast_loss: 4.46957| div_loss: 0.65363| %_mask_idx: 0.39568| ppl: 221.67426| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96969 | loss: 1.1416| constrast_loss: 4.50217| div_loss: 0.6425| %_mask_idx: 0.36983| ppl: 228.80042| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.96969 | loss: 1.14065| constrast_loss: 4.49736| div_loss: 0.6525| %_mask_idx: 0.44095| ppl: 222.40302| %_neg_is_pos: 0.00155| lr: 0.0| temp: 1.96968 | loss: 1.13132| constrast_loss: 4.4601| div_loss: 0.65166| %_mask_idx: 0.36278| ppl: 222.93979| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96968 | loss: 1.14078| constrast_loss: 4.49777| div_loss: 0.65341| %_mask_idx: 0.41009| ppl: 221.81749| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96967 | loss: 1.13623| constrast_loss: 4.47958| div_loss: 0.65347| %_mask_idx: 0.39834| ppl: 221.78145| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96967 | loss: 1.14492| constrast_loss: 4.51592| div_loss: 0.63756| %_mask_idx: 0.40555| ppl: 231.95924| %_neg_is_pos: 0.00149| lr: 0.0| temp: 1.96965 | loss: 1.13784| constrast_loss: 4.48557| div_loss: 0.65774| %_mask_idx: 0.38988| ppl: 219.04453| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96965 | loss: 1.13275| constrast_loss: 4.46417| div_loss: 0.66814| %_mask_idx: 0.34712| ppl: 212.38821| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.96964 | loss: 1.14011| constrast_loss: 4.49587| div_loss: 
0.64552| %_mask_idx: 0.41385| ppl: 226.87039| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.96964 | loss: 1.13073| constrast_loss: 4.45791| div_loss: 0.64991| %_mask_idx: 0.401| ppl: 224.05557| %_neg_is_pos: 0.00338| lr: 0.0| temp: 1.96962 | loss: 1.138| constrast_loss: 4.48772| div_loss: 0.64276| %_mask_idx: 0.42137| ppl: 228.63449| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.96962 | loss: 1.13749| constrast_loss: 4.48548| div_loss: 0.64461| %_mask_idx: 0.37892| ppl: 227.45056| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96961 | loss: 1.14109| constrast_loss: 4.49988| div_loss: 0.64474| %_mask_idx: 0.39333| ppl: 227.36795| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96961 | loss: 1.14197| constrast_loss: 4.5044| div_loss: 0.63489| %_mask_idx: 0.40727| ppl: 233.66937| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.9696 | loss: 1.1328| constrast_loss: 4.46521| div_loss: 0.65986| %_mask_idx: 0.36591| ppl: 217.68803| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.9696 | loss: 1.14249| constrast_loss: 4.50572| div_loss: 0.64235| %_mask_idx: 0.44596| ppl: 228.89917| %_neg_is_pos: 0.00118| lr: 0.0| temp: 1.96959 | loss: 1.14505| constrast_loss: 4.51455| div_loss: 0.65667| %_mask_idx: 0.34179| ppl: 219.73312| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96959 | loss: 1.12759| constrast_loss: 4.44214| div_loss: 0.68225| %_mask_idx: 0.37751| ppl: 203.36087| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.96957 | loss: 1.13262| constrast_loss: 4.46661| div_loss: 0.63851| %_mask_idx: 0.41212| ppl: 231.35558| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96957 | loss: 1.13062| constrast_loss: 4.45734| div_loss: 0.65136| %_mask_idx: 0.35793| ppl: 223.12686| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.96956 | loss: 1.1131| constrast_loss: 4.38054| div_loss: 0.71875| %_mask_idx: 0.3114| ppl: 180.00156| %_neg_is_pos: 0.0054| lr: 0.0| temp: 1.96956 | loss: 1.1295| constrast_loss: 4.45199| div_loss: 0.66004| %_mask_idx: 0.38456| ppl: 217.57297| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96955 | loss: 1.12852| constrast_loss: 4.44728| 
div_loss: 0.66804| %_mask_idx: 0.34853| ppl: 212.45277| %_neg_is_pos: 0.00413| lr: 0.0| temp: 1.96955 | loss: 1.1272| constrast_loss: 4.44371| div_loss: 0.65104| %_mask_idx: 0.37453| ppl: 223.3363| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96954 | loss: 1.13189| constrast_loss: 4.46138| div_loss: 0.66196| %_mask_idx: 0.39458| ppl: 216.34644| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96954 | loss: 1.12904| constrast_loss: 4.44917| div_loss: 0.66989| %_mask_idx: 0.37954| ppl: 211.27283| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.96952 | loss: 1.14603| constrast_loss: 4.51895| div_loss: 0.65154| %_mask_idx: 0.39301| ppl: 223.01271| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.96952 | loss: 1.1351| constrast_loss: 4.47807| div_loss: 0.62338| %_mask_idx: 0.40179| ppl: 241.0394| %_neg_is_pos: 0.00141| lr: 0.0| temp: 1.96951 | loss: 1.13136| constrast_loss: 4.4597| div_loss: 0.65743| %_mask_idx: 0.39959| ppl: 219.24753| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96951 | loss: 1.14061| constrast_loss: 4.49661| div_loss: 0.65843| %_mask_idx: 0.37907| ppl: 218.60562| %_neg_is_pos: 0.00303| lr: 0.0| temp: 1.9695 | loss: 1.13023| constrast_loss: 4.4563| div_loss: 0.64617| %_mask_idx: 0.36889| ppl: 226.4519| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.9695 | loss: 1.1297| constrast_loss: 4.45243| div_loss: 0.66352| %_mask_idx: 0.37735| ppl: 215.34422| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.96949 | loss: 1.14542| constrast_loss: 4.51722| div_loss: 0.64472| %_mask_idx: 0.35323| ppl: 227.37723| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96949 | loss: 1.1354| constrast_loss: 4.47711| div_loss: 0.64497| %_mask_idx: 0.37453| ppl: 227.22195| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.96948 | loss: 1.14122| constrast_loss: 4.49981| div_loss: 0.65084| %_mask_idx: 0.42638| ppl: 223.46439| %_neg_is_pos: 0.00162| lr: 0.0| temp: 1.96948 | loss: 1.1403| constrast_loss: 4.49584| div_loss: 0.65374| %_mask_idx: 0.40852| ppl: 221.60707| %_neg_is_pos: 0.00188| lr: 0.0| temp: 1.96947 | loss: 1.13705| constrast_loss: 
4.48275| div_loss: 0.65468| %_mask_idx: 0.414| ppl: 221.00768| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96947 [2021-09-02 03:52:48,405] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 03:52:48,405] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14152| constrast_loss: 4.50034| div_loss: 0.65741| %_mask_idx: 0.39536| ppl: 219.26053| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.96945 | loss: 1.13837| constrast_loss: 4.48893| div_loss: 0.64543| %_mask_idx: 0.31924| ppl: 226.92216| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96945 | loss: 1.13734| constrast_loss: 4.48506| div_loss: 0.64293| %_mask_idx: 0.41949| ppl: 228.52673| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96944 | loss: 1.12773| constrast_loss: 4.44368| div_loss: 0.67232| %_mask_idx: 0.36012| ppl: 209.71655| %_neg_is_pos: 0.00512| lr: 0.0| temp: 1.96944 | loss: 1.14091| constrast_loss: 4.49901| div_loss: 0.64648| %_mask_idx: 0.38941| ppl: 226.25494| %_neg_is_pos: 0.00425| lr: 0.0| temp: 1.96943 | loss: 1.14283| constrast_loss: 4.5076| div_loss: 0.63699| %_mask_idx: 0.32566| ppl: 232.32343| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.96943 | loss: 1.12796| constrast_loss: 4.44645| div_loss: 0.65368| %_mask_idx: 0.36905| ppl: 221.64235| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96942 | loss: 1.12419| constrast_loss: 4.42974| div_loss: 0.67012| %_mask_idx: 0.38033| ppl: 211.12263| %_neg_is_pos: 0.00691| lr: 0.0| temp: 1.96942 | loss: 1.14746| constrast_loss: 4.52643| div_loss: 0.63411| %_mask_idx: 0.40695| ppl: 234.16919| %_neg_is_pos: 0.0015| lr: 0.0| temp: 1.9694 | loss: 1.12822| constrast_loss: 4.44587| div_loss: 0.67011| %_mask_idx: 0.39489| ppl: 211.12738| %_neg_is_pos: 0.00688| lr: 0.0| temp: 1.9694 | loss: 1.13446| constrast_loss: 4.47228| div_loss: 0.65558| %_mask_idx: 0.40241| ppl: 220.43181| %_neg_is_pos: 0.0041| lr: 0.0| 
temp: 1.96939 | loss: 1.13407| constrast_loss: 4.47147| div_loss: 0.64808| %_mask_idx: 0.47071| ppl: 225.22586| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.96939 | loss: 1.13559| constrast_loss: 4.47699| div_loss: 0.65365| %_mask_idx: 0.36623| ppl: 221.66174| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96938 | loss: 1.13458| constrast_loss: 4.47249| div_loss: 0.65827| %_mask_idx: 0.37751| ppl: 218.7059| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.96938 | loss: 1.11997| constrast_loss: 4.41317| div_loss: 0.66716| %_mask_idx: 0.37453| ppl: 213.01485| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.96937 | loss: 1.13345| constrast_loss: 4.46755| div_loss: 0.66266| %_mask_idx: 0.39004| ppl: 215.89955| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.96937 | loss: 1.12395| constrast_loss: 4.42905| div_loss: 0.66761| %_mask_idx: 0.39646| ppl: 212.72784| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.96935 | loss: 1.12539| constrast_loss: 4.43561| div_loss: 0.65954| %_mask_idx: 0.39317| ppl: 217.8938| %_neg_is_pos: 0.00405| lr: 0.0| temp: 1.96935 | loss: 1.13389| constrast_loss: 4.47091| div_loss: 0.64668| %_mask_idx: 0.39051| ppl: 226.12762| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.96934 | loss: 1.12579| constrast_loss: 4.43469| div_loss: 0.6848| %_mask_idx: 0.3808| ppl: 201.72707| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.96934 | loss: 1.12529| constrast_loss: 4.4349| div_loss: 0.66278| %_mask_idx: 0.34242| ppl: 215.82088| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.96933 | loss: 1.13516| constrast_loss: 4.47682| div_loss: 0.63818| %_mask_idx: 0.401| ppl: 231.56485| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96933 | loss: 1.1398| constrast_loss: 4.49596| div_loss: 0.63256| %_mask_idx: 0.40633| ppl: 235.16376| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.96932 | loss: 1.13135| constrast_loss: 4.46044| div_loss: 0.64954| %_mask_idx: 0.42513| ppl: 224.29143| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96932 | loss: 1.13402| constrast_loss: 4.47128| div_loss: 0.64796| %_mask_idx: 0.375| ppl: 225.30769| %_neg_is_pos: 0.00294| lr: 
0.0| temp: 1.9693| loss: 1.13498| constrast_loss: 4.474| div_loss: 0.65933| %_mask_idx: 0.45113| ppl: 218.02667| %_neg_is_pos: 0.00397| lr: 0.0| temp: 1.9693 | loss: 1.12896| constrast_loss: 4.4504| div_loss: 0.65457| %_mask_idx: 0.35385| ppl: 221.07281| %_neg_is_pos: 0.00478| lr: 0.0| temp: 1.96929 | loss: 1.14236| constrast_loss: 4.50475| div_loss: 0.64687| %_mask_idx: 0.39803| ppl: 226.00009| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96929 | loss: 1.13329| constrast_loss: 4.46851| div_loss: 0.64649| %_mask_idx: 0.37437| ppl: 226.24612| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.96927 | loss: 1.1343| constrast_loss: 4.47086| div_loss: 0.66346| %_mask_idx: 0.37719| ppl: 215.38422| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.96927 | loss: 1.1368| constrast_loss: 4.48295| div_loss: 0.64241| %_mask_idx: 0.40226| ppl: 228.8577| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.96926 | loss: 1.12748| constrast_loss: 4.44434| div_loss: 0.65578| %_mask_idx: 0.37375| ppl: 220.29941| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.96926 | loss: 1.14098| constrast_loss: 4.49945| div_loss: 0.64466| %_mask_idx: 0.35119| ppl: 227.41481| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.96925 | loss: 1.13117| constrast_loss: 4.45944| div_loss: 0.65225| %_mask_idx: 0.41056| ppl: 222.55888| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.96925 | loss: 1.12778| constrast_loss: 4.4454| div_loss: 0.65709| %_mask_idx: 0.40241| ppl: 219.46198| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96924 | loss: 1.12993| constrast_loss: 4.45441| div_loss: 0.65315| %_mask_idx: 0.34994| ppl: 221.98273| %_neg_is_pos: 0.00465| lr: 0.0| temp: 1.96924 | loss: 1.14641| constrast_loss: 4.52188| div_loss: 0.63784| %_mask_idx: 0.4057| ppl: 231.78046| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96922 | loss: 1.13091| constrast_loss: 4.45761| div_loss: 0.66031| %_mask_idx: 0.42669| ppl: 217.40176| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.96922 | loss: 1.14175| constrast_loss: 4.50296| div_loss: 0.64043| %_mask_idx: 0.33521| ppl: 230.12416| %_neg_is_pos: 0.00241| 
lr: 0.0| temp: 1.96921 | loss: 1.13691| constrast_loss: 4.48224| div_loss: 0.65404| %_mask_idx: 0.40163| ppl: 221.414| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96921 | loss: 1.13195| constrast_loss: 4.4629| div_loss: 0.64912| %_mask_idx: 0.32331| ppl: 224.5607| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.9692 | loss: 1.14199| constrast_loss: 4.50268| div_loss: 0.65262| %_mask_idx: 0.37594| ppl: 222.32498| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.9692 | loss: 1.13195| constrast_loss: 4.46266| div_loss: 0.65124| %_mask_idx: 0.36263| ppl: 223.20322| %_neg_is_pos: 0.00516| lr: 0.0| temp: 1.96919 | loss: 1.13109| constrast_loss: 4.45651| div_loss: 0.67836| %_mask_idx: 0.36638| ppl: 205.85104| %_neg_is_pos: 0.00484| lr: 0.0| temp: 1.96919 | loss: 1.13282| constrast_loss: 4.46754| div_loss: 0.63752| %_mask_idx: 0.40461| ppl: 231.98596| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96917 | loss: 1.14074| constrast_loss: 4.49843| div_loss: 0.64523| %_mask_idx: 0.38737| ppl: 227.05562| %_neg_is_pos: 0.0014| lr: 0.0| temp: 1.96917 | loss: 1.13966| constrast_loss: 4.49601| div_loss: 0.62637| %_mask_idx: 0.35526| ppl: 239.1256| %_neg_is_pos: 0.00102| lr: 0.0| temp: 1.96916 | loss: 1.13162| constrast_loss: 4.46162| div_loss: 0.64846| %_mask_idx: 0.41432| ppl: 224.98294| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96916 | loss: 1.13822| constrast_loss: 4.48703| div_loss: 0.65841| %_mask_idx: 0.39897| ppl: 218.6183| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.96915 | loss: 1.12455| constrast_loss: 4.43149| div_loss: 0.66718| %_mask_idx: 0.3338| ppl: 213.00168| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.96915 | loss: 1.13912| constrast_loss: 4.49123| div_loss: 0.65254| %_mask_idx: 0.40586| ppl: 222.37674| %_neg_is_pos: 0.00473| lr: 0.0| temp: 1.96914 | loss: 1.13666| constrast_loss: 4.4812| div_loss: 0.65455| %_mask_idx: 0.42747| ppl: 221.08778| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96914 | loss: 1.12978| constrast_loss: 4.45245| div_loss: 0.66685| %_mask_idx: 0.38643| ppl: 213.2189| %_neg_is_pos: 
0.00387| lr: 0.0| temp: 1.96912 | loss: 1.13415| constrast_loss: 4.47205| div_loss: 0.64539| %_mask_idx: 0.40147| ppl: 226.94965| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96912 | loss: 1.13582| constrast_loss: 4.47846| div_loss: 0.64827| %_mask_idx: 0.40132| ppl: 225.11041| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96911 | loss: 1.12696| constrast_loss: 4.44262| div_loss: 0.65205| %_mask_idx: 0.38064| ppl: 222.68805| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.96911 | loss: 1.12995| constrast_loss: 4.45446| div_loss: 0.65362| %_mask_idx: 0.43045| ppl: 221.6806| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.96909 | loss: 1.1365| constrast_loss: 4.48061| div_loss: 0.65387| %_mask_idx: 0.40758| ppl: 221.52531| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96909 | loss: 1.14453| constrast_loss: 4.51352| div_loss: 0.64606| %_mask_idx: 0.36905| ppl: 226.52444| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96908 | loss: 1.13598| constrast_loss: 4.47814| div_loss: 0.65792| %_mask_idx: 0.39442| ppl: 218.9292| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.96908 | loss: 1.13938| constrast_loss: 4.49278| div_loss: 0.64722| %_mask_idx: 0.42011| ppl: 225.78027| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96907 | loss: 1.13465| constrast_loss: 4.47449| div_loss: 0.6412| %_mask_idx: 0.37782| ppl: 229.63258| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96907 | loss: 1.1238| constrast_loss: 4.42966| div_loss: 0.65538| %_mask_idx: 0.40194| ppl: 220.55779| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.96906 | loss: 1.12991| constrast_loss: 4.45342| div_loss: 0.66237| %_mask_idx: 0.37563| ppl: 216.08124| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.96906 | loss: 1.13231| constrast_loss: 4.46485| div_loss: 0.64383| %_mask_idx: 0.38283| ppl: 227.94791| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.96904 | loss: 1.13087| constrast_loss: 4.45704| div_loss: 0.66441| %_mask_idx: 0.37829| ppl: 214.78079| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.96904 | loss: 1.12496| constrast_loss: 4.43435| div_loss: 0.65488| %_mask_idx: 0.43233| ppl: 220.8756| 
%_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96903 | loss: 1.12046| constrast_loss: 4.41503| div_loss: 0.66796| %_mask_idx: 0.35103| ppl: 212.5031| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.96903 | loss: 1.12649| constrast_loss: 4.43944| div_loss: 0.66532| %_mask_idx: 0.39568| ppl: 214.19629| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96902 | loss: 1.13154| constrast_loss: 4.46023| div_loss: 0.65936| %_mask_idx: 0.34367| ppl: 218.00951| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.96902 | loss: 1.13781| constrast_loss: 4.486| div_loss: 0.65233| %_mask_idx: 0.41479| ppl: 222.50681| %_neg_is_pos: 0.00506| lr: 0.0| temp: 1.96901 | loss: 1.13341| constrast_loss: 4.46745| div_loss: 0.6618| %_mask_idx: 0.36028| ppl: 216.45108| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96901 | loss: 1.13197| constrast_loss: 4.46268| div_loss: 0.65209| %_mask_idx: 0.36482| ppl: 222.66487| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.96899 | loss: 1.12968| constrast_loss: 4.45425| div_loss: 0.64462| %_mask_idx: 0.34242| ppl: 227.44069| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.96899 | loss: 1.13622| constrast_loss: 4.48005| div_loss: 0.64821| %_mask_idx: 0.42904| ppl: 225.14383| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96898 | loss: 1.1432| constrast_loss: 4.50832| div_loss: 0.64494| %_mask_idx: 0.44925| ppl: 227.23688| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96898 | loss: 1.14083| constrast_loss: 4.49892| div_loss: 0.64404| %_mask_idx: 0.36544| ppl: 227.8125| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96897 | loss: 1.13331| constrast_loss: 4.46771| div_loss: 0.65548| %_mask_idx: 0.40695| ppl: 220.48996| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.96897 | loss: 1.13209| constrast_loss: 4.464| div_loss: 0.64368| %_mask_idx: 0.36153| ppl: 228.0461| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96896 | loss: 1.13041| constrast_loss: 4.45583| div_loss: 0.65795| %_mask_idx: 0.43202| ppl: 218.90923| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.96896 | loss: 1.13251| constrast_loss: 4.46567| div_loss: 0.64354| %_mask_idx: 0.41573| ppl: 
228.13461| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96894 | loss: 1.13164| constrast_loss: 4.45987| div_loss: 0.66695| %_mask_idx: 0.40993| ppl: 213.15436| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.96894 | loss: 1.13085| constrast_loss: 4.45748| div_loss: 0.65916| %_mask_idx: 0.40053| ppl: 218.13632| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96893 | loss: 1.13308| constrast_loss: 4.46633| div_loss: 0.65995| %_mask_idx: 0.38737| ppl: 217.63139| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96893 | loss: 1.12573| constrast_loss: 4.43788| div_loss: 0.65046| %_mask_idx: 0.37829| ppl: 223.70691| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96891 | loss: 1.1317| constrast_loss: 4.46102| div_loss: 0.658| %_mask_idx: 0.36451| ppl: 218.87991| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.96891 | loss: 1.13278| constrast_loss: 4.46408| div_loss: 0.67047| %_mask_idx: 0.31845| ppl: 210.90027| %_neg_is_pos: 0.00549| lr: 0.0| temp: 1.9689 | loss: 1.13347| constrast_loss: 4.46814| div_loss: 0.65739| %_mask_idx: 0.37907| ppl: 219.2724| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.9689 | loss: 1.13723| constrast_loss: 4.48304| div_loss: 0.65889| %_mask_idx: 0.4234| ppl: 218.30959| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96889 | loss: 1.13415| constrast_loss: 4.47165| div_loss: 0.64963| %_mask_idx: 0.36497| ppl: 224.2364| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.96889 | loss: 1.11951| constrast_loss: 4.41101| div_loss: 0.67029| %_mask_idx: 0.37093| ppl: 211.01515| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96888 | loss: 1.13385| constrast_loss: 4.46992| div_loss: 0.65475| %_mask_idx: 0.37234| ppl: 220.96136| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96888 | loss: 1.12961| constrast_loss: 4.45231| div_loss: 0.6611| %_mask_idx: 0.36952| ppl: 216.89757| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96886 | loss: 1.14931| constrast_loss: 4.53367| div_loss: 0.63571| %_mask_idx: 0.42419| ppl: 233.14532| %_neg_is_pos: 0.00127| lr: 0.0| temp: 1.96886 | loss: 1.13486| constrast_loss: 4.47532| div_loss: 0.64108| %_mask_idx: 
0.42278| ppl: 229.70767| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.96886 | loss: 1.12358| constrast_loss: 4.4268| div_loss: 0.6753| %_mask_idx: 0.32863| ppl: 207.80728| %_neg_is_pos: 0.00582| lr: 0.0| temp: 1.96886 | loss: 1.11998| constrast_loss: 4.41241| div_loss: 0.67499| %_mask_idx: 0.35291| ppl: 208.00928| %_neg_is_pos: 0.00465| lr: 0.0| temp: 1.96885 | loss: 1.13239| constrast_loss: 4.46483| div_loss: 0.64748| %_mask_idx: 0.40648| ppl: 225.61452| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.96885 | loss: 1.11788| constrast_loss: 4.40462| div_loss: 0.6688| %_mask_idx: 0.3407| ppl: 211.97107| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.96884 | loss: 1.13263| constrast_loss: 4.46569| div_loss: 0.64832| %_mask_idx: 0.40022| ppl: 225.07401| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96884 | loss: 1.14227| constrast_loss: 4.50477| div_loss: 0.6433| %_mask_idx: 0.38424| ppl: 228.28845| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96882 | loss: 1.12546| constrast_loss: 4.43523| div_loss: 0.66604| %_mask_idx: 0.37876| ppl: 213.7326| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.96882 | loss: 1.13794| constrast_loss: 4.48743| div_loss: 0.64343| %_mask_idx: 0.38675| ppl: 228.20522| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.96881 | loss: 1.12736| constrast_loss: 4.44439| div_loss: 0.65067| %_mask_idx: 0.3808| ppl: 223.56909| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.96881 | loss: 1.1302| constrast_loss: 4.45488| div_loss: 0.65907| %_mask_idx: 0.3974| ppl: 218.19672| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.9688 | loss: 1.13105| constrast_loss: 4.45831| div_loss: 0.65899| %_mask_idx: 0.36967| ppl: 218.24898| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.9688 | loss: 1.13694| constrast_loss: 4.48382| div_loss: 0.63953| %_mask_idx: 0.36357| ppl: 230.70004| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.96879 | loss: 1.12646| constrast_loss: 4.44026| div_loss: 0.65599| %_mask_idx: 0.34915| ppl: 220.16418| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96879 | loss: 1.1233| constrast_loss: 4.42681| div_loss: 0.66402| 
%_mask_idx: 0.37328| ppl: 215.02826| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.96877 | loss: 1.13208| constrast_loss: 4.46347| div_loss: 0.64842| %_mask_idx: 0.41651| ppl: 225.01236| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96877 | loss: 1.13398| constrast_loss: 4.47011| div_loss: 0.65809| %_mask_idx: 0.3692| ppl: 218.823| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96876 | loss: 1.13708| constrast_loss: 4.48446| div_loss: 0.63869| %_mask_idx: 0.38283| ppl: 231.24146| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.96876 [2021-09-02 04:02:02,453] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:02:02,453] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12338| constrast_loss: 4.42649| div_loss: 0.67026| %_mask_idx: 0.36513| ppl: 211.03152| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96874 | loss: 1.13363| constrast_loss: 4.46896| div_loss: 0.65582| %_mask_idx: 0.38628| ppl: 220.27304| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.96874 | loss: 1.13121| constrast_loss: 4.46004| div_loss: 0.64817| %_mask_idx: 0.41573| ppl: 225.16943| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.96873 | loss: 1.1353| constrast_loss: 4.47622| div_loss: 0.6499| %_mask_idx: 0.35636| ppl: 224.06245| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.96873 | loss: 1.12945| constrast_loss: 4.45095| div_loss: 0.66853| %_mask_idx: 0.34273| ppl: 212.14383| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96872 | loss: 1.13685| constrast_loss: 4.48222| div_loss: 0.65165| %_mask_idx: 0.38769| ppl: 222.94177| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96872 | loss: 1.11639| constrast_loss: 4.39918| div_loss: 0.66392| %_mask_idx: 0.35981| ppl: 215.08899| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.96871 | loss: 1.12768| constrast_loss: 4.44495| div_loss: 0.65761| %_mask_idx: 0.37422| ppl: 219.12927| %_neg_is_pos: 0.00409| lr: 0.0| temp: 1.96871 | loss: 
1.13603| constrast_loss: 4.47854| div_loss: 0.65591| %_mask_idx: 0.3761| ppl: 220.21741| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96869 | loss: 1.142| constrast_loss: 4.50276| div_loss: 0.6523| %_mask_idx: 0.40727| ppl: 222.5282| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96869 | loss: 1.12744| constrast_loss: 4.44353| div_loss: 0.66211| %_mask_idx: 0.39583| ppl: 216.24893| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96868 | loss: 1.14537| constrast_loss: 4.51683| div_loss: 0.64636| %_mask_idx: 0.42215| ppl: 226.33145| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96868 | loss: 1.12843| constrast_loss: 4.44734| div_loss: 0.66398| %_mask_idx: 0.35746| ppl: 215.05341| %_neg_is_pos: 0.00393| lr: 0.0| temp: 1.96867 | loss: 1.11872| constrast_loss: 4.40923| div_loss: 0.6565| %_mask_idx: 0.37719| ppl: 219.83826| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96867 | loss: 1.14275| constrast_loss: 4.50786| div_loss: 0.63148| %_mask_idx: 0.43296| ppl: 235.85162| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.96866 | loss: 1.13024| constrast_loss: 4.45405| div_loss: 0.6691| %_mask_idx: 0.35072| ppl: 211.7757| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.96866 | loss: 1.14018| constrast_loss: 4.4957| div_loss: 0.65024| %_mask_idx: 0.38565| ppl: 223.84605| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96864 | loss: 1.137| constrast_loss: 4.48166| div_loss: 0.66334| %_mask_idx: 0.41714| ppl: 215.46503| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96864 | loss: 1.12657| constrast_loss: 4.43886| div_loss: 0.67436| %_mask_idx: 0.40774| ppl: 208.40782| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.96863 | loss: 1.13357| constrast_loss: 4.46836| div_loss: 0.65935| %_mask_idx: 0.38706| ppl: 218.01651| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96863 | loss: 1.13894| constrast_loss: 4.49053| div_loss: 0.65239| %_mask_idx: 0.42794| ppl: 222.4722| %_neg_is_pos: 0.00158| lr: 0.0| temp: 1.96862 | loss: 1.13149| constrast_loss: 4.46035| div_loss: 0.65602| %_mask_idx: 0.401| ppl: 220.14658| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96862 | 
loss: 1.12597| constrast_loss: 4.43757| div_loss: 0.66325| %_mask_idx: 0.38612| ppl: 215.52017| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96861 | loss: 1.13922| constrast_loss: 4.49073| div_loss: 0.66131| %_mask_idx: 0.37954| ppl: 216.76056| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96861 | loss: 1.13565| constrast_loss: 4.47739| div_loss: 0.65222| %_mask_idx: 0.41009| ppl: 222.58183| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.96859 | loss: 1.13459| constrast_loss: 4.47215| div_loss: 0.66196| %_mask_idx: 0.3067| ppl: 216.34344| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.96859 | loss: 1.13194| constrast_loss: 4.46127| div_loss: 0.66495| %_mask_idx: 0.35495| ppl: 214.43336| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.96858 | loss: 1.13744| constrast_loss: 4.48497| div_loss: 0.64804| %_mask_idx: 0.3385| ppl: 225.25539| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.96858 | loss: 1.12338| constrast_loss: 4.42712| div_loss: 0.66398| %_mask_idx: 0.40445| ppl: 215.05054| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.96856 | loss: 1.12675| constrast_loss: 4.44143| div_loss: 0.6557| %_mask_idx: 0.40977| ppl: 220.35223| %_neg_is_pos: 0.00563| lr: 0.0| temp: 1.96856 | loss: 1.12359| constrast_loss: 4.42749| div_loss: 0.66858| %_mask_idx: 0.37061| ppl: 212.1089| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96855 | loss: 1.12688| constrast_loss: 4.441| div_loss: 0.66527| %_mask_idx: 0.37422| ppl: 214.23003| %_neg_is_pos: 0.00469| lr: 0.0| temp: 1.96855 | loss: 1.12575| constrast_loss: 4.43656| div_loss: 0.66455| %_mask_idx: 0.38315| ppl: 214.68645| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.96854 | loss: 1.13082| constrast_loss: 4.45639| div_loss: 0.66889| %_mask_idx: 0.38769| ppl: 211.9079| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96854 | loss: 1.13293| constrast_loss: 4.4659| div_loss: 0.65815| %_mask_idx: 0.36497| ppl: 218.78613| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.96853 | loss: 1.13793| constrast_loss: 4.48642| div_loss: 0.65294| %_mask_idx: 0.35589| ppl: 222.11664| %_neg_is_pos: 0.00371| lr: 0.0| temp: 
1.96853 | loss: 1.12614| constrast_loss: 4.43661| div_loss: 0.6793| %_mask_idx: 0.41494| ppl: 205.24789| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.96851 | loss: 1.15587| constrast_loss: 4.55934| div_loss: 0.6414| %_mask_idx: 0.39286| ppl: 229.50372| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.96851 | loss: 1.13142| constrast_loss: 4.46084| div_loss: 0.64835| %_mask_idx: 0.41463| ppl: 225.05559| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.9685 | loss: 1.13269| constrast_loss: 4.465| div_loss: 0.65764| %_mask_idx: 0.3396| ppl: 219.11171| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.9685 | loss: 1.14084| constrast_loss: 4.49779| div_loss: 0.65554| %_mask_idx: 0.42873| ppl: 220.45555| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.96849 | loss: 1.13848| constrast_loss: 4.48791| div_loss: 0.66006| %_mask_idx: 0.35448| ppl: 217.56033| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.96849 | loss: 1.13977| constrast_loss: 4.49328| div_loss: 0.65798| %_mask_idx: 0.36247| ppl: 218.89391| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.96848 | loss: 1.12523| constrast_loss: 4.43284| div_loss: 0.68074| %_mask_idx: 0.41212| ppl: 204.32336| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.96848 | loss: 1.13815| constrast_loss: 4.48793| div_loss: 0.64683| %_mask_idx: 0.38221| ppl: 226.02762| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.96846 | loss: 1.11621| constrast_loss: 4.3965| div_loss: 0.68327| %_mask_idx: 0.4162| ppl: 202.70493| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.96846 | loss: 1.13314| constrast_loss: 4.46604| div_loss: 0.66518| %_mask_idx: 0.44659| ppl: 214.28273| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96845 | loss: 1.13351| constrast_loss: 4.46843| div_loss: 0.65609| %_mask_idx: 0.37578| ppl: 220.10165| %_neg_is_pos: 0.0034| lr: 0.0| temp: 1.96845 | loss: 1.13782| constrast_loss: 4.48623| div_loss: 0.65058| %_mask_idx: 0.40727| ppl: 223.62952| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96844 | loss: 1.12969| constrast_loss: 4.4531| div_loss: 0.65673| %_mask_idx: 0.42904| ppl: 219.69194| %_neg_is_pos: 0.00211| lr: 0.0| 
temp: 1.96844 | loss: 1.12805| constrast_loss: 4.44634| div_loss: 0.65854| %_mask_idx: 0.35793| ppl: 218.53717| %_neg_is_pos: 0.00493| lr: 0.0| temp: 1.96843 | loss: 1.13236| constrast_loss: 4.46258| div_loss: 0.66862| %_mask_idx: 0.43985| ppl: 212.08566| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.96843 | loss: 1.11068| constrast_loss: 4.37458| div_loss: 0.68154| %_mask_idx: 0.36122| ppl: 203.81519| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.96841 | loss: 1.12779| constrast_loss: 4.44236| div_loss: 0.68801| %_mask_idx: 0.32926| ppl: 199.67392| %_neg_is_pos: 0.00765| lr: 0.0| temp: 1.96841 | loss: 1.12227| constrast_loss: 4.42068| div_loss: 0.68398| %_mask_idx: 0.38753| ppl: 202.25479| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.9684 | loss: 1.13102| constrast_loss: 4.45751| div_loss: 0.66574| %_mask_idx: 0.40069| ppl: 213.92725| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.9684 | loss: 1.14482| constrast_loss: 4.51451| div_loss: 0.64789| %_mask_idx: 0.3703| ppl: 225.35269| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96838 | loss: 1.13324| constrast_loss: 4.46666| div_loss: 0.66301| %_mask_idx: 0.42591| ppl: 215.6741| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96838 | loss: 1.12426| constrast_loss: 4.43051| div_loss: 0.66544| %_mask_idx: 0.37688| ppl: 214.12| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96837 | loss: 1.1431| constrast_loss: 4.50714| div_loss: 0.65265| %_mask_idx: 0.36889| ppl: 222.3045| %_neg_is_pos: 0.00183| lr: 0.0| temp: 1.96837 | loss: 1.13139| constrast_loss: 4.45984| div_loss: 0.65709| %_mask_idx: 0.38252| ppl: 219.46439| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.96836 | loss: 1.13322| constrast_loss: 4.46721| div_loss: 0.65676| %_mask_idx: 0.4339| ppl: 219.67053| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.96836 | loss: 1.13411| constrast_loss: 4.47018| div_loss: 0.66273| %_mask_idx: 0.37704| ppl: 215.85268| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.96835 | loss: 1.13289| constrast_loss: 4.46567| div_loss: 0.65906| %_mask_idx: 0.35182| ppl: 218.20146| %_neg_is_pos: 0.0037| lr: 
0.0| temp: 1.96835 | loss: 1.13312| constrast_loss: 4.46721| div_loss: 0.65266| %_mask_idx: 0.40774| ppl: 222.29469| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96833 | loss: 1.12892| constrast_loss: 4.44996| div_loss: 0.65734| %_mask_idx: 0.43249| ppl: 219.30463| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96833 | loss: 1.14162| constrast_loss: 4.50218| div_loss: 0.64317| %_mask_idx: 0.39004| ppl: 228.3708| %_neg_is_pos: 0.00461| lr: 0.0| temp: 1.96832 | loss: 1.12538| constrast_loss: 4.43462| div_loss: 0.669| %_mask_idx: 0.36497| ppl: 211.84076| %_neg_is_pos: 0.00487| lr: 0.0| temp: 1.96832 | loss: 1.11946| constrast_loss: 4.41052| div_loss: 0.67321| %_mask_idx: 0.35072| ppl: 209.1432| %_neg_is_pos: 0.00303| lr: 0.0| temp: 1.96831 | loss: 1.13248| constrast_loss: 4.46373| div_loss: 0.66176| %_mask_idx: 0.3891| ppl: 216.47256| %_neg_is_pos: 0.00409| lr: 0.0| temp: 1.96831 | loss: 1.13198| constrast_loss: 4.46075| div_loss: 0.67155| %_mask_idx: 0.38988| ppl: 210.20868| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.9683 | loss: 1.1452| constrast_loss: 4.51745| div_loss: 0.63333| %_mask_idx: 0.38456| ppl: 234.66989| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.9683 | loss: 1.1325| constrast_loss: 4.46542| div_loss: 0.64583| %_mask_idx: 0.38142| ppl: 226.66663| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.96828 | loss: 1.13032| constrast_loss: 4.45378| div_loss: 0.67491| %_mask_idx: 0.39646| ppl: 208.05817| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.96828 | loss: 1.1326| constrast_loss: 4.46494| div_loss: 0.65441| %_mask_idx: 0.36764| ppl: 221.17616| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.96827 | loss: 1.11877| constrast_loss: 4.40832| div_loss: 0.66741| %_mask_idx: 0.39803| ppl: 212.85583| %_neg_is_pos: 0.00537| lr: 0.0| temp: 1.96827 | loss: 1.13541| constrast_loss: 4.4753| div_loss: 0.66329| %_mask_idx: 0.38925| ppl: 215.49219| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.96826 | loss: 1.13322| constrast_loss: 4.46711| div_loss: 0.65762| %_mask_idx: 0.40335| ppl: 219.12234| %_neg_is_pos: 0.00324| 
lr: 0.0| temp: 1.96826 | loss: 1.12025| constrast_loss: 4.4122| div_loss: 0.68821| %_mask_idx: 0.3396| ppl: 199.54678| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.96825 | loss: 1.12437| constrast_loss: 4.42928| div_loss: 0.68188| %_mask_idx: 0.33286| ppl: 203.59874| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.96825 | loss: 1.12956| constrast_loss: 4.45196| div_loss: 0.66295| %_mask_idx: 0.43703| ppl: 215.7146| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.96823 | loss: 1.13432| constrast_loss: 4.47167| div_loss: 0.65605| %_mask_idx: 0.38315| ppl: 220.12698| %_neg_is_pos: 0.00524| lr: 0.0| temp: 1.96823 | loss: 1.12616| constrast_loss: 4.43852| div_loss: 0.66106| %_mask_idx: 0.39239| ppl: 216.91925| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96823 | loss: 1.12976| constrast_loss: 4.45226| div_loss: 0.66762| %_mask_idx: 0.37923| ppl: 212.72482| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.96823 | loss: 1.12762| constrast_loss: 4.44336| div_loss: 0.67135| %_mask_idx: 0.35996| ppl: 210.33876| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.96821 | loss: 1.12885| constrast_loss: 4.45006| div_loss: 0.6533| %_mask_idx: 0.38221| ppl: 221.89056| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.96821 | loss: 1.1272| constrast_loss: 4.44287| div_loss: 0.65951| %_mask_idx: 0.375| ppl: 217.91081| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.9682 | loss: 1.14601| constrast_loss: 4.5199| div_loss: 0.64132| %_mask_idx: 0.44298| ppl: 229.5574| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.9682 | loss: 1.13093| constrast_loss: 4.45776| div_loss: 0.6595| %_mask_idx: 0.40038| ppl: 217.91772| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.96819 | loss: 1.13637| constrast_loss: 4.48159| div_loss: 0.63874| %_mask_idx: 0.44001| ppl: 231.20639| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.96819 | loss: 1.12059| constrast_loss: 4.41589| div_loss: 0.66481| %_mask_idx: 0.40946| ppl: 214.52298| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96818 | loss: 1.12254| constrast_loss: 4.42291| div_loss: 0.67237| %_mask_idx: 0.38456| ppl: 209.68584| %_neg_is_pos: 
0.0032| lr: 0.0| temp: 1.96818 | loss: 1.13249| constrast_loss: 4.46347| div_loss: 0.66479| %_mask_idx: 0.33709| ppl: 214.53415| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.96816 | loss: 1.12571| constrast_loss: 4.43535| div_loss: 0.67501| %_mask_idx: 0.35824| ppl: 207.99046| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.96816 | loss: 1.12821| constrast_loss: 4.44561| div_loss: 0.67246| %_mask_idx: 0.37312| ppl: 209.62285| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96815 | loss: 1.13374| constrast_loss: 4.4702| div_loss: 0.64775| %_mask_idx: 0.37218| ppl: 225.44263| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96815 | loss: 1.1193| constrast_loss: 4.40838| div_loss: 0.68829| %_mask_idx: 0.37907| ppl: 199.49567| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.96814 | loss: 1.14314| constrast_loss: 4.50657| div_loss: 0.66006| %_mask_idx: 0.44486| ppl: 217.56363| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96814 | loss: 1.12654| constrast_loss: 4.44046| div_loss: 0.65686| %_mask_idx: 0.38487| ppl: 219.61096| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96813 | loss: 1.14329| constrast_loss: 4.50841| div_loss: 0.64733| %_mask_idx: 0.39051| ppl: 225.71082| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.96813 | loss: 1.12892| constrast_loss: 4.44973| div_loss: 0.6594| %_mask_idx: 0.42888| ppl: 217.98697| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96811 | loss: 1.13476| constrast_loss: 4.47278| div_loss: 0.66238| %_mask_idx: 0.40445| ppl: 216.07446| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96811 | loss: 1.13157| constrast_loss: 4.4596| div_loss: 0.66663| %_mask_idx: 0.38205| ppl: 213.35939| %_neg_is_pos: 0.00491| lr: 0.0| temp: 1.9681 | loss: 1.12487| constrast_loss: 4.43361| div_loss: 0.65885| %_mask_idx: 0.36717| ppl: 218.33424| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.9681 | loss: 1.13183| constrast_loss: 4.46123| div_loss: 0.66079| %_mask_idx: 0.31798| ppl: 217.09454| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.96809 | loss: 1.13013| constrast_loss: 4.45419| div_loss: 0.66318| %_mask_idx: 0.41385| ppl: 215.56598| 
%_neg_is_pos: 0.00237| lr: 0.0| temp: 1.96809 | loss: 1.13536| constrast_loss: 4.47568| div_loss: 0.6576| %_mask_idx: 0.38409| ppl: 219.13379| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96808 | loss: 1.13925| constrast_loss: 4.48904| div_loss: 0.67952| %_mask_idx: 0.401| ppl: 205.10608| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.96808 | loss: 1.12455| constrast_loss: 4.43105| div_loss: 0.67129| %_mask_idx: 0.35009| ppl: 210.3748| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.96806 | loss: 1.13071| constrast_loss: 4.45623| div_loss: 0.66599| %_mask_idx: 0.34978| ppl: 213.7674| %_neg_is_pos: 0.00515| lr: 0.0| temp: 1.96806 | loss: 1.12178| constrast_loss: 4.42| div_loss: 0.67124| %_mask_idx: 0.38315| ppl: 210.40594| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.96805 | loss: 1.13924| constrast_loss: 4.49193| div_loss: 0.65046| %_mask_idx: 0.36858| ppl: 223.70496| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96805 [2021-09-02 04:11:15,546] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:11:15,546] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.125| constrast_loss: 4.43442| div_loss: 0.65568| %_mask_idx: 0.4104| ppl: 220.3676| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.96803 | loss: 1.13469| constrast_loss: 4.47482| div_loss: 0.63925| %_mask_idx: 0.37782| ppl: 230.8782| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.96803 | loss: 1.13991| constrast_loss: 4.49405| div_loss: 0.65605| %_mask_idx: 0.38471| ppl: 220.12784| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.96802 | loss: 1.14445| constrast_loss: 4.51329| div_loss: 0.64514| %_mask_idx: 0.3338| ppl: 227.11319| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96802 | loss: 1.12805| constrast_loss: 4.44516| div_loss: 0.67061| %_mask_idx: 0.37531| ppl: 210.81239| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.96801 | loss: 1.12823| constrast_loss: 4.44477| div_loss: 0.68148| %_mask_idx: 0.4104| ppl: 203.85229| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96801 | loss: 1.1289| constrast_loss: 4.45006| div_loss: 0.65547| %_mask_idx: 0.41369| ppl: 220.5023| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.968 | loss: 1.1406| constrast_loss: 4.49507| div_loss: 0.67345| %_mask_idx: 0.4151| ppl: 208.99216| %_neg_is_pos: 0.00573| lr: 0.0| temp: 1.968 | loss: 1.12024| constrast_loss: 4.41352| div_loss: 0.67446| %_mask_idx: 0.32895| ppl: 208.34689| %_neg_is_pos: 0.00745| lr: 0.0| temp: 1.96798 | loss: 1.14156| constrast_loss: 4.50165| div_loss: 0.64602| %_mask_idx: 0.33302| ppl: 226.55005| %_neg_is_pos: 0.00792| lr: 0.0| temp: 1.96798 | loss: 1.12901| constrast_loss: 4.44996| div_loss: 0.66081| %_mask_idx: 0.3432| ppl: 217.08392| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.96797 | loss: 1.1412| constrast_loss: 4.4999| div_loss: 0.64892| %_mask_idx: 0.41526| ppl: 224.6908| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.96797 | loss: 1.13256| constrast_loss: 4.46448| div_loss: 0.6578| %_mask_idx: 0.41024| ppl: 219.00778| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.96796 | loss: 1.13063| constrast_loss: 4.45503| div_loss: 0.67471| %_mask_idx: 0.35448| ppl: 208.18437| 
%_neg_is_pos: 0.00469| lr: 0.0| temp: 1.96796 | loss: 1.124| constrast_loss: 4.42975| div_loss: 0.66236| %_mask_idx: 0.35714| ppl: 216.09216| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.96795 | loss: 1.13453| constrast_loss: 4.47321| div_loss: 0.64896| %_mask_idx: 0.40789| ppl: 224.66571| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.96795 | loss: 1.12994| constrast_loss: 4.45314| div_loss: 0.66638| %_mask_idx: 0.33239| ppl: 213.51727| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.96793 | loss: 1.13448| constrast_loss: 4.47167| div_loss: 0.66233| %_mask_idx: 0.37296| ppl: 216.10805| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.96793 | loss: 1.12669| constrast_loss: 4.43927| div_loss: 0.67471| %_mask_idx: 0.39254| ppl: 208.18689| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.96792 | loss: 1.14164| constrast_loss: 4.50073| div_loss: 0.65841| %_mask_idx: 0.41902| ppl: 218.61633| %_neg_is_pos: 0.00153| lr: 0.0| temp: 1.96792 | loss: 1.12389| constrast_loss: 4.43003| div_loss: 0.65537| %_mask_idx: 0.35777| ppl: 220.56143| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.96791 | loss: 1.12219| constrast_loss: 4.4213| div_loss: 0.67442| %_mask_idx: 0.34258| ppl: 208.37209| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.96791 | loss: 1.13262| constrast_loss: 4.4642| div_loss: 0.66265| %_mask_idx: 0.34759| ppl: 215.9035| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.9679 | loss: 1.14451| constrast_loss: 4.51098| div_loss: 0.67048| %_mask_idx: 0.40147| ppl: 210.89575| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.9679 | loss: 1.12968| constrast_loss: 4.45356| div_loss: 0.65167| %_mask_idx: 0.40179| ppl: 222.93375| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96788 | loss: 1.13597| constrast_loss: 4.47967| div_loss: 0.64191| %_mask_idx: 0.37954| ppl: 229.17619| %_neg_is_pos: 0.00188| lr: 0.0| temp: 1.96788 | loss: 1.12486| constrast_loss: 4.43302| div_loss: 0.66425| %_mask_idx: 0.38424| ppl: 214.88242| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.96787 | loss: 1.12463| constrast_loss: 4.43296| div_loss: 0.6555| %_mask_idx: 0.35025| ppl: 
220.48306| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96787 | loss: 1.13976| constrast_loss: 4.49348| div_loss: 0.65576| %_mask_idx: 0.41432| ppl: 220.31073| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96785 | loss: 1.13648| constrast_loss: 4.48013| div_loss: 0.65798| %_mask_idx: 0.36576| ppl: 218.8916| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96785 | loss: 1.13468| constrast_loss: 4.47301| div_loss: 0.65713| %_mask_idx: 0.38706| ppl: 219.43625| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96784 | loss: 1.12685| constrast_loss: 4.44033| div_loss: 0.67087| %_mask_idx: 0.36717| ppl: 210.64005| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.96784 | loss: 1.13439| constrast_loss: 4.47085| div_loss: 0.66717| %_mask_idx: 0.40179| ppl: 213.00864| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96783 | loss: 1.12972| constrast_loss: 4.45319| div_loss: 0.65684| %_mask_idx: 0.37234| ppl: 219.62079| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96783 | loss: 1.12077| constrast_loss: 4.4151| div_loss: 0.67981| %_mask_idx: 0.34555| ppl: 204.91876| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.96782 | loss: 1.12594| constrast_loss: 4.43637| div_loss: 0.67398| %_mask_idx: 0.38471| ppl: 208.65561| %_neg_is_pos: 0.00533| lr: 0.0| temp: 1.96782 | loss: 1.14086| constrast_loss: 4.49962| div_loss: 0.6382| %_mask_idx: 0.38659| ppl: 231.55145| %_neg_is_pos: 0.0016| lr: 0.0| temp: 1.9678 | loss: 1.14385| constrast_loss: 4.51088| div_loss: 0.64505| %_mask_idx: 0.37892| ppl: 227.16708| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.9678 | loss: 1.12968| constrast_loss: 4.45207| div_loss: 0.66645| %_mask_idx: 0.4057| ppl: 213.47202| %_neg_is_pos: 0.00343| lr: 0.0| temp: 1.96779 | loss: 1.14212| constrast_loss: 4.50169| div_loss: 0.66786| %_mask_idx: 0.38722| ppl: 212.56891| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.96779 | loss: 1.12247| constrast_loss: 4.42289| div_loss: 0.66997| %_mask_idx: 0.43155| ppl: 211.22046| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96778 | loss: 1.13584| constrast_loss: 4.47736| div_loss: 0.66| %_mask_idx: 
0.39019| ppl: 217.59714| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96778 | loss: 1.13722| constrast_loss: 4.48331| div_loss: 0.65567| %_mask_idx: 0.38377| ppl: 220.37149| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96777 | loss: 1.11839| constrast_loss: 4.40553| div_loss: 0.68008| %_mask_idx: 0.37876| ppl: 204.74948| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96777 | loss: 1.12674| constrast_loss: 4.44105| div_loss: 0.65905| %_mask_idx: 0.4198| ppl: 218.20563| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.96775 | loss: 1.139| constrast_loss: 4.49025| div_loss: 0.65754| %_mask_idx: 0.40179| ppl: 219.17418| %_neg_is_pos: 0.0012| lr: 0.0| temp: 1.96775 | loss: 1.12126| constrast_loss: 4.41727| div_loss: 0.67779| %_mask_idx: 0.39536| ppl: 206.21759| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96774 | loss: 1.12559| constrast_loss: 4.43529| div_loss: 0.67081| %_mask_idx: 0.35495| ppl: 210.67844| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.96774 | loss: 1.1335| constrast_loss: 4.46749| div_loss: 0.66504| %_mask_idx: 0.40022| ppl: 214.37154| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.96773 | loss: 1.1289| constrast_loss: 4.4485| div_loss: 0.67082| %_mask_idx: 0.41667| ppl: 210.67667| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.96773 | loss: 1.12782| constrast_loss: 4.44451| div_loss: 0.6679| %_mask_idx: 0.38221| ppl: 212.54245| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96772 | loss: 1.1232| constrast_loss: 4.42607| div_loss: 0.66745| %_mask_idx: 0.38925| ppl: 212.83261| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.96772 | loss: 1.13383| constrast_loss: 4.4695| div_loss: 0.65821| %_mask_idx: 0.45395| ppl: 218.74341| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.9677 | loss: 1.1346| constrast_loss: 4.47273| div_loss: 0.65674| %_mask_idx: 0.39458| ppl: 219.68555| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.9677 | loss: 1.13408| constrast_loss: 4.47033| div_loss: 0.65979| %_mask_idx: 0.39113| ppl: 217.73276| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.96769 | loss: 1.13526| constrast_loss: 4.47565| div_loss: 0.65377| 
%_mask_idx: 0.40069| ppl: 221.58623| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96769 | loss: 1.13399| constrast_loss: 4.47036| div_loss: 0.65591| %_mask_idx: 0.39129| ppl: 220.21748| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96767 | loss: 1.13716| constrast_loss: 4.48247| div_loss: 0.6617| %_mask_idx: 0.37046| ppl: 216.51038| %_neg_is_pos: 0.00546| lr: 0.0| temp: 1.96767 | loss: 1.13224| constrast_loss: 4.46191| div_loss: 0.67058| %_mask_idx: 0.38189| ppl: 210.82651| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.96766 | loss: 1.11518| constrast_loss: 4.39253| div_loss: 0.68195| %_mask_idx: 0.35934| ppl: 203.54971| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.96766 | loss: 1.13275| constrast_loss: 4.46506| div_loss: 0.6595| %_mask_idx: 0.42011| ppl: 217.9205| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.96765 | loss: 1.13938| constrast_loss: 4.49147| div_loss: 0.66038| %_mask_idx: 0.34148| ppl: 217.35947| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.96765 | loss: 1.1436| constrast_loss: 4.50863| div_loss: 0.65759| %_mask_idx: 0.40445| ppl: 219.14557| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96764 | loss: 1.14185| constrast_loss: 4.50316| div_loss: 0.64243| %_mask_idx: 0.39411| ppl: 228.8421| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.96764 | loss: 1.14058| constrast_loss: 4.49751| div_loss: 0.64816| %_mask_idx: 0.41212| ppl: 225.17616| %_neg_is_pos: 0.00135| lr: 0.0| temp: 1.96762 | loss: 1.12639| constrast_loss: 4.4395| div_loss: 0.66049| %_mask_idx: 0.31281| ppl: 217.28792| %_neg_is_pos: 0.00394| lr: 0.0| temp: 1.96762 | loss: 1.13572| constrast_loss: 4.47761| div_loss: 0.65249| %_mask_idx: 0.37751| ppl: 222.4053| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96762 | loss: 1.14309| constrast_loss: 4.50795| div_loss: 0.64421| %_mask_idx: 0.39239| ppl: 227.70572| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.96762 | loss: 1.12146| constrast_loss: 4.41829| div_loss: 0.67534| %_mask_idx: 0.31767| ppl: 207.77994| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96761 | loss: 1.1202| constrast_loss: 4.41282| div_loss: 
0.6797| %_mask_idx: 0.36889| ppl: 204.9912| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.96761 | loss: 1.12502| constrast_loss: 4.43291| div_loss: 0.6718| %_mask_idx: 0.41933| ppl: 210.04663| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.9676 | loss: 1.12996| constrast_loss: 4.45363| div_loss: 0.66207| %_mask_idx: 0.39223| ppl: 216.27737| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.9676 | loss: 1.13199| constrast_loss: 4.46256| div_loss: 0.65413| %_mask_idx: 0.39991| ppl: 221.35941| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.96758 | loss: 1.13542| constrast_loss: 4.47652| div_loss: 0.65139| %_mask_idx: 0.39364| ppl: 223.1087| %_neg_is_pos: 0.00187| lr: 0.0| temp: 1.96758 | loss: 1.1298| constrast_loss: 4.45333| div_loss: 0.65877| %_mask_idx: 0.42387| ppl: 218.38707| %_neg_is_pos: 0.00151| lr: 0.0| temp: 1.96757 | loss: 1.12716| constrast_loss: 4.44275| div_loss: 0.6588| %_mask_idx: 0.34289| ppl: 218.36496| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96757 | loss: 1.14028| constrast_loss: 4.49652| div_loss: 0.64614| %_mask_idx: 0.38315| ppl: 226.46902| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96756 | loss: 1.13189| constrast_loss: 4.46081| div_loss: 0.6674| %_mask_idx: 0.36513| ppl: 212.86411| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.96756 | loss: 1.13851| constrast_loss: 4.48919| div_loss: 0.64854| %_mask_idx: 0.39991| ppl: 224.93503| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.96755 | loss: 1.12911| constrast_loss: 4.44985| div_loss: 0.66571| %_mask_idx: 0.38424| ppl: 213.94777| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.96755 | loss: 1.12815| constrast_loss: 4.44619| div_loss: 0.66412| %_mask_idx: 0.38362| ppl: 214.96494| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.96753 | loss: 1.12732| constrast_loss: 4.44287| div_loss: 0.66421| %_mask_idx: 0.39176| ppl: 214.90329| %_neg_is_pos: 0.0054| lr: 0.0| temp: 1.96753 | loss: 1.11811| constrast_loss: 4.40527| div_loss: 0.6716| %_mask_idx: 0.39254| ppl: 210.17673| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.96752 | loss: 1.13518| constrast_loss: 4.47515| 
div_loss: 0.65582| %_mask_idx: 0.43703| ppl: 220.27715| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96752 | loss: 1.13072| constrast_loss: 4.45617| div_loss: 0.667| %_mask_idx: 0.39458| ppl: 213.12003| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.9675 | loss: 1.13141| constrast_loss: 4.45891| div_loss: 0.66713| %_mask_idx: 0.39019| ppl: 213.03802| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.9675 | loss: 1.12945| constrast_loss: 4.45222| div_loss: 0.65585| %_mask_idx: 0.39301| ppl: 220.25839| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.96749 | loss: 1.13597| constrast_loss: 4.47818| div_loss: 0.65686| %_mask_idx: 0.41118| ppl: 219.61093| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96749 | loss: 1.12853| constrast_loss: 4.44772| div_loss: 0.66402| %_mask_idx: 0.42747| ppl: 215.02728| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.96748 | loss: 1.14161| constrast_loss: 4.50122| div_loss: 0.65206| %_mask_idx: 0.401| ppl: 222.67958| %_neg_is_pos: 0.00162| lr: 0.0| temp: 1.96748 | loss: 1.13201| constrast_loss: 4.46143| div_loss: 0.66593| %_mask_idx: 0.41541| ppl: 213.80536| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96747 | loss: 1.12981| constrast_loss: 4.45344| div_loss: 0.65801| %_mask_idx: 0.36419| ppl: 218.87268| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96747 | loss: 1.13588| constrast_loss: 4.47874| div_loss: 0.64771| %_mask_idx: 0.42246| ppl: 225.46445| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.96745 | loss: 1.13623| constrast_loss: 4.47963| div_loss: 0.65278| %_mask_idx: 0.43358| ppl: 222.21965| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96745 | loss: 1.11048| constrast_loss: 4.37428| div_loss: 0.67655| %_mask_idx: 0.34336| ppl: 207.01111| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.96744 | loss: 1.13456| constrast_loss: 4.47368| div_loss: 0.64558| %_mask_idx: 0.4234| ppl: 226.82657| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.96744 | loss: 1.13223| constrast_loss: 4.46331| div_loss: 0.65619| %_mask_idx: 0.37954| ppl: 220.03809| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96743 | loss: 1.12297| constrast_loss: 
4.42424| div_loss: 0.67651| %_mask_idx: 0.41682| ppl: 207.03632| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96743 | loss: 1.13639| constrast_loss: 4.48103| div_loss: 0.64533| %_mask_idx: 0.36654| ppl: 226.99075| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.96742 | loss: 1.13376| constrast_loss: 4.46982| div_loss: 0.65218| %_mask_idx: 0.3656| ppl: 222.60544| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.96742 | loss: 1.1248| constrast_loss: 4.43212| div_loss: 0.67084| %_mask_idx: 0.35589| ppl: 210.66161| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.9674 | loss: 1.12347| constrast_loss: 4.42521| div_loss: 0.68665| %_mask_idx: 0.36889| ppl: 200.54187| %_neg_is_pos: 0.00513| lr: 0.0| temp: 1.9674 | loss: 1.13041| constrast_loss: 4.45543| div_loss: 0.66206| %_mask_idx: 0.3833| ppl: 216.28271| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.96739 | loss: 1.12095| constrast_loss: 4.41635| div_loss: 0.67441| %_mask_idx: 0.41792| ppl: 208.37483| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.96739 | loss: 1.13684| constrast_loss: 4.48277| div_loss: 0.64576| %_mask_idx: 0.39364| ppl: 226.71175| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.96738 | loss: 1.13005| constrast_loss: 4.45361| div_loss: 0.66601| %_mask_idx: 0.40555| ppl: 213.75078| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.96738 | loss: 1.12556| constrast_loss: 4.4362| div_loss: 0.66034| %_mask_idx: 0.3927| ppl: 217.38519| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96737 | loss: 1.13629| constrast_loss: 4.4786| div_loss: 0.6657| %_mask_idx: 0.40053| ppl: 213.94955| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.96737 | loss: 1.14363| constrast_loss: 4.51025| div_loss: 0.64283| %_mask_idx: 0.37453| ppl: 228.59062| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.96735 | loss: 1.12656| constrast_loss: 4.43825| div_loss: 0.67991| %_mask_idx: 0.38941| ppl: 204.85565| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96735 | loss: 1.13707| constrast_loss: 4.48234| div_loss: 0.65957| %_mask_idx: 0.40429| ppl: 217.87689| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96734 | loss: 1.1359| 
constrast_loss: 4.47878| div_loss: 0.64821| %_mask_idx: 0.37829| ppl: 225.14563| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96734 [2021-09-02 04:20:28,364] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:20:28,364] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12751| constrast_loss: 4.44393| div_loss: 0.66124| %_mask_idx: 0.34633| ppl: 216.80331| %_neg_is_pos: 0.00379| lr: 0.0| temp: 1.96732 | loss: 1.13368| constrast_loss: 4.46896| div_loss: 0.65772| %_mask_idx: 0.36764| ppl: 219.0589| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96732 | loss: 1.13575| constrast_loss: 4.47754| div_loss: 0.65437| %_mask_idx: 0.36451| ppl: 221.20212| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.96731 | loss: 1.12461| constrast_loss: 4.43236| div_loss: 0.66104| %_mask_idx: 0.39646| ppl: 216.93744| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.96731 | loss: 1.11969| constrast_loss: 4.41252| div_loss: 0.66224| %_mask_idx: 0.35887| ppl: 216.1633| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.9673 | loss: 1.13152| constrast_loss: 4.45867| div_loss: 0.67398| %_mask_idx: 0.36451| ppl: 208.65016| %_neg_is_pos: 0.00446| lr: 0.0| temp: 1.9673 | loss: 1.12962| constrast_loss: 4.45083| div_loss: 0.67642| %_mask_idx: 0.36936| ppl: 207.09201| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96729 | loss: 1.1327| constrast_loss: 4.46447| div_loss: 0.66342| %_mask_idx: 0.38675| ppl: 215.41042| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96729 | loss: 1.12994| constrast_loss: 4.45427| div_loss: 0.65491| %_mask_idx: 0.46131| ppl: 220.85709| %_neg_is_pos: 0.00133| lr: 0.0| temp: 1.96727 | loss: 1.1291| constrast_loss: 4.4488| div_loss: 0.67582| %_mask_idx: 0.30608| ppl: 207.47362| %_neg_is_pos: 0.00423| lr: 0.0| temp: 1.96727 | loss: 1.12572| constrast_loss: 4.43566| div_loss: 0.67222| %_mask_idx: 0.38268| ppl: 209.77779| %_neg_is_pos: 
0.00246| lr: 0.0| temp: 1.96726 | loss: 1.14047| constrast_loss: 4.49637| div_loss: 0.65499| %_mask_idx: 0.38189| ppl: 220.80486| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.96726 | loss: 1.12834| constrast_loss: 4.44756| div_loss: 0.65788| %_mask_idx: 0.39004| ppl: 218.95377| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96725 | loss: 1.12502| constrast_loss: 4.43334| div_loss: 0.66726| %_mask_idx: 0.40993| ppl: 212.95111| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96725 | loss: 1.13786| constrast_loss: 4.48579| div_loss: 0.6566| %_mask_idx: 0.42607| ppl: 219.77576| %_neg_is_pos: 0.00137| lr: 0.0| temp: 1.96724 | loss: 1.12992| constrast_loss: 4.45235| div_loss: 0.67332| %_mask_idx: 0.37547| ppl: 209.07275| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.96724 | loss: 1.13554| constrast_loss: 4.47435| div_loss: 0.67813| %_mask_idx: 0.38565| ppl: 205.99922| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96722 | loss: 1.12347| constrast_loss: 4.42598| div_loss: 0.67918| %_mask_idx: 0.35495| ppl: 205.32558| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.96722 | loss: 1.13149| constrast_loss: 4.45945| div_loss: 0.66509| %_mask_idx: 0.44142| ppl: 214.34521| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.96721 | loss: 1.13891| constrast_loss: 4.49035| div_loss: 0.65292| %_mask_idx: 0.39552| ppl: 222.12817| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96721 | loss: 1.14037| constrast_loss: 4.49471| div_loss: 0.66773| %_mask_idx: 0.388| ppl: 212.65245| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.9672 | loss: 1.13833| constrast_loss: 4.48844| div_loss: 0.64887| %_mask_idx: 0.43891| ppl: 224.72159| %_neg_is_pos: 0.00139| lr: 0.0| temp: 1.9672 | loss: 1.13448| constrast_loss: 4.47202| div_loss: 0.65914| %_mask_idx: 0.41197| ppl: 218.15138| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.96719 | loss: 1.14124| constrast_loss: 4.49858| div_loss: 0.66392| %_mask_idx: 0.42387| ppl: 215.09348| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96719 | loss: 1.13702| constrast_loss: 4.48195| div_loss: 0.66139| %_mask_idx: 0.414| ppl: 216.70767| 
%_neg_is_pos: 0.00481| lr: 0.0| temp: 1.96717| loss: 1.12233| constrast_loss: 4.42262| div_loss: 0.66686| %_mask_idx: 0.36231| ppl: 213.21191| %_neg_is_pos: 0.00822| lr: 0.0| temp: 1.96717 | loss: 1.13493| constrast_loss: 4.47265| div_loss: 0.67071| %_mask_idx: 0.4057| ppl: 210.7438| %_neg_is_pos: 0.00689| lr: 0.0| temp: 1.96716 | loss: 1.1432| constrast_loss: 4.5064| div_loss: 0.66416| %_mask_idx: 0.42935| ppl: 214.93643| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.96716 | loss: 1.13459| constrast_loss: 4.47072| div_loss: 0.67623| %_mask_idx: 0.40648| ppl: 207.21501| %_neg_is_pos: 0.00641| lr: 0.0| temp: 1.96714 | loss: 1.13546| constrast_loss: 4.47445| div_loss: 0.67371| %_mask_idx: 0.43358| ppl: 208.82739| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.96714 | loss: 1.1412| constrast_loss: 4.49864| div_loss: 0.66151| %_mask_idx: 0.44753| ppl: 216.63165| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96713 | loss: 1.14441| constrast_loss: 4.51194| div_loss: 0.65707| %_mask_idx: 0.38487| ppl: 219.47751| %_neg_is_pos: 0.00102| lr: 0.0| temp: 1.96713 | loss: 1.14594| constrast_loss: 4.51856| div_loss: 0.65209| %_mask_idx: 0.40523| ppl: 222.66196| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96712 | loss: 1.13684| constrast_loss: 4.48013| div_loss: 0.67215| %_mask_idx: 0.38534| ppl: 209.827| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.96712 | loss: 1.1293| constrast_loss: 4.45098| div_loss: 0.66207| %_mask_idx: 0.38722| ppl: 216.27524| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.96711 | loss: 1.13658| constrast_loss: 4.48051| div_loss: 0.65798| %_mask_idx: 0.34117| ppl: 218.89098| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96711 | loss: 1.13769| constrast_loss: 4.48399| div_loss: 0.66768| %_mask_idx: 0.34305| ppl: 212.68427| %_neg_is_pos: 0.006| lr: 0.0| temp: 1.96709 | loss: 1.13059| constrast_loss: 4.45697| div_loss: 0.65405| %_mask_idx: 0.40586| ppl: 221.40549| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96709 | loss: 1.12594| constrast_loss: 4.43625| div_loss: 0.67503| %_mask_idx: 0.36325| ppl: 
207.97827| %_neg_is_pos: 0.00525| lr: 0.0| temp: 1.96708 | loss: 1.13084| constrast_loss: 4.45673| div_loss: 0.6662| %_mask_idx: 0.35667| ppl: 213.62979| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.96708 | loss: 1.14066| constrast_loss: 4.49763| div_loss: 0.64989| %_mask_idx: 0.40836| ppl: 224.07013| %_neg_is_pos: 0.00196| lr: 0.0| temp: 1.96707 | loss: 1.13481| constrast_loss: 4.47237| div_loss: 0.66858| %_mask_idx: 0.33882| ppl: 212.11179| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96707 | loss: 1.1389| constrast_loss: 4.48917| div_loss: 0.6642| %_mask_idx: 0.37923| ppl: 214.909| %_neg_is_pos: 0.00162| lr: 0.0| temp: 1.96706 | loss: 1.13395| constrast_loss: 4.46939| div_loss: 0.66408| %_mask_idx: 0.38142| ppl: 214.98712| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.96706 | loss: 1.12971| constrast_loss: 4.45119| div_loss: 0.67656| %_mask_idx: 0.4198| ppl: 206.99875| %_neg_is_pos: 0.00393| lr: 0.0| temp: 1.96704 | loss: 1.14251| constrast_loss: 4.50362| div_loss: 0.66399| %_mask_idx: 0.39442| ppl: 215.04858| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.96704 | loss: 1.14535| constrast_loss: 4.51585| div_loss: 0.65568| %_mask_idx: 0.34978| ppl: 220.36206| %_neg_is_pos: 0.00142| lr: 0.0| temp: 1.96703 | loss: 1.12997| constrast_loss: 4.4526| div_loss: 0.67294| %_mask_idx: 0.35855| ppl: 209.32056| %_neg_is_pos: 0.00608| lr: 0.0| temp: 1.96703 | loss: 1.13294| constrast_loss: 4.46507| div_loss: 0.66693| %_mask_idx: 0.41996| ppl: 213.16487| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96702 | loss: 1.1413| constrast_loss: 4.49902| div_loss: 0.66201| %_mask_idx: 0.36779| ppl: 216.31354| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.96702 | loss: 1.13061| constrast_loss: 4.4552| div_loss: 0.6724| %_mask_idx: 0.38189| ppl: 209.66371| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.96701 | loss: 1.12499| constrast_loss: 4.43293| div_loss: 0.67023| %_mask_idx: 0.37594| ppl: 211.05238| %_neg_is_pos: 0.0048| lr: 0.0| temp: 1.96701 | loss: 1.11998| constrast_loss: 4.41327| div_loss: 0.66649| %_mask_idx: 0.37343| 
ppl: 213.44397| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.967 | loss: 1.12472| constrast_loss: 4.43006| div_loss: 0.68823| %_mask_idx: 0.38048| ppl: 199.53229| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.967 | loss: 1.12493| constrast_loss: 4.43234| div_loss: 0.67365| %_mask_idx: 0.35667| ppl: 208.86548| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.96699 | loss: 1.13602| constrast_loss: 4.47669| div_loss: 0.67375| %_mask_idx: 0.38017| ppl: 208.79857| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.96699 | loss: 1.13514| constrast_loss: 4.47286| div_loss: 0.6769| %_mask_idx: 0.38111| ppl: 206.78641| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.96697 | loss: 1.13821| constrast_loss: 4.48726| div_loss: 0.65568| %_mask_idx: 0.40555| ppl: 220.3667| %_neg_is_pos: 0.00687| lr: 0.0| temp: 1.96697 | loss: 1.13452| constrast_loss: 4.47164| div_loss: 0.66445| %_mask_idx: 0.37813| ppl: 214.7529| %_neg_is_pos: 0.00828| lr: 0.0| temp: 1.96696 | loss: 1.13| constrast_loss: 4.45387| div_loss: 0.66118| %_mask_idx: 0.39536| ppl: 216.84273| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.96696 | loss: 1.1413| constrast_loss: 4.50008| div_loss: 0.65134| %_mask_idx: 0.41369| ppl: 223.13953| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.96695 | loss: 1.13857| constrast_loss: 4.48821| div_loss: 0.66054| %_mask_idx: 0.38142| ppl: 217.25238| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.96695 | loss: 1.13455| constrast_loss: 4.47184| div_loss: 0.66346| %_mask_idx: 0.37782| ppl: 215.38605| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.96694 | loss: 1.13277| constrast_loss: 4.46444| div_loss: 0.66642| %_mask_idx: 0.37876| ppl: 213.49414| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.96694 | loss: 1.13421| constrast_loss: 4.47028| div_loss: 0.66582| %_mask_idx: 0.36497| ppl: 213.87531| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.96692 | loss: 1.1316| constrast_loss: 4.46054| div_loss: 0.65871| %_mask_idx: 0.39301| ppl: 218.42651| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.96692 | loss: 1.13647| constrast_loss: 4.48018| div_loss: 0.65686| %_mask_idx: 
0.38957| ppl: 219.61092| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96691 | loss: 1.13808| constrast_loss: 4.48717| div_loss: 0.65129| %_mask_idx: 0.42701| ppl: 223.17215| %_neg_is_pos: 0.0017| lr: 0.0| temp: 1.96691 | loss: 1.13571| constrast_loss: 4.47527| div_loss: 0.6758| %_mask_idx: 0.37766| ppl: 207.48535| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.9669 | loss: 1.13909| constrast_loss: 4.49162| div_loss: 0.64724| %_mask_idx: 0.38001| ppl: 225.76358| %_neg_is_pos: 0.0013| lr: 0.0| temp: 1.9669 | loss: 1.14254| constrast_loss: 4.50546| div_loss: 0.6469| %_mask_idx: 0.43374| ppl: 225.98578| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96689 | loss: 1.13545| constrast_loss: 4.47618| div_loss: 0.65609| %_mask_idx: 0.40476| ppl: 220.1022| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.96689 | loss: 1.13059| constrast_loss: 4.45382| div_loss: 0.68548| %_mask_idx: 0.4093| ppl: 201.29146| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.96687 | loss: 1.13057| constrast_loss: 4.45474| div_loss: 0.67518| %_mask_idx: 0.41823| ppl: 207.88689| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.96687 | loss: 1.13662| constrast_loss: 4.47904| div_loss: 0.67452| %_mask_idx: 0.39834| ppl: 208.30954| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.96686 | loss: 1.13931| constrast_loss: 4.49147| div_loss: 0.65779| %_mask_idx: 0.41165| ppl: 219.01749| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96686 | loss: 1.14693| constrast_loss: 4.52182| div_loss: 0.65892| %_mask_idx: 0.43546| ppl: 218.29111| %_neg_is_pos: 0.00104| lr: 0.0| temp: 1.96685 | loss: 1.13796| constrast_loss: 4.48527| div_loss: 0.66569| %_mask_idx: 0.35667| ppl: 213.95718| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96685 | loss: 1.13004| constrast_loss: 4.45262| div_loss: 0.67536| %_mask_idx: 0.44659| ppl: 207.77002| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96684 | loss: 1.13607| constrast_loss: 4.47807| div_loss: 0.66195| %_mask_idx: 0.35918| ppl: 216.35248| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96684 | loss: 1.12457| constrast_loss: 4.43039| div_loss: 0.6787| 
%_mask_idx: 0.39364| ppl: 205.63361| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.96682 | loss: 1.13531| constrast_loss: 4.47629| div_loss: 0.64964| %_mask_idx: 0.40335| ppl: 224.23038| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96682 | loss: 1.12996| constrast_loss: 4.45297| div_loss: 0.6685| %_mask_idx: 0.39912| ppl: 212.16232| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.96681 | loss: 1.13488| constrast_loss: 4.4729| div_loss: 0.6663| %_mask_idx: 0.36576| ppl: 213.56912| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.96681 | loss: 1.13862| constrast_loss: 4.48906| div_loss: 0.65439| %_mask_idx: 0.41808| ppl: 221.18895| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96679 | loss: 1.1307| constrast_loss: 4.45671| div_loss: 0.66104| %_mask_idx: 0.37766| ppl: 216.9353| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96679 | loss: 1.12662| constrast_loss: 4.43819| div_loss: 0.68306| %_mask_idx: 0.38283| ppl: 202.84024| %_neg_is_pos: 0.00707| lr: 0.0| temp: 1.96678 | loss: 1.13336| constrast_loss: 4.46767| div_loss: 0.65769| %_mask_idx: 0.36012| ppl: 219.0791| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96678 | loss: 1.13044| constrast_loss: 4.45523| div_loss: 0.66547| %_mask_idx: 0.375| ppl: 214.10095| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.96677 | loss: 1.1364| constrast_loss: 4.47843| div_loss: 0.6716| %_mask_idx: 0.38205| ppl: 210.17902| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.96677 | loss: 1.13517| constrast_loss: 4.47498| div_loss: 0.65692| %_mask_idx: 0.40836| ppl: 219.56833| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.96676 | loss: 1.14146| constrast_loss: 4.49876| div_loss: 0.67083| %_mask_idx: 0.35714| ppl: 210.66592| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.96676 | loss: 1.13889| constrast_loss: 4.49001| div_loss: 0.65545| %_mask_idx: 0.3714| ppl: 220.50931| %_neg_is_pos: 0.00638| lr: 0.0| temp: 1.96674 | loss: 1.12555| constrast_loss: 4.43429| div_loss: 0.67894| %_mask_idx: 0.34305| ppl: 205.48032| %_neg_is_pos: 0.00492| lr: 0.0| temp: 1.96674 | loss: 1.14496| constrast_loss: 4.51397| div_loss: 
0.65879| %_mask_idx: 0.40116| ppl: 218.37161| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96673 | loss: 1.13086| constrast_loss: 4.45571| div_loss: 0.6774| %_mask_idx: 0.36685| ppl: 206.46579| %_neg_is_pos: 0.00483| lr: 0.0| temp: 1.96673 | loss: 1.14| constrast_loss: 4.49437| div_loss: 0.65611| %_mask_idx: 0.40602| ppl: 220.09036| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.96672 | loss: 1.1327| constrast_loss: 4.46476| div_loss: 0.66023| %_mask_idx: 0.42027| ppl: 217.45282| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.96672 | loss: 1.13828| constrast_loss: 4.48742| div_loss: 0.65709| %_mask_idx: 0.36169| ppl: 219.46179| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.96671 | loss: 1.13378| constrast_loss: 4.46713| div_loss: 0.67988| %_mask_idx: 0.36247| ppl: 204.8786| %_neg_is_pos: 0.00809| lr: 0.0| temp: 1.96671 | loss: 1.13481| constrast_loss: 4.47269| div_loss: 0.66542| %_mask_idx: 0.40226| ppl: 214.12973| %_neg_is_pos: 0.00662| lr: 0.0| temp: 1.96669 | loss: 1.12342| constrast_loss: 4.42645| div_loss: 0.6723| %_mask_idx: 0.38675| ppl: 209.72665| %_neg_is_pos: 0.00425| lr: 0.0| temp: 1.96669 | loss: 1.12934| constrast_loss: 4.45082| div_loss: 0.66546| %_mask_idx: 0.39959| ppl: 214.10728| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96668 | loss: 1.13306| constrast_loss: 4.46532| div_loss: 0.66912| %_mask_idx: 0.38659| ppl: 211.76604| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96668 | loss: 1.13039| constrast_loss: 4.45454| div_loss: 0.67026| %_mask_idx: 0.40915| ppl: 211.03406| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96667 | loss: 1.13067| constrast_loss: 4.45504| div_loss: 0.6762| %_mask_idx: 0.41181| ppl: 207.23062| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.96667 | loss: 1.1312| constrast_loss: 4.45799| div_loss: 0.66807| %_mask_idx: 0.39897| ppl: 212.43605| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.96666 | loss: 1.14109| constrast_loss: 4.49821| div_loss: 0.66133| %_mask_idx: 0.40664| ppl: 216.75116| %_neg_is_pos: 0.00125| lr: 0.0| temp: 1.96666 | loss: 1.12822| constrast_loss: 4.44594| 
div_loss: 0.6695| %_mask_idx: 0.40147| ppl: 211.52097| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.96664 | loss: 1.13255| constrast_loss: 4.46355| div_loss: 0.6663| %_mask_idx: 0.41447| ppl: 213.5687| %_neg_is_pos: 0.00367| lr: 0.0| temp: 1.96664 | loss: 1.1274| constrast_loss: 4.44149| div_loss: 0.68117| %_mask_idx: 0.38252| ppl: 204.04834| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.96663 | loss: 1.1295| constrast_loss: 4.45243| div_loss: 0.65575| %_mask_idx: 0.39427| ppl: 220.31781| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.96663 [2021-09-02 04:29:42,867] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:29:42,867] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12672| constrast_loss: 4.43813| div_loss: 0.6876| %_mask_idx: 0.38769| ppl: 199.93539| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96661 | loss: 1.12743| constrast_loss: 4.44302| div_loss: 0.6671| %_mask_idx: 0.37578| ppl: 213.05556| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96661 | loss: 1.12738| constrast_loss: 4.44076| div_loss: 0.68773| %_mask_idx: 0.37343| ppl: 199.85451| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.9666 | loss: 1.13809| constrast_loss: 4.48508| div_loss: 0.67293| %_mask_idx: 0.35417| ppl: 209.32396| %_neg_is_pos: 0.00564| lr: 0.0| temp: 1.9666 | loss: 1.1401| constrast_loss: 4.49268| div_loss: 0.67718| %_mask_idx: 0.37093| ppl: 206.60428| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.96659 | loss: 1.12659| constrast_loss: 4.43913| div_loss: 0.67222| %_mask_idx: 0.39818| ppl: 209.77985| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96659 | loss: 1.13158| constrast_loss: 4.45906| div_loss: 0.6725| %_mask_idx: 0.36482| ppl: 209.59732| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96658 | loss: 1.12447| constrast_loss: 4.42988| div_loss: 0.67999| %_mask_idx: 0.33835| ppl: 204.8045| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96658 | 
loss: 1.13411| constrast_loss: 4.47055| div_loss: 0.65907| %_mask_idx: 0.42575| ppl: 218.19576| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.96656| loss: 1.13002| constrast_loss: 4.45253| div_loss: 0.67549| %_mask_idx: 0.38283| ppl: 207.68832| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96656 | loss: 1.12872| constrast_loss: 4.44793| div_loss: 0.66933| %_mask_idx: 0.44142| ppl: 211.63113| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.96655 | loss: 1.12958| constrast_loss: 4.45123| div_loss: 0.67088| %_mask_idx: 0.41385| ppl: 210.63586| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96655 | loss: 1.13153| constrast_loss: 4.46036| div_loss: 0.65773| %_mask_idx: 0.39254| ppl: 219.05353| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96654 | loss: 1.13984| constrast_loss: 4.49453| div_loss: 0.64841| %_mask_idx: 0.3667| ppl: 225.0148| %_neg_is_pos: 0.00196| lr: 0.0| temp: 1.96654 | loss: 1.1311| constrast_loss: 4.45747| div_loss: 0.66912| %_mask_idx: 0.38878| ppl: 211.76128| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96653 | loss: 1.13878| constrast_loss: 4.48907| div_loss: 0.66033| %_mask_idx: 0.42935| ppl: 217.38882| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96653 | loss: 1.12152| constrast_loss: 4.41966| div_loss: 0.66433| %_mask_idx: 0.36826| ppl: 214.82774| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.96651 | loss: 1.13155| constrast_loss: 4.45785| div_loss: 0.68359| %_mask_idx: 0.33474| ppl: 202.50232| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.96651 | loss: 1.12163| constrast_loss: 4.41798| div_loss: 0.6854| %_mask_idx: 0.38941| ppl: 201.34578| %_neg_is_pos: 0.00419| lr: 0.0| temp: 1.9665 | loss: 1.13163| constrast_loss: 4.45936| div_loss: 0.67159| %_mask_idx: 0.41009| ppl: 210.18495| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.9665 | loss: 1.13776| constrast_loss: 4.48543| div_loss: 0.65606| %_mask_idx: 0.39568| ppl: 220.12292| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.96649 | loss: 1.12592| constrast_loss: 4.43668| div_loss: 0.67018| %_mask_idx: 0.34571| ppl: 211.08286| %_neg_is_pos: 0.00437| lr: 0.0| temp: 
1.96649 | loss: 1.13971| constrast_loss: 4.49278| div_loss: 0.66057| %_mask_idx: 0.36576| ppl: 217.23288| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.96648 | loss: 1.13962| constrast_loss: 4.49196| div_loss: 0.66509| %_mask_idx: 0.42779| ppl: 214.34116| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.96648 | loss: 1.13125| constrast_loss: 4.45759| div_loss: 0.67401| %_mask_idx: 0.39364| ppl: 208.63065| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.96646 | loss: 1.12344| constrast_loss: 4.42526| div_loss: 0.68513| %_mask_idx: 0.4256| ppl: 201.51471| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.96646 | loss: 1.13792| constrast_loss: 4.48622| div_loss: 0.65449| %_mask_idx: 0.37296| ppl: 221.12427| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96645 | loss: 1.12245| constrast_loss: 4.42446| div_loss: 0.65335| %_mask_idx: 0.35213| ppl: 221.8591| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.96645 | loss: 1.1317| constrast_loss: 4.45917| div_loss: 0.6764| %_mask_idx: 0.34931| ppl: 207.10402| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.96643 | loss: 1.12358| constrast_loss: 4.42706| div_loss: 0.67254| %_mask_idx: 0.39176| ppl: 209.57169| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96643 | loss: 1.14128| constrast_loss: 4.49963| div_loss: 0.65492| %_mask_idx: 0.40132| ppl: 220.85251| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.96642 | loss: 1.12501| constrast_loss: 4.43273| div_loss: 0.67314| %_mask_idx: 0.37735| ppl: 209.18878| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96642 | loss: 1.13793| constrast_loss: 4.48613| div_loss: 0.65584| %_mask_idx: 0.33318| ppl: 220.26431| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96642 | loss: 1.12807| constrast_loss: 4.44516| div_loss: 0.67127| %_mask_idx: 0.39928| ppl: 210.38882| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.96642 | loss: 1.1282| constrast_loss: 4.44477| div_loss: 0.68036| %_mask_idx: 0.36153| ppl: 204.56757| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.96641 | loss: 1.12372| constrast_loss: 4.42796| div_loss: 0.66935| %_mask_idx: 0.39975| ppl: 211.61301| %_neg_is_pos: 0.00277| lr: 
0.0| temp: 1.96641 | loss: 1.13215| constrast_loss: 4.4623| div_loss: 0.6631| %_mask_idx: 0.39662| ppl: 215.61856| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96639 | loss: 1.13447| constrast_loss: 4.47148| div_loss: 0.66407| %_mask_idx: 0.4787| ppl: 214.99643| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96639 | loss: 1.12789| constrast_loss: 4.44496| div_loss: 0.66618| %_mask_idx: 0.36184| ppl: 213.64583| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.96638 | loss: 1.13883| constrast_loss: 4.48829| div_loss: 0.67026| %_mask_idx: 0.39051| ppl: 211.0358| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.96638 | loss: 1.14069| constrast_loss: 4.49795| div_loss: 0.64793| %_mask_idx: 0.38346| ppl: 225.32185| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.96637 | loss: 1.13401| constrast_loss: 4.46972| div_loss: 0.66326| %_mask_idx: 0.42027| ppl: 215.5139| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.96637 | loss: 1.13159| constrast_loss: 4.45806| div_loss: 0.68281| %_mask_idx: 0.34962| ppl: 203.00156| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.96636 | loss: 1.13057| constrast_loss: 4.45583| div_loss: 0.66433| %_mask_idx: 0.41902| ppl: 214.82686| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96636 | loss: 1.1267| constrast_loss: 4.44| div_loss: 0.66815| %_mask_idx: 0.3808| ppl: 212.38104| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96634 | loss: 1.14109| constrast_loss: 4.49763| div_loss: 0.66717| %_mask_idx: 0.38549| ppl: 213.01082| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96634 | loss: 1.12378| constrast_loss: 4.4276| div_loss: 0.67541| %_mask_idx: 0.39881| ppl: 207.74034| %_neg_is_pos: 0.003| lr: 0.0| temp: 1.96633 | loss: 1.13236| constrast_loss: 4.46327| div_loss: 0.66187| %_mask_idx: 0.39991| ppl: 216.40182| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.96633 | loss: 1.13191| constrast_loss: 4.46165| div_loss: 0.6598| %_mask_idx: 0.3891| ppl: 217.73022| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.96632 | loss: 1.12589| constrast_loss: 4.43632| div_loss: 0.67258| %_mask_idx: 0.41651| ppl: 209.54993| %_neg_is_pos: 0.00264| lr: 
0.0| temp: 1.96632 | loss: 1.13315| constrast_loss: 4.465| div_loss: 0.67612| %_mask_idx: 0.38878| ppl: 207.28378| %_neg_is_pos: 0.00329| lr: 0.0| temp: 1.96631 | loss: 1.12795| constrast_loss: 4.44414| div_loss: 0.67676| %_mask_idx: 0.37249| ppl: 206.87598| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.96631 | loss: 1.14016| constrast_loss: 4.4947| div_loss: 0.65946| %_mask_idx: 0.42293| ppl: 217.94727| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96629 | loss: 1.13223| constrast_loss: 4.46141| div_loss: 0.67509| %_mask_idx: 0.43734| ppl: 207.9408| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96629 | loss: 1.12475| constrast_loss: 4.43224| div_loss: 0.66759| %_mask_idx: 0.40288| ppl: 212.74478| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.96628 | loss: 1.13468| constrast_loss: 4.47222| div_loss: 0.66489| %_mask_idx: 0.42951| ppl: 214.47037| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96628 | loss: 1.13617| constrast_loss: 4.47777| div_loss: 0.66923| %_mask_idx: 0.42027| ppl: 211.69144| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96626 | loss: 1.1246| constrast_loss: 4.43184| div_loss: 0.66582| %_mask_idx: 0.34414| ppl: 213.8783| %_neg_is_pos: 0.00484| lr: 0.0| temp: 1.96626 | loss: 1.13677| constrast_loss: 4.4818| div_loss: 0.65263| %_mask_idx: 0.42152| ppl: 222.31612| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96625 | loss: 1.13642| constrast_loss: 4.47994| div_loss: 0.65726| %_mask_idx: 0.39552| ppl: 219.35513| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.96625 | loss: 1.14082| constrast_loss: 4.49728| div_loss: 0.65991| %_mask_idx: 0.41259| ppl: 217.65854| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96624 | loss: 1.13578| constrast_loss: 4.47676| div_loss: 0.66354| %_mask_idx: 0.38612| ppl: 215.33704| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.96624 | loss: 1.12963| constrast_loss: 4.45301| div_loss: 0.65495| %_mask_idx: 0.36184| ppl: 220.83243| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96623 | loss: 1.12784| constrast_loss: 4.44385| div_loss: 0.67518| %_mask_idx: 0.375| ppl: 207.88306| %_neg_is_pos: 
0.00282| lr: 0.0| temp: 1.96623 | loss: 1.13372| constrast_loss: 4.46892| div_loss: 0.65972| %_mask_idx: 0.3891| ppl: 217.77762| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96621 | loss: 1.13937| constrast_loss: 4.49217| div_loss: 0.6531| %_mask_idx: 0.40273| ppl: 222.01492| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.96621 | loss: 1.14029| constrast_loss: 4.49472| div_loss: 0.66452| %_mask_idx: 0.44471| ppl: 214.70685| %_neg_is_pos: 0.00153| lr: 0.0| temp: 1.9662 | loss: 1.12551| constrast_loss: 4.43365| div_loss: 0.68386| %_mask_idx: 0.35761| ppl: 202.33087| %_neg_is_pos: 0.00442| lr: 0.0| temp: 1.9662 | loss: 1.13184| constrast_loss: 4.46138| div_loss: 0.65982| %_mask_idx: 0.40962| ppl: 217.71556| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96619 | loss: 1.13015| constrast_loss: 4.45425| div_loss: 0.66368| %_mask_idx: 0.3938| ppl: 215.24423| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96619 | loss: 1.13076| constrast_loss: 4.45642| div_loss: 0.66625| %_mask_idx: 0.43719| ppl: 213.60164| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96618 | loss: 1.1249| constrast_loss: 4.43333| div_loss: 0.66254| %_mask_idx: 0.41103| ppl: 215.97491| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.96618 | loss: 1.13084| constrast_loss: 4.45727| div_loss: 0.66085| %_mask_idx: 0.35918| ppl: 217.05832| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96616 | loss: 1.12349| constrast_loss: 4.42587| div_loss: 0.68077| %_mask_idx: 0.37688| ppl: 204.30878| %_neg_is_pos: 0.00444| lr: 0.0| temp: 1.96616 | loss: 1.13381| constrast_loss: 4.46883| div_loss: 0.6642| %_mask_idx: 0.3891| ppl: 214.91049| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96615 | loss: 1.14335| constrast_loss: 4.50854| div_loss: 0.64842| %_mask_idx: 0.4281| ppl: 225.0105| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96615 | loss: 1.1226| constrast_loss: 4.42246| div_loss: 0.6795| %_mask_idx: 0.35855| ppl: 205.12006| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.96614 | loss: 1.13488| constrast_loss: 4.47339| div_loss: 0.66139| %_mask_idx: 0.4021| ppl: 216.71323| 
%_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96614 | loss: 1.13396| constrast_loss: 4.471| div_loss: 0.64829| %_mask_idx: 0.39834| ppl: 225.09482| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.96613 | loss: 1.12419| constrast_loss: 4.42984| div_loss: 0.6691| %_mask_idx: 0.34164| ppl: 211.77461| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96613 | loss: 1.1303| constrast_loss: 4.45444| div_loss: 0.66765| %_mask_idx: 0.35276| ppl: 212.70544| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.96611 | loss: 1.13349| constrast_loss: 4.46752| div_loss: 0.66442| %_mask_idx: 0.37328| ppl: 214.77142| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.96611 | loss: 1.1278| constrast_loss: 4.44329| div_loss: 0.67898| %_mask_idx: 0.35291| ppl: 205.45575| %_neg_is_pos: 0.00414| lr: 0.0| temp: 1.9661 | loss: 1.13873| constrast_loss: 4.48797| div_loss: 0.6694| %_mask_idx: 0.39881| ppl: 211.58232| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.9661 | loss: 1.11353| constrast_loss: 4.38349| div_loss: 0.70625| %_mask_idx: 0.37672| ppl: 188.00064| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.96608 | loss: 1.14235| constrast_loss: 4.50322| div_loss: 0.66169| %_mask_idx: 0.40962| ppl: 216.52023| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96608 | loss: 1.12898| constrast_loss: 4.44857| div_loss: 0.67357| %_mask_idx: 0.38221| ppl: 208.91379| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.96607 | loss: 1.13442| constrast_loss: 4.47121| div_loss: 0.66476| %_mask_idx: 0.4234| ppl: 214.55167| %_neg_is_pos: 0.00155| lr: 0.0| temp: 1.96607 | loss: 1.13089| constrast_loss: 4.45818| div_loss: 0.65372| %_mask_idx: 0.39834| ppl: 221.62213| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96606 | loss: 1.1249| constrast_loss: 4.4331| div_loss: 0.66493| %_mask_idx: 0.37453| ppl: 214.44162| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96606 | loss: 1.13171| constrast_loss: 4.45929| div_loss: 0.67572| %_mask_idx: 0.38299| ppl: 207.541| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.96605 | loss: 1.13456| constrast_loss: 4.47256| div_loss: 0.65694| %_mask_idx: 0.44549| ppl: 
219.5564| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96605 | loss: 1.12399| constrast_loss: 4.42766| div_loss: 0.68295| %_mask_idx: 0.36216| ppl: 202.91061| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.96603 | loss: 1.14739| constrast_loss: 4.52486| div_loss: 0.64706| %_mask_idx: 0.38221| ppl: 225.88417| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.96603 | loss: 1.12755| constrast_loss: 4.44468| div_loss: 0.6551| %_mask_idx: 0.40711| ppl: 220.73474| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96602 | loss: 1.12636| constrast_loss: 4.43966| div_loss: 0.65774| %_mask_idx: 0.36106| ppl: 219.04428| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96602 | loss: 1.13701| constrast_loss: 4.48183| div_loss: 0.66192| %_mask_idx: 0.38315| ppl: 216.37212| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.96601 | loss: 1.14243| constrast_loss: 4.50321| div_loss: 0.66522| %_mask_idx: 0.39583| ppl: 214.26007| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.96601 | loss: 1.14102| constrast_loss: 4.4994| div_loss: 0.64668| %_mask_idx: 0.41463| ppl: 226.12582| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.966 | loss: 1.13099| constrast_loss: 4.4571| div_loss: 0.66881| %_mask_idx: 0.37641| ppl: 211.96042| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.966 | loss: 1.14155| constrast_loss: 4.50111| div_loss: 0.65083| %_mask_idx: 0.35166| ppl: 223.4668| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96598 | loss: 1.13363| constrast_loss: 4.46767| div_loss: 0.66849| %_mask_idx: 0.37234| ppl: 212.16319| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.96598 | loss: 1.13976| constrast_loss: 4.49379| div_loss: 0.65258| %_mask_idx: 0.4093| ppl: 222.351| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96597 | loss: 1.12712| constrast_loss: 4.44151| div_loss: 0.66954| %_mask_idx: 0.42826| ppl: 211.49384| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.96597 | loss: 1.12682| constrast_loss: 4.44097| div_loss: 0.66315| %_mask_idx: 0.40962| ppl: 215.58572| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.96596 | loss: 1.13517| constrast_loss: 4.47419| div_loss: 0.66503| %_mask_idx: 0.40085| 
ppl: 214.38316| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96596 | loss: 1.1373| constrast_loss: 4.48393| div_loss: 0.65289| %_mask_idx: 0.45818| ppl: 222.15182| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.96595 | loss: 1.14106| constrast_loss: 4.49789| div_loss: 0.66353| %_mask_idx: 0.33631| ppl: 215.34032| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96595 | loss: 1.12616| constrast_loss: 4.43832| div_loss: 0.66326| %_mask_idx: 0.42215| ppl: 215.51492| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96593 | loss: 1.12423| constrast_loss: 4.42859| div_loss: 0.68317| %_mask_idx: 0.33631| ppl: 202.77158| %_neg_is_pos: 0.00591| lr: 0.0| temp: 1.96593 | loss: 1.13122| constrast_loss: 4.45969| div_loss: 0.65194| %_mask_idx: 0.3714| ppl: 222.75813| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.96592 | loss: 1.12949| constrast_loss: 4.45131| div_loss: 0.6664| %_mask_idx: 0.34618| ppl: 213.50241| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96592 [2021-09-02 04:38:55,661] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:38:55,661] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.13459| constrast_loss: 4.47257| div_loss: 0.65785| %_mask_idx: 0.33067| ppl: 218.97678| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.9659 | loss: 1.1389| constrast_loss: 4.48979| div_loss: 0.65801| %_mask_idx: 0.36482| ppl: 218.87527| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.9659 | loss: 1.13476| constrast_loss: 4.47262| div_loss: 0.66434| %_mask_idx: 0.40069| ppl: 214.82416| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.96589 | loss: 1.1364| constrast_loss: 4.47896| div_loss: 0.66635| %_mask_idx: 0.33459| ppl: 213.53586| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96589 | loss: 1.13784| constrast_loss: 4.48505| div_loss: 0.66289| %_mask_idx: 0.41087| ppl: 215.7489| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.96588 | loss: 1.13395| constrast_loss: 4.46985| div_loss: 0.65935| %_mask_idx: 0.33083| ppl: 218.01572| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96588 | loss: 1.13486| constrast_loss: 4.47315| div_loss: 0.66273| %_mask_idx: 0.43437| ppl: 215.85182| %_neg_is_pos: 0.00205| lr: 0.0| temp: 1.96587 | loss: 1.13388| constrast_loss: 4.46772| div_loss: 0.67788| %_mask_idx: 0.38409| ppl: 206.15985| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96587 | loss: 1.13141| constrast_loss: 4.45888| div_loss: 0.66751| %_mask_idx: 0.38127| ppl: 212.79211| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96585| loss: 1.13006| constrast_loss: 4.45462| div_loss: 0.65624| %_mask_idx: 0.39787| ppl: 220.00615| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.96585 | loss: 1.1336| constrast_loss: 4.46796| div_loss: 0.66432| %_mask_idx: 0.37108| ppl: 214.83765| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96584 | loss: 1.13736| constrast_loss: 4.48308| div_loss: 0.66372| %_mask_idx: 0.39348| ppl: 215.22174| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96584 | loss: 1.12366| constrast_loss: 4.427| div_loss: 0.67636| %_mask_idx: 0.41212| ppl: 207.12662| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.96584 | loss: 1.13728| constrast_loss: 4.48219| div_loss: 0.66915| %_mask_idx: 0.40006| ppl: 211.74506| 
%_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96584 | loss: 1.12703| constrast_loss: 4.44107| div_loss: 0.67059| %_mask_idx: 0.38957| ppl: 210.82214| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96583 | loss: 1.12958| constrast_loss: 4.45139| div_loss: 0.66944| %_mask_idx: 0.3761| ppl: 211.55994| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.96583 | loss: 1.13348| constrast_loss: 4.46793| div_loss: 0.65986| %_mask_idx: 0.34727| ppl: 217.69168| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.96581 | loss: 1.13552| constrast_loss: 4.47555| div_loss: 0.66545| %_mask_idx: 0.41651| ppl: 214.11481| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.96581 | loss: 1.13076| constrast_loss: 4.45605| div_loss: 0.67001| %_mask_idx: 0.39928| ppl: 211.19659| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.9658 | loss: 1.14011| constrast_loss: 4.49487| div_loss: 0.65546| %_mask_idx: 0.39756| ppl: 220.50714| %_neg_is_pos: 0.00476| lr: 0.0| temp: 1.9658 | loss: 1.13734| constrast_loss: 4.48298| div_loss: 0.66373| %_mask_idx: 0.35511| ppl: 215.21109| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.96579 | loss: 1.13837| constrast_loss: 4.48749| div_loss: 0.66005| %_mask_idx: 0.4162| ppl: 217.5659| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96579 | loss: 1.12803| constrast_loss: 4.44567| div_loss: 0.66465| %_mask_idx: 0.41181| ppl: 214.62386| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.96578 | loss: 1.13702| constrast_loss: 4.48239| div_loss: 0.65685| %_mask_idx: 0.42497| ppl: 219.61789| %_neg_is_pos: 0.00405| lr: 0.0| temp: 1.96578 | loss: 1.14014| constrast_loss: 4.49452| div_loss: 0.66031| %_mask_idx: 0.3963| ppl: 217.40248| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.96576 | loss: 1.14202| constrast_loss: 4.50232| div_loss: 0.65768| %_mask_idx: 0.41526| ppl: 219.08453| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.96576 | loss: 1.13063| constrast_loss: 4.45577| div_loss: 0.66752| %_mask_idx: 0.38518| ppl: 212.78751| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96575 | loss: 1.12927| constrast_loss: 4.45072| div_loss: 0.66384| %_mask_idx: 0.36685| ppl: 
215.14328| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.96575 | loss: 1.1388| constrast_loss: 4.48891| div_loss: 0.66304| %_mask_idx: 0.36122| ppl: 215.65477| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.96573 | loss: 1.12788| constrast_loss: 4.44325| div_loss: 0.68286| %_mask_idx: 0.35573| ppl: 202.96834| %_neg_is_pos: 0.00557| lr: 0.0| temp: 1.96573 | loss: 1.13958| constrast_loss: 4.4915| div_loss: 0.66805| %_mask_idx: 0.40288| ppl: 212.45119| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.96572 | loss: 1.14768| constrast_loss: 4.52469| div_loss: 0.66048| %_mask_idx: 0.39082| ppl: 217.29575| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96572 | loss: 1.13313| constrast_loss: 4.46484| div_loss: 0.67692| %_mask_idx: 0.388| ppl: 206.77408| %_neg_is_pos: 0.00514| lr: 0.0| temp: 1.96571 | loss: 1.12958| constrast_loss: 4.4513| div_loss: 0.67032| %_mask_idx: 0.40163| ppl: 210.99365| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96571 | loss: 1.13088| constrast_loss: 4.45493| div_loss: 0.68566| %_mask_idx: 0.33412| ppl: 201.17503| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.9657 | loss: 1.12996| constrast_loss: 4.45169| div_loss: 0.68138| %_mask_idx: 0.42199| ppl: 203.91858| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.9657 | loss: 1.12513| constrast_loss: 4.43125| div_loss: 0.69268| %_mask_idx: 0.34414| ppl: 196.68538| %_neg_is_pos: 0.00677| lr: 0.0| temp: 1.96568 | loss: 1.14008| constrast_loss: 4.49358| div_loss: 0.66762| %_mask_idx: 0.41322| ppl: 212.72021| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96568 | loss: 1.13595| constrast_loss: 4.47788| div_loss: 0.65898| %_mask_idx: 0.43296| ppl: 218.25047| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.96567 | loss: 1.13135| constrast_loss: 4.45779| div_loss: 0.67609| %_mask_idx: 0.35103| ppl: 207.30002| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.96567 | loss: 1.12496| constrast_loss: 4.4315| div_loss: 0.68342| %_mask_idx: 0.3042| ppl: 202.61115| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.96566 | loss: 1.12638| constrast_loss: 4.43815| div_loss: 0.67389| %_mask_idx: 
0.39912| ppl: 208.70847| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.96566 | loss: 1.1318| constrast_loss: 4.45998| div_loss: 0.67226| %_mask_idx: 0.37077| ppl: 209.75673| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96565 | loss: 1.12162| constrast_loss: 4.41669| div_loss: 0.69797| %_mask_idx: 0.40147| ppl: 193.29678| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.96565 | loss: 1.13068| constrast_loss: 4.45524| div_loss: 0.67472| %_mask_idx: 0.44079| ppl: 208.18109| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96563 | loss: 1.13725| constrast_loss: 4.48192| div_loss: 0.67081| %_mask_idx: 0.41197| ppl: 210.67909| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.96563 | loss: 1.13383| constrast_loss: 4.4689| div_loss: 0.66423| %_mask_idx: 0.34712| ppl: 214.89328| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.96562 | loss: 1.13741| constrast_loss: 4.48297| div_loss: 0.66663| %_mask_idx: 0.38753| ppl: 213.35422| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.96562 | loss: 1.12619| constrast_loss: 4.43801| div_loss: 0.66757| %_mask_idx: 0.38518| ppl: 212.75276| %_neg_is_pos: 0.00569| lr: 0.0| temp: 1.96561 | loss: 1.13285| constrast_loss: 4.46463| div_loss: 0.66772| %_mask_idx: 0.31657| ppl: 212.66037| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.96561 | loss: 1.13369| constrast_loss: 4.46788| div_loss: 0.66879| %_mask_idx: 0.37547| ppl: 211.97694| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.9656 | loss: 1.13684| constrast_loss: 4.48045| div_loss: 0.66898| %_mask_idx: 0.3974| ppl: 211.85468| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.9656 | loss: 1.13018| constrast_loss: 4.45411| div_loss: 0.66593| %_mask_idx: 0.39693| ppl: 213.80533| %_neg_is_pos: 0.00542| lr: 0.0| temp: 1.96558 | loss: 1.13355| constrast_loss: 4.46728| div_loss: 0.66931| %_mask_idx: 0.36497| ppl: 211.64377| %_neg_is_pos: 0.003| lr: 0.0| temp: 1.96558 | loss: 1.1268| constrast_loss: 4.43939| div_loss: 0.67811| %_mask_idx: 0.37343| ppl: 206.00911| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96557 | loss: 1.12405| constrast_loss: 4.4291| div_loss: 0.67116| 
%_mask_idx: 0.39129| ppl: 210.45651| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96557 | loss: 1.12346| constrast_loss: 4.42681| div_loss: 0.6703| %_mask_idx: 0.33866| ppl: 211.00943| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.96555 | loss: 1.12589| constrast_loss: 4.43568| div_loss: 0.67887| %_mask_idx: 0.40038| ppl: 205.52083| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.96555 | loss: 1.14313| constrast_loss: 4.50653| div_loss: 0.65997| %_mask_idx: 0.45034| ppl: 217.62024| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96554 | loss: 1.1274| constrast_loss: 4.44178| div_loss: 0.67833| %_mask_idx: 0.35135| ppl: 205.86974| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96554 | loss: 1.13758| constrast_loss: 4.48351| div_loss: 0.66822| %_mask_idx: 0.38565| ppl: 212.3403| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96553 | loss: 1.13243| constrast_loss: 4.46185| div_loss: 0.67858| %_mask_idx: 0.39254| ppl: 205.7081| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.96553 | loss: 1.12699| constrast_loss: 4.43981| div_loss: 0.68142| %_mask_idx: 0.35652| ppl: 203.89224| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.96552 | loss: 1.13072| constrast_loss: 4.45499| div_loss: 0.67885| %_mask_idx: 0.37735| ppl: 205.53918| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.96552 | loss: 1.12851| constrast_loss: 4.44577| div_loss: 0.68274| %_mask_idx: 0.41933| ppl: 203.04716| %_neg_is_pos: 0.00442| lr: 0.0| temp: 1.9655 | loss: 1.1347| constrast_loss: 4.47185| div_loss: 0.66955| %_mask_idx: 0.38753| ppl: 211.48982| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.9655 | loss: 1.13929| constrast_loss: 4.49009| div_loss: 0.67077| %_mask_idx: 0.37907| ppl: 210.70499| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.96549 | loss: 1.14119| constrast_loss: 4.49698| div_loss: 0.67768| %_mask_idx: 0.4115| ppl: 206.28224| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.96549 | loss: 1.13486| constrast_loss: 4.47214| div_loss: 0.67296| %_mask_idx: 0.39724| ppl: 209.30743| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.96548 | loss: 1.14145| constrast_loss: 4.49929| div_loss: 
0.66508| %_mask_idx: 0.40069| ppl: 214.34586| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.96548 | loss: 1.13881| constrast_loss: 4.48836| div_loss: 0.66863| %_mask_idx: 0.38565| ppl: 212.07895| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.96547 | loss: 1.13323| constrast_loss: 4.46607| div_loss: 0.66837| %_mask_idx: 0.36607| ppl: 212.24454| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96547 | loss: 1.13854| constrast_loss: 4.48624| div_loss: 0.67931| %_mask_idx: 0.3739| ppl: 205.24033| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.96545 | loss: 1.13619| constrast_loss: 4.47832| div_loss: 0.66448| %_mask_idx: 0.39505| ppl: 214.73065| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.96545 | loss: 1.13611| constrast_loss: 4.47721| div_loss: 0.6725| %_mask_idx: 0.38174| ppl: 209.59708| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96544 | loss: 1.13844| constrast_loss: 4.48671| div_loss: 0.67066| %_mask_idx: 0.42544| ppl: 210.77715| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96544 | loss: 1.12521| constrast_loss: 4.43291| div_loss: 0.67921| %_mask_idx: 0.35793| ppl: 205.30788| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96543 | loss: 1.12179| constrast_loss: 4.41913| div_loss: 0.68024| %_mask_idx: 0.36451| ppl: 204.64371| %_neg_is_pos: 0.00612| lr: 0.0| temp: 1.96543 | loss: 1.12134| constrast_loss: 4.41758| div_loss: 0.67779| %_mask_idx: 0.37923| ppl: 206.21585| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.96542 | loss: 1.13496| constrast_loss: 4.47217| div_loss: 0.67671| %_mask_idx: 0.40821| ppl: 206.90842| %_neg_is_pos: 0.00381| lr: 0.0| temp: 1.96542 | loss: 1.12593| constrast_loss: 4.43602| div_loss: 0.67704| %_mask_idx: 0.39192| ppl: 206.69753| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.9654 | loss: 1.14201| constrast_loss: 4.50117| div_loss: 0.66878| %_mask_idx: 0.41761| ppl: 211.9812| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.9654 | loss: 1.13575| constrast_loss: 4.47641| div_loss: 0.66572| %_mask_idx: 0.37845| ppl: 213.93924| %_neg_is_pos: 0.0028| lr: 0.0| temp: 1.96539 | loss: 1.13315| constrast_loss: 4.46568| 
div_loss: 0.66919| %_mask_idx: 0.40977| ppl: 211.71829| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96539 | loss: 1.13727| constrast_loss: 4.48227| div_loss: 0.66814| %_mask_idx: 0.38737| ppl: 212.39163| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.96537 | loss: 1.13749| constrast_loss: 4.48179| div_loss: 0.68169| %_mask_idx: 0.42685| ppl: 203.71576| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.96537 | loss: 1.13347| constrast_loss: 4.46813| div_loss: 0.65746| %_mask_idx: 0.42121| ppl: 219.2256| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96536 | loss: 1.13373| constrast_loss: 4.46729| div_loss: 0.67618| %_mask_idx: 0.43092| ppl: 207.24379| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96536 | loss: 1.13274| constrast_loss: 4.46414| div_loss: 0.66837| %_mask_idx: 0.35667| ppl: 212.24548| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.96535 | loss: 1.14196| constrast_loss: 4.50036| div_loss: 0.67465| %_mask_idx: 0.41463| ppl: 208.22446| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.96535 | loss: 1.13181| constrast_loss: 4.45873| div_loss: 0.68513| %_mask_idx: 0.39756| ppl: 201.51462| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96534 | loss: 1.13344| constrast_loss: 4.46688| div_loss: 0.66866| %_mask_idx: 0.41996| ppl: 212.05812| %_neg_is_pos: 0.00145| lr: 0.0| temp: 1.96534 | loss: 1.13464| constrast_loss: 4.47258| div_loss: 0.65994| %_mask_idx: 0.39834| ppl: 217.63734| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96532 | loss: 1.12656| constrast_loss: 4.43918| div_loss: 0.67052| %_mask_idx: 0.36497| ppl: 210.86945| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96532 | loss: 1.13823| constrast_loss: 4.48597| div_loss: 0.66938| %_mask_idx: 0.34696| ppl: 211.59607| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.96531 | loss: 1.13938| constrast_loss: 4.49012| div_loss: 0.67383| %_mask_idx: 0.36858| ppl: 208.74913| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96531 | loss: 1.13139| constrast_loss: 4.45835| div_loss: 0.67194| %_mask_idx: 0.38831| ppl: 209.9577| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.9653 | loss: 1.14429| 
constrast_loss: 4.51063| div_loss: 0.66549| %_mask_idx: 0.39223| ppl: 214.08945| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.9653 | loss: 1.12854| constrast_loss: 4.44617| div_loss: 0.67977| %_mask_idx: 0.38612| ppl: 204.94551| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96529 | loss: 1.12613| constrast_loss: 4.43604| div_loss: 0.68464| %_mask_idx: 0.35902| ppl: 201.82831| %_neg_is_pos: 0.00542| lr: 0.0| temp: 1.96529 | loss: 1.12686| constrast_loss: 4.43928| div_loss: 0.68163| %_mask_idx: 0.34915| ppl: 203.75453| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96527 | loss: 1.13248| constrast_loss: 4.46252| div_loss: 0.67417| %_mask_idx: 0.39709| ppl: 208.52931| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.96527 | loss: 1.12792| constrast_loss: 4.44329| div_loss: 0.68381| %_mask_idx: 0.39458| ppl: 202.35857| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.96527 | loss: 1.12243| constrast_loss: 4.42173| div_loss: 0.67975| %_mask_idx: 0.40883| ppl: 204.96219| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.96527 | loss: 1.13677| constrast_loss: 4.47986| div_loss: 0.67224| %_mask_idx: 0.3656| ppl: 209.76657| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96526 | loss: 1.14111| constrast_loss: 4.49834| div_loss: 0.66111| %_mask_idx: 0.40586| ppl: 216.88814| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96526 | loss: 1.123| constrast_loss: 4.42401| div_loss: 0.67969| %_mask_idx: 0.40836| ppl: 204.99582| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96525 | loss: 1.12136| constrast_loss: 4.41792| div_loss: 0.67509| %_mask_idx: 0.39019| ppl: 207.94348| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96525 | loss: 1.13428| constrast_loss: 4.47043| div_loss: 0.66687| %_mask_idx: 0.39301| ppl: 213.20277| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.96523 | loss: 1.11159| constrast_loss: 4.3773| div_loss: 0.69077| %_mask_idx: 0.35652| ppl: 197.90594| %_neg_is_pos: 0.00601| lr: 0.0| temp: 1.96523 | loss: 1.13193| constrast_loss: 4.45961| div_loss: 0.68119| %_mask_idx: 0.40883| ppl: 204.04068| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.96522 | loss: 
1.13286| constrast_loss: 4.46423| div_loss: 0.6723| %_mask_idx: 0.38784| ppl: 209.72794| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96522 [2021-09-02 04:48:09,663] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:48:09,663] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12642| constrast_loss: 4.43779| div_loss: 0.6789| %_mask_idx: 0.36873| ppl: 205.50517| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.9652 | loss: 1.14066| constrast_loss: 4.49594| div_loss: 0.66683| %_mask_idx: 0.35981| ppl: 213.23041| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.9652 | loss: 1.13331| constrast_loss: 4.46441| div_loss: 0.68808| %_mask_idx: 0.3739| ppl: 199.62848| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96519 | loss: 1.13149| constrast_loss: 4.45868| div_loss: 0.6727| %_mask_idx: 0.38095| ppl: 209.47256| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96519 | loss: 1.13439| constrast_loss: 4.47117| div_loss: 0.66397| %_mask_idx: 0.40555| ppl: 215.05629| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.96518 | loss: 1.13322| constrast_loss: 4.46477| div_loss: 0.68125| %_mask_idx: 0.40899| ppl: 203.99792| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96518 | loss: 1.13603| constrast_loss: 4.4769| div_loss: 0.67203| %_mask_idx: 0.39944| ppl: 209.90234| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.96517 | loss: 1.12703| constrast_loss: 4.44048| div_loss: 0.67651| %_mask_idx: 0.39364| ppl: 207.03291| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.96517 | loss: 1.13135| constrast_loss: 4.45798| div_loss: 0.6741| %_mask_idx: 0.36137| ppl: 208.57745| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.96515 | loss: 1.14053| constrast_loss: 4.495| div_loss: 0.67117| %_mask_idx: 0.3667| ppl: 210.45251| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.96515 | loss: 1.13044| constrast_loss: 4.4541| div_loss: 0.67666| %_mask_idx: 0.4386| ppl: 206.93576| %_neg_is_pos: 
0.00227| lr: 0.0| temp: 1.96514 | loss: 1.14004| constrast_loss: 4.49358| div_loss: 0.66593| %_mask_idx: 0.4151| ppl: 213.80702| %_neg_is_pos: 0.0019| lr: 0.0| temp: 1.96514 | loss: 1.13215| constrast_loss: 4.46186| div_loss: 0.66721| %_mask_idx: 0.40116| ppl: 212.98315| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.96513 | loss: 1.1355| constrast_loss: 4.47539| div_loss: 0.66614| %_mask_idx: 0.36889| ppl: 213.67041| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96513 | loss: 1.13547| constrast_loss: 4.47589| div_loss: 0.65979| %_mask_idx: 0.37406| ppl: 217.73221| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.96512 | loss: 1.13244| constrast_loss: 4.46213| div_loss: 0.67632| %_mask_idx: 0.40304| ppl: 207.15607| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.96512 | loss: 1.13995| constrast_loss: 4.49293| div_loss: 0.66849| %_mask_idx: 0.39239| ppl: 212.16562| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.9651 | loss: 1.14011| constrast_loss: 4.49297| div_loss: 0.67474| %_mask_idx: 0.35511| ppl: 208.16861| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.9651 | loss: 1.14184| constrast_loss: 4.50107| div_loss: 0.66299| %_mask_idx: 0.38315| ppl: 215.68347| %_neg_is_pos: 0.00368| lr: 0.0| temp: 1.96509 | loss: 1.13445| constrast_loss: 4.46904| div_loss: 0.68747| %_mask_idx: 0.41103| ppl: 200.02148| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96509 | loss: 1.12845| constrast_loss: 4.44552| div_loss: 0.683| %_mask_idx: 0.30232| ppl: 202.87906| %_neg_is_pos: 0.00489| lr: 0.0| temp: 1.96508 | loss: 1.11888| constrast_loss: 4.40533| div_loss: 0.70189| %_mask_idx: 0.36419| ppl: 190.79135| %_neg_is_pos: 0.00748| lr: 0.0| temp: 1.96508 | loss: 1.13112| constrast_loss: 4.45679| div_loss: 0.67701| %_mask_idx: 0.4032| ppl: 206.71234| %_neg_is_pos: 0.00361| lr: 0.0| temp: 1.96507 | loss: 1.12974| constrast_loss: 4.4514| div_loss: 0.67544| %_mask_idx: 0.38127| ppl: 207.72021| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.96507 | loss: 1.12519| constrast_loss: 4.43151| div_loss: 0.69232| %_mask_idx: 0.35761| ppl: 196.91504| 
%_neg_is_pos: 0.00552| lr: 0.0| temp: 1.96505 | loss: 1.11582| constrast_loss: 4.39278| div_loss: 0.70499| %_mask_idx: 0.38377| ppl: 188.80713| %_neg_is_pos: 0.00474| lr: 0.0| temp: 1.96505 | loss: 1.1378| constrast_loss: 4.48446| div_loss: 0.66754| %_mask_idx: 0.38722| ppl: 212.77231| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.96504 | loss: 1.13837| constrast_loss: 4.48576| div_loss: 0.67722| %_mask_idx: 0.39395| ppl: 206.57841| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96504 | loss: 1.13162| constrast_loss: 4.45817| div_loss: 0.68292| %_mask_idx: 0.37625| ppl: 202.92941| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.96502 | loss: 1.13068| constrast_loss: 4.45543| div_loss: 0.67296| %_mask_idx: 0.39928| ppl: 209.30872| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96502 | loss: 1.14525| constrast_loss: 4.51386| div_loss: 0.67141| %_mask_idx: 0.3833| ppl: 210.29646| %_neg_is_pos: 0.00173| lr: 0.0| temp: 1.96501 | loss: 1.1331| constrast_loss: 4.4637| div_loss: 0.68719| %_mask_idx: 0.3797| ppl: 200.19841| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.96501 | loss: 1.12894| constrast_loss: 4.44814| div_loss: 0.67602| %_mask_idx: 0.39568| ppl: 207.34836| %_neg_is_pos: 0.0043| lr: 0.0| temp: 1.965 | loss: 1.12935| constrast_loss: 4.4504| div_loss: 0.66979| %_mask_idx: 0.4422| ppl: 211.33559| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.965 | loss: 1.12788| constrast_loss: 4.44334| div_loss: 0.68179| %_mask_idx: 0.39771| ppl: 203.65329| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.96499 | loss: 1.13649| constrast_loss: 4.4795| div_loss: 0.66472| %_mask_idx: 0.36231| ppl: 214.57648| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96499 | loss: 1.13719| constrast_loss: 4.48253| div_loss: 0.66229| %_mask_idx: 0.42105| ppl: 216.13248| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96497 | loss: 1.13558| constrast_loss: 4.47642| div_loss: 0.65907| %_mask_idx: 0.38095| ppl: 218.19397| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.96497 | loss: 1.12845| constrast_loss: 4.44564| div_loss: 0.6816| %_mask_idx: 0.41432| ppl: 
203.77708| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.96496 | loss: 1.13266| constrast_loss: 4.46374| div_loss: 0.6689| %_mask_idx: 0.40367| ppl: 211.90695| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.96496 | loss: 1.13377| constrast_loss: 4.46852| div_loss: 0.66565| %_mask_idx: 0.33819| ppl: 213.9812| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96495 | loss: 1.1261| constrast_loss: 4.43674| div_loss: 0.67647| %_mask_idx: 0.42074| ppl: 207.05789| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.96495 | loss: 1.1297| constrast_loss: 4.45075| div_loss: 0.68068| %_mask_idx: 0.37453| ppl: 204.36652| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.96494 | loss: 1.15039| constrast_loss: 4.5347| div_loss: 0.66862| %_mask_idx: 0.3786| ppl: 212.08029| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96494 | loss: 1.13493| constrast_loss: 4.4721| div_loss: 0.67629| %_mask_idx: 0.40586| ppl: 207.17545| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96492 | loss: 1.13261| constrast_loss: 4.46323| div_loss: 0.67194| %_mask_idx: 0.37296| ppl: 209.96159| %_neg_is_pos: 0.00632| lr: 0.0| temp: 1.96492 | loss: 1.13495| constrast_loss: 4.47278| div_loss: 0.67006| %_mask_idx: 0.43625| ppl: 211.15894| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96491 | loss: 1.13807| constrast_loss: 4.48569| div_loss: 0.66594| %_mask_idx: 0.41024| ppl: 213.80133| %_neg_is_pos: 0.00527| lr: 0.0| temp: 1.96491 | loss: 1.13713| constrast_loss: 4.48143| div_loss: 0.67104| %_mask_idx: 0.36826| ppl: 210.53737| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.9649 | loss: 1.12857| constrast_loss: 4.44726| div_loss: 0.67037| %_mask_idx: 0.41933| ppl: 210.96387| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.9649 | loss: 1.1308| constrast_loss: 4.45578| div_loss: 0.67401| %_mask_idx: 0.41776| ppl: 208.63148| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.96489 | loss: 1.12246| constrast_loss: 4.4225| div_loss: 0.67349| %_mask_idx: 0.36388| ppl: 208.96396| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96489 | loss: 1.13401| constrast_loss: 4.46871| div_loss: 0.67323| %_mask_idx: 0.43249| 
ppl: 209.132| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.96487 | loss: 1.1415| constrast_loss: 4.49869| div_loss: 0.6731| %_mask_idx: 0.40461| ppl: 209.21344| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.96487 | loss: 1.13179| constrast_loss: 4.45968| div_loss: 0.67459| %_mask_idx: 0.35652| ppl: 208.25974| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.96486 | loss: 1.13841| constrast_loss: 4.48785| div_loss: 0.65784| %_mask_idx: 0.40147| ppl: 218.98288| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.96486 | loss: 1.13043| constrast_loss: 4.45447| div_loss: 0.67237| %_mask_idx: 0.38612| ppl: 209.68253| %_neg_is_pos: 0.00688| lr: 0.0| temp: 1.96484 | loss: 1.11646| constrast_loss: 4.39641| div_loss: 0.69416| %_mask_idx: 0.38988| ppl: 195.73972| %_neg_is_pos: 0.00565| lr: 0.0| temp: 1.96484 | loss: 1.13304| constrast_loss: 4.46451| div_loss: 0.67644| %_mask_idx: 0.34649| ppl: 207.07648| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.96483 | loss: 1.13103| constrast_loss: 4.4568| div_loss: 0.6731| %_mask_idx: 0.39724| ppl: 209.21423| %_neg_is_pos: 0.0014| lr: 0.0| temp: 1.96483 | loss: 1.13085| constrast_loss: 4.45588| div_loss: 0.67511| %_mask_idx: 0.40226| ppl: 207.92654| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96482 | loss: 1.12628| constrast_loss: 4.43714| div_loss: 0.6798| %_mask_idx: 0.39223| ppl: 204.92523| %_neg_is_pos: 0.0046| lr: 0.0| temp: 1.96482 | loss: 1.12174| constrast_loss: 4.41953| div_loss: 0.67447| %_mask_idx: 0.38831| ppl: 208.34103| %_neg_is_pos: 0.00433| lr: 0.0| temp: 1.96481 | loss: 1.1253| constrast_loss: 4.43354| div_loss: 0.67674| %_mask_idx: 0.39442| ppl: 206.88931| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.96481 | loss: 1.13088| constrast_loss: 4.45607| div_loss: 0.67471| %_mask_idx: 0.3515| ppl: 208.18301| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.96479 | loss: 1.13978| constrast_loss: 4.49136| div_loss: 0.67739| %_mask_idx: 0.45755| ppl: 206.47319| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.96479 | loss: 1.13555| constrast_loss: 4.4757| div_loss: 0.665| %_mask_idx: 0.39192| 
ppl: 214.40295| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.96478 | loss: 1.12499| constrast_loss: 4.43186| div_loss: 0.68108| %_mask_idx: 0.34994| ppl: 204.11028| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.96478 | loss: 1.14016| constrast_loss: 4.49281| div_loss: 0.67839| %_mask_idx: 0.39254| ppl: 205.83228| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96477 | loss: 1.12005| constrast_loss: 4.41176| div_loss: 0.68457| %_mask_idx: 0.38456| ppl: 201.87801| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96477 | loss: 1.12975| constrast_loss: 4.45055| div_loss: 0.68438| %_mask_idx: 0.42356| ppl: 201.99539| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96476 | loss: 1.1338| constrast_loss: 4.46811| div_loss: 0.67072| %_mask_idx: 0.33051| ppl: 210.73904| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96476 | loss: 1.13111| constrast_loss: 4.45653| div_loss: 0.6793| %_mask_idx: 0.35276| ppl: 205.24757| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.96474 | loss: 1.13052| constrast_loss: 4.4543| div_loss: 0.6778| %_mask_idx: 0.35746| ppl: 206.21033| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.96474 | loss: 1.13324| constrast_loss: 4.46586| div_loss: 0.67121| %_mask_idx: 0.38941| ppl: 210.42546| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96473 | loss: 1.12856| constrast_loss: 4.44722| div_loss: 0.67042| %_mask_idx: 0.40288| ppl: 210.92975| %_neg_is_pos: 0.00376| lr: 0.0| temp: 1.96473 | loss: 1.13695| constrast_loss: 4.48012| div_loss: 0.6769| %_mask_idx: 0.4234| ppl: 206.78546| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96472 | loss: 1.1487| constrast_loss: 4.52842| div_loss: 0.66367| %_mask_idx: 0.37531| ppl: 215.25143| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96472 | loss: 1.13282| constrast_loss: 4.46314| div_loss: 0.6813| %_mask_idx: 0.38988| ppl: 203.9696| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96471 | loss: 1.14093| constrast_loss: 4.49684| div_loss: 0.66871| %_mask_idx: 0.39756| ppl: 212.02872| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96471 | loss: 1.13568| constrast_loss: 4.47596| div_loss: 0.66747| %_mask_idx: 
0.41792| ppl: 212.82019| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.9647 | loss: 1.12728| constrast_loss: 4.44119| div_loss: 0.67952| %_mask_idx: 0.3833| ppl: 205.11011| %_neg_is_pos: 0.00648| lr: 0.0| temp: 1.9647 | loss: 1.13018| constrast_loss: 4.45277| div_loss: 0.67965| %_mask_idx: 0.41635| ppl: 205.02444| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.96469 | loss: 1.12411| constrast_loss: 4.42834| div_loss: 0.68094| %_mask_idx: 0.30717| ppl: 204.2009| %_neg_is_pos: 0.00671| lr: 0.0| temp: 1.96469 | loss: 1.11589| constrast_loss: 4.3958| div_loss: 0.67781| %_mask_idx: 0.37343| ppl: 206.20157| %_neg_is_pos: 0.00433| lr: 0.0| temp: 1.96467 | loss: 1.12781| constrast_loss: 4.44289| div_loss: 0.68338| %_mask_idx: 0.40586| ppl: 202.63971| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96467 | loss: 1.12692| constrast_loss: 4.44098| div_loss: 0.66701| %_mask_idx: 0.34978| ppl: 213.11229| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.96466 | loss: 1.13289| constrast_loss: 4.46465| div_loss: 0.66921| %_mask_idx: 0.40523| ppl: 211.70612| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.96466 | loss: 1.13717| constrast_loss: 4.48156| div_loss: 0.67133| %_mask_idx: 0.41541| ppl: 210.34799| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.96465 | loss: 1.12588| constrast_loss: 4.43521| div_loss: 0.68302| %_mask_idx: 0.32989| ppl: 202.86421| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.96465 | loss: 1.13255| constrast_loss: 4.4624| div_loss: 0.67786| %_mask_idx: 0.42309| ppl: 206.17139| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96464 | loss: 1.13721| constrast_loss: 4.48211| div_loss: 0.66722| %_mask_idx: 0.37328| ppl: 212.98135| %_neg_is_pos: 0.00469| lr: 0.0| temp: 1.96464 | loss: 1.13829| constrast_loss: 4.48492| div_loss: 0.68257| %_mask_idx: 0.40523| ppl: 203.15247| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.96462 | loss: 1.135| constrast_loss: 4.47233| div_loss: 0.67687| %_mask_idx: 0.35511| ppl: 206.80495| %_neg_is_pos: 0.00723| lr: 0.0| temp: 1.96462 | loss: 1.12292| constrast_loss: 4.42308| div_loss: 0.68615| 
%_mask_idx: 0.31281| ppl: 200.86636| %_neg_is_pos: 0.00745| lr: 0.0| temp: 1.96461 | loss: 1.12535| constrast_loss: 4.43298| div_loss: 0.68421| %_mask_idx: 0.37766| ppl: 202.10809| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.96461 | loss: 1.12719| constrast_loss: 4.43998| div_loss: 0.68794| %_mask_idx: 0.42575| ppl: 199.71854| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.9646 | loss: 1.13488| constrast_loss: 4.47266| div_loss: 0.66872| %_mask_idx: 0.35229| ppl: 212.01668| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.9646 | loss: 1.13118| constrast_loss: 4.45728| div_loss: 0.67425| %_mask_idx: 0.37453| ppl: 208.4812| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96459 | loss: 1.12831| constrast_loss: 4.44627| div_loss: 0.6697| %_mask_idx: 0.40398| ppl: 211.38885| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96459 | loss: 1.13684| constrast_loss: 4.48024| div_loss: 0.67141| %_mask_idx: 0.39145| ppl: 210.29669| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96457 | loss: 1.12346| constrast_loss: 4.42533| div_loss: 0.68497| %_mask_idx: 0.41714| ppl: 201.61925| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96457 | loss: 1.12856| constrast_loss: 4.44737| div_loss: 0.66852| %_mask_idx: 0.44204| ppl: 212.14572| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.96456 | loss: 1.12803| constrast_loss: 4.44602| div_loss: 0.66108| %_mask_idx: 0.36012| ppl: 216.91129| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.96456 | loss: 1.12615| constrast_loss: 4.43648| div_loss: 0.68144| %_mask_idx: 0.38142| ppl: 203.88078| %_neg_is_pos: 0.00549| lr: 0.0| temp: 1.96455 | loss: 1.13062| constrast_loss: 4.45563| div_loss: 0.66841| %_mask_idx: 0.43656| ppl: 212.21909| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.96455 | loss: 1.1262| constrast_loss: 4.43811| div_loss: 0.66694| %_mask_idx: 0.39348| ppl: 213.15686| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.96454 | loss: 1.12372| constrast_loss: 4.42527| div_loss: 0.69624| %_mask_idx: 0.39599| ppl: 194.40616| %_neg_is_pos: 0.00448| lr: 0.0| temp: 1.96454 | loss: 1.13641| constrast_loss: 4.47912| div_loss: 
0.66531| %_mask_idx: 0.42121| ppl: 214.20451| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.96452 | loss: 1.1367| constrast_loss: 4.48049| div_loss: 0.66314| %_mask_idx: 0.43578| ppl: 215.59163| %_neg_is_pos: 0.00173| lr: 0.0| temp: 1.96452 | loss: 1.13413| constrast_loss: 4.46892| div_loss: 0.67592| %_mask_idx: 0.42434| ppl: 207.41142| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96451 | loss: 1.13881| constrast_loss: 4.48726| div_loss: 0.67972| %_mask_idx: 0.34649| ppl: 204.97821| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.96451 [2021-09-02 04:57:23,354] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 04:57:23,354] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12057| constrast_loss: 4.41406| div_loss: 0.6821| %_mask_idx: 0.35166| ppl: 203.45514| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.96449 | loss: 1.13336| constrast_loss: 4.46496| div_loss: 0.68484| %_mask_idx: 0.38377| ppl: 201.70071| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96449 | loss: 1.1205| constrast_loss: 4.41378| div_loss: 0.68206| %_mask_idx: 0.3974| ppl: 203.4805| %_neg_is_pos: 0.00526| lr: 0.0| temp: 1.96448 | loss: 1.13642| constrast_loss: 4.47905| div_loss: 0.66628| %_mask_idx: 0.43249| ppl: 213.57893| %_neg_is_pos: 0.00137| lr: 0.0| temp: 1.96448 | loss: 1.14351| constrast_loss: 4.50722| div_loss: 0.66815| %_mask_idx: 0.38111| ppl: 212.38597| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96447 | loss: 1.13484| constrast_loss: 4.47184| div_loss: 0.67518| %_mask_idx: 0.41964| ppl: 207.88362| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96447 | loss: 1.13892| constrast_loss: 4.4883| div_loss: 0.67393| %_mask_idx: 0.38878| ppl: 208.68604| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.96446 | loss: 1.13042| constrast_loss: 4.45253| div_loss: 0.69135| %_mask_idx: 0.34414| ppl: 197.53871| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.96446 | 
loss: 1.1389| constrast_loss: 4.48895| div_loss: 0.66666| %_mask_idx: 0.40946| ppl: 213.33472| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.96444 | loss: 1.13045| constrast_loss: 4.45416| div_loss: 0.67652| %_mask_idx: 0.3313| ppl: 207.0271| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96444 | loss: 1.12615| constrast_loss: 4.43765| div_loss: 0.66956| %_mask_idx: 0.38878| ppl: 211.47961| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96443 | loss: 1.13022| constrast_loss: 4.45254| div_loss: 0.68333| %_mask_idx: 0.34132| ppl: 202.6676| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96443 | loss: 1.14445| constrast_loss: 4.51043| div_loss: 0.67392| %_mask_idx: 0.36889| ppl: 208.68826| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.96442 | loss: 1.1297| constrast_loss: 4.45172| div_loss: 0.67059| %_mask_idx: 0.38894| ppl: 210.82159| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96442 | loss: 1.13082| constrast_loss: 4.45508| div_loss: 0.68214| %_mask_idx: 0.37704| ppl: 203.429| %_neg_is_pos: 0.00428| lr: 0.0| temp: 1.96441 | loss: 1.12668| constrast_loss: 4.43888| div_loss: 0.67858| %_mask_idx: 0.3667| ppl: 205.71039| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96441 | loss: 1.13409| constrast_loss: 4.46971| div_loss: 0.66652| %_mask_idx: 0.43656| ppl: 213.42911| %_neg_is_pos: 0.00151| lr: 0.0| temp: 1.96439 | loss: 1.11694| constrast_loss: 4.39941| div_loss: 0.68339| %_mask_idx: 0.33286| ppl: 202.63147| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96439 | loss: 1.13062| constrast_loss: 4.45523| div_loss: 0.67255| %_mask_idx: 0.41338| ppl: 209.56949| %_neg_is_pos: 0.00135| lr: 0.0| temp: 1.96438 | loss: 1.13048| constrast_loss: 4.45418| div_loss: 0.67757| %_mask_idx: 0.32112| ppl: 206.35669| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.96438 | loss: 1.13863| constrast_loss: 4.48734| div_loss: 0.67183| %_mask_idx: 0.43625| ppl: 210.0273| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96437 | loss: 1.13078| constrast_loss: 4.45489| div_loss: 0.68232| %_mask_idx: 0.39004| ppl: 203.31697| %_neg_is_pos: 0.00161| lr: 0.0| temp: 
1.96437 | loss: 1.14097| constrast_loss: 4.49749| div_loss: 0.66396| %_mask_idx: 0.40038| ppl: 215.06725| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96436 | loss: 1.12251| constrast_loss: 4.42228| div_loss: 0.67744| %_mask_idx: 0.40304| ppl: 206.4375| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.96436 | loss: 1.13352| constrast_loss: 4.46687| div_loss: 0.67222| %_mask_idx: 0.38283| ppl: 209.77701| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96434 | loss: 1.13144| constrast_loss: 4.45829| div_loss: 0.67467| %_mask_idx: 0.40038| ppl: 208.21324| %_neg_is_pos: 0.00141| lr: 0.0| temp: 1.96434 | loss: 1.13464| constrast_loss: 4.47139| div_loss: 0.67168| %_mask_idx: 0.39192| ppl: 210.12637| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96433 | loss: 1.13189| constrast_loss: 4.46025| div_loss: 0.67301| %_mask_idx: 0.41385| ppl: 209.27307| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96433 | loss: 1.13919| constrast_loss: 4.48922| div_loss: 0.6755| %_mask_idx: 0.36216| ppl: 207.67862| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96431 | loss: 1.12457| constrast_loss: 4.42809| div_loss: 0.70199| %_mask_idx: 0.35088| ppl: 190.72803| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96431 | loss: 1.12748| constrast_loss: 4.44232| div_loss: 0.67591| %_mask_idx: 0.38643| ppl: 207.41524| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.9643 | loss: 1.12227| constrast_loss: 4.42048| div_loss: 0.68586| %_mask_idx: 0.3891| ppl: 201.05205| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.9643 | loss: 1.13271| constrast_loss: 4.46244| div_loss: 0.68393| %_mask_idx: 0.42841| ppl: 202.2854| %_neg_is_pos: 0.00241| lr: 0.0| temp: 1.96429 | loss: 1.12893| constrast_loss: 4.44729| div_loss: 0.6842| %_mask_idx: 0.40147| ppl: 202.1138| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.96429 | loss: 1.13128| constrast_loss: 4.45748| div_loss: 0.67657| %_mask_idx: 0.40962| ppl: 206.99664| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.96428 | loss: 1.13101| constrast_loss: 4.45654| div_loss: 0.67491| %_mask_idx: 0.41103| ppl: 208.05936| %_neg_is_pos: 0.00287| lr: 
0.0| temp: 1.96428 | loss: 1.14222| constrast_loss: 4.50235| div_loss: 0.66543| %_mask_idx: 0.41604| ppl: 214.12448| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.96426 | loss: 1.13137| constrast_loss: 4.45801| div_loss: 0.67483| %_mask_idx: 0.42935| ppl: 208.1077| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.96426 | loss: 1.12282| constrast_loss: 4.42374| div_loss: 0.67522| %_mask_idx: 0.40069| ppl: 207.85681| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.96425 | loss: 1.1354| constrast_loss: 4.47442| div_loss: 0.6717| %_mask_idx: 0.38957| ppl: 210.11147| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.96425 | loss: 1.12753| constrast_loss: 4.44166| div_loss: 0.68467| %_mask_idx: 0.39599| ppl: 201.81354| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96424 | loss: 1.14054| constrast_loss: 4.49543| div_loss: 0.66716| %_mask_idx: 0.43327| ppl: 213.01514| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.96424 | loss: 1.13399| constrast_loss: 4.4684| div_loss: 0.6754| %_mask_idx: 0.33897| ppl: 207.74269| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.96423 | loss: 1.13124| constrast_loss: 4.45806| div_loss: 0.66889| %_mask_idx: 0.41949| ppl: 211.90848| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96423 | loss: 1.13593| constrast_loss: 4.47497| div_loss: 0.68771| %_mask_idx: 0.37093| ppl: 199.86487| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.96421 | loss: 1.12762| constrast_loss: 4.4414| div_loss: 0.6907| %_mask_idx: 0.34571| ppl: 197.95496| %_neg_is_pos: 0.00514| lr: 0.0| temp: 1.96421 | loss: 1.13099| constrast_loss: 4.45643| div_loss: 0.67515| %_mask_idx: 0.35338| ppl: 207.90305| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.9642 | loss: 1.13569| constrast_loss: 4.47534| div_loss: 0.67413| %_mask_idx: 0.41087| ppl: 208.55865| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.9642 | loss: 1.14421| constrast_loss: 4.51043| div_loss: 0.66414| %_mask_idx: 0.3985| ppl: 214.94965| %_neg_is_pos: 0.00119| lr: 0.0| temp: 1.96419 | loss: 1.12873| constrast_loss: 4.44542| div_loss: 0.69509| %_mask_idx: 0.401| ppl: 195.14076| %_neg_is_pos: 0.00229| 
lr: 0.0| temp: 1.96419 | loss: 1.1318| constrast_loss: 4.45848| div_loss: 0.68721| %_mask_idx: 0.39771| ppl: 200.18645| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.96418 | loss: 1.14068| constrast_loss: 4.49635| div_loss: 0.6638| %_mask_idx: 0.40085| ppl: 215.16579| %_neg_is_pos: 0.00117| lr: 0.0| temp: 1.96418 | loss: 1.12755| constrast_loss: 4.4408| div_loss: 0.69399| %_mask_idx: 0.33615| ppl: 195.8439| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.96416 | loss: 1.1253| constrast_loss: 4.4329| div_loss: 0.68296| %_mask_idx: 0.37281| ppl: 202.90298| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.96416 | loss: 1.12079| constrast_loss: 4.4141| div_loss: 0.6906| %_mask_idx: 0.37108| ppl: 198.01479| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96415 | loss: 1.12808| constrast_loss: 4.44402| div_loss: 0.68299| %_mask_idx: 0.41776| ppl: 202.88672| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.96415 | loss: 1.1203| constrast_loss: 4.41294| div_loss: 0.68263| %_mask_idx: 0.3349| ppl: 203.11929| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96414 | loss: 1.12803| constrast_loss: 4.44468| div_loss: 0.67453| %_mask_idx: 0.40335| ppl: 208.3031| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.96414 | loss: 1.12523| constrast_loss: 4.43271| div_loss: 0.68217| %_mask_idx: 0.44126| ppl: 203.41107| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96413 | loss: 1.13247| constrast_loss: 4.4623| div_loss: 0.67577| %_mask_idx: 0.38925| ppl: 207.50861| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96413 | loss: 1.13551| constrast_loss: 4.47528| div_loss: 0.66761| %_mask_idx: 0.38424| ppl: 212.7323| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.96412 | loss: 1.13493| constrast_loss: 4.47237| div_loss: 0.67361| %_mask_idx: 0.34038| ppl: 208.88893| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.96412 | loss: 1.13229| constrast_loss: 4.46208| div_loss: 0.67085| %_mask_idx: 0.37108| ppl: 210.65508| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96411 | loss: 1.12371| constrast_loss: 4.42641| div_loss: 0.6844| %_mask_idx: 0.39239| ppl: 201.98322| %_neg_is_pos: 0.00343| 
lr: 0.0| temp: 1.96411 | loss: 1.12898| constrast_loss: 4.44752| div_loss: 0.68409| %_mask_idx: 0.38111| ppl: 202.17944| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.96409 | loss: 1.14003| constrast_loss: 4.49342| div_loss: 0.6672| %_mask_idx: 0.33459| ppl: 212.99304| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.96409 | loss: 1.14238| constrast_loss: 4.50315| div_loss: 0.66386| %_mask_idx: 0.4422| ppl: 215.13165| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.96408 | loss: 1.13994| constrast_loss: 4.49285| div_loss: 0.66926| %_mask_idx: 0.38048| ppl: 211.67139| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.96408 | loss: 1.12401| constrast_loss: 4.4279| div_loss: 0.68149| %_mask_idx: 0.38753| ppl: 203.84343| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96407 | loss: 1.13697| constrast_loss: 4.48053| div_loss: 0.67354| %_mask_idx: 0.37218| ppl: 208.93402| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.96407 | loss: 1.13453| constrast_loss: 4.47096| div_loss: 0.67179| %_mask_idx: 0.39944| ppl: 210.05481| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96406 | loss: 1.1294| constrast_loss: 4.44905| div_loss: 0.68566| %_mask_idx: 0.39724| ppl: 201.17819| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.96406 | loss: 1.13593| constrast_loss: 4.47678| div_loss: 0.66936| %_mask_idx: 0.41306| ppl: 211.60768| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.96404 | loss: 1.13273| constrast_loss: 4.4638| div_loss: 0.67118| %_mask_idx: 0.40398| ppl: 210.4473| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96404 | loss: 1.13893| constrast_loss: 4.4891| div_loss: 0.66613| %_mask_idx: 0.43546| ppl: 213.67715| %_neg_is_pos: 0.00152| lr: 0.0| temp: 1.96403 | loss: 1.14661| constrast_loss: 4.51979| div_loss: 0.66661| %_mask_idx: 0.38816| ppl: 213.37064| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.96403 | loss: 1.13252| constrast_loss: 4.46322| div_loss: 0.66876| %_mask_idx: 0.38534| ppl: 211.99599| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.96402 | loss: 1.13921| constrast_loss: 4.48877| div_loss: 0.68086| %_mask_idx: 0.37187| ppl: 204.24762| %_neg_is_pos: 
0.00313| lr: 0.0| temp: 1.96402 | loss: 1.12495| constrast_loss: 4.43139| div_loss: 0.68401| %_mask_idx: 0.36638| ppl: 202.2326| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.96401 | loss: 1.13353| constrast_loss: 4.46623| div_loss: 0.67903| %_mask_idx: 0.37923| ppl: 205.41772| %_neg_is_pos: 0.00321| lr: 0.0| temp: 1.96401 | loss: 1.13016| constrast_loss: 4.45296| div_loss: 0.67701| %_mask_idx: 0.4162| ppl: 206.71529| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96399 | loss: 1.12518| constrast_loss: 4.43278| div_loss: 0.67949| %_mask_idx: 0.36732| ppl: 205.12408| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.96399 | loss: 1.12695| constrast_loss: 4.43942| div_loss: 0.68365| %_mask_idx: 0.34398| ppl: 202.46344| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.96398 | loss: 1.12426| constrast_loss: 4.42856| div_loss: 0.68472| %_mask_idx: 0.3891| ppl: 201.78018| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96398 | loss: 1.13363| constrast_loss: 4.46808| div_loss: 0.66445| %_mask_idx: 0.39301| ppl: 214.75417| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96396 | loss: 1.133| constrast_loss: 4.46474| div_loss: 0.6726| %_mask_idx: 0.42982| ppl: 209.53558| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96396 | loss: 1.13587| constrast_loss: 4.47674| div_loss: 0.66725| %_mask_idx: 0.41463| ppl: 212.95752| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96395 | loss: 1.12668| constrast_loss: 4.43873| div_loss: 0.67996| %_mask_idx: 0.43468| ppl: 204.82657| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96395 | loss: 1.14005| constrast_loss: 4.49315| div_loss: 0.67036| %_mask_idx: 0.36983| ppl: 210.96939| %_neg_is_pos: 0.00196| lr: 0.0| temp: 1.96394 | loss: 1.13629| constrast_loss: 4.47814| div_loss: 0.67036| %_mask_idx: 0.38972| ppl: 210.96661| %_neg_is_pos: 0.00152| lr: 0.0| temp: 1.96394 | loss: 1.11614| constrast_loss: 4.39398| div_loss: 0.70583| %_mask_idx: 0.36497| ppl: 188.26736| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96393 | loss: 1.13587| constrast_loss: 4.47514| div_loss: 0.68354| %_mask_idx: 0.36513| ppl: 202.53185| 
%_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96393 | loss: 1.13335| constrast_loss: 4.46576| div_loss: 0.67628| %_mask_idx: 0.35949| ppl: 207.17963| %_neg_is_pos: 0.00489| lr: 0.0| temp: 1.96391 | loss: 1.1306| constrast_loss: 4.45449| div_loss: 0.6791| %_mask_idx: 0.35495| ppl: 205.37518| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96391 | loss: 1.13567| constrast_loss: 4.47486| div_loss: 0.67809| %_mask_idx: 0.39912| ppl: 206.02441| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.9639 | loss: 1.13788| constrast_loss: 4.48537| div_loss: 0.66159| %_mask_idx: 0.36278| ppl: 216.57996| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.9639 | loss: 1.13991| constrast_loss: 4.49345| div_loss: 0.66179| %_mask_idx: 0.33443| ppl: 216.45311| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96389 | loss: 1.13016| constrast_loss: 4.45323| div_loss: 0.674| %_mask_idx: 0.34586| ppl: 208.63684| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96389 | loss: 1.13511| constrast_loss: 4.47272| div_loss: 0.67738| %_mask_idx: 0.38048| ppl: 206.47585| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96388 | loss: 1.12508| constrast_loss: 4.43271| div_loss: 0.67609| %_mask_idx: 0.41463| ppl: 207.3053| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96388 | loss: 1.1405| constrast_loss: 4.49493| div_loss: 0.67089| %_mask_idx: 0.37845| ppl: 210.62787| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96386 | loss: 1.13658| constrast_loss: 4.47859| div_loss: 0.67714| %_mask_idx: 0.35448| ppl: 206.62729| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96386 | loss: 1.12678| constrast_loss: 4.44014| div_loss: 0.66964| %_mask_idx: 0.38064| ppl: 211.43207| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.96385 | loss: 1.13058| constrast_loss: 4.45421| div_loss: 0.68128| %_mask_idx: 0.36701| ppl: 203.97874| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96385 | loss: 1.1387| constrast_loss: 4.48672| div_loss: 0.6808| %_mask_idx: 0.4104| ppl: 204.28978| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96384 | loss: 1.1364| constrast_loss: 4.47907| div_loss: 0.66514| %_mask_idx: 0.45081| ppl: 
214.30978| %_neg_is_pos: 0.00132| lr: 0.0| temp: 1.96384 | loss: 1.12419| constrast_loss: 4.42833| div_loss: 0.68424| %_mask_idx: 0.37813| ppl: 202.08499| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.96383 | loss: 1.13637| constrast_loss: 4.47862| div_loss: 0.66841| %_mask_idx: 0.42105| ppl: 212.21519| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.96383 | loss: 1.12446| constrast_loss: 4.43054| div_loss: 0.6732| %_mask_idx: 0.32315| ppl: 209.15225| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.96381 | loss: 1.13921| constrast_loss: 4.48814| div_loss: 0.68691| %_mask_idx: 0.43249| ppl: 200.37921| %_neg_is_pos: 0.00187| lr: 0.0| temp: 1.96381 | loss: 1.12404| constrast_loss: 4.42814| div_loss: 0.67999| %_mask_idx: 0.41823| ppl: 204.80898| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.9638 | loss: 1.14463| constrast_loss: 4.51125| div_loss: 0.67252| %_mask_idx: 0.39834| ppl: 209.58783| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.9638 [2021-09-02 05:06:36,238] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:06:36,238] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.13407| constrast_loss: 4.46768| div_loss: 0.68604| %_mask_idx: 0.39113| ppl: 200.93402| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.96378 | loss: 1.1342| constrast_loss: 4.46839| div_loss: 0.6841| %_mask_idx: 0.33568| ppl: 202.17688| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.96378 | loss: 1.12936| constrast_loss: 4.44987| div_loss: 0.67588| %_mask_idx: 0.36482| ppl: 207.4343| %_neg_is_pos: 0.00473| lr: 0.0| temp: 1.96377 | loss: 1.12407| constrast_loss: 4.42702| div_loss: 0.69243| %_mask_idx: 0.39301| ppl: 196.84503| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96377 | loss: 1.13202| constrast_loss: 4.46081| div_loss: 0.6729| %_mask_idx: 0.37312| ppl: 209.34163| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.96376 | loss: 1.12452| constrast_loss: 4.42936| div_loss: 0.68738| %_mask_idx: 0.39004| ppl: 200.07812| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96376 | loss: 1.13856| constrast_loss: 4.4868| div_loss: 0.6743| %_mask_idx: 0.42998| ppl: 208.44745| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96375 | loss: 1.12946| constrast_loss: 4.44953| div_loss: 0.68309| %_mask_idx: 0.36059| ppl: 202.82002| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.96375 | loss: 1.12395| constrast_loss: 4.4284| div_loss: 0.67394| %_mask_idx: 0.42372| ppl: 208.67592| %_neg_is_pos: 0.00209| lr: 0.0| temp: 1.96373| loss: 1.1311| constrast_loss: 4.45777| div_loss: 0.66617| %_mask_idx: 0.38456| ppl: 213.65204| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96373 | loss: 1.12447| constrast_loss: 4.43048| div_loss: 0.67423| %_mask_idx: 0.36435| ppl: 208.49577| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.96372 | loss: 1.12238| constrast_loss: 4.42247| div_loss: 0.67037| %_mask_idx: 0.32957| ppl: 210.96124| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.96372 | loss: 1.13817| constrast_loss: 4.48559| div_loss: 0.67113| %_mask_idx: 0.39928| ppl: 210.47379| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96371 | loss: 1.13487| constrast_loss: 4.47047| div_loss: 0.68986| %_mask_idx: 0.42747| ppl: 198.48892| 
%_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96371 | loss: 1.14119| constrast_loss: 4.49773| div_loss: 0.67037| %_mask_idx: 0.40085| ppl: 210.96439| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.9637 | loss: 1.14062| constrast_loss: 4.49511| div_loss: 0.67348| %_mask_idx: 0.39568| ppl: 208.97496| %_neg_is_pos: 0.00338| lr: 0.0| temp: 1.9637 | loss: 1.131| constrast_loss: 4.45694| div_loss: 0.67074| %_mask_idx: 0.40648| ppl: 210.72533| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96368 | loss: 1.12943| constrast_loss: 4.45024| div_loss: 0.67489| %_mask_idx: 0.41541| ppl: 208.0712| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.96368 | loss: 1.13434| constrast_loss: 4.46959| div_loss: 0.67784| %_mask_idx: 0.44815| ppl: 206.18117| %_neg_is_pos: 0.00188| lr: 0.0| temp: 1.96367 | loss: 1.14326| constrast_loss: 4.50602| div_loss: 0.67029| %_mask_idx: 0.41134| ppl: 211.01324| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.96367 | loss: 1.13588| constrast_loss: 4.47669| div_loss: 0.66817| %_mask_idx: 0.40868| ppl: 212.37436| %_neg_is_pos: 0.00205| lr: 0.0| temp: 1.96366 | loss: 1.13927| constrast_loss: 4.49068| div_loss: 0.66387| %_mask_idx: 0.42622| ppl: 215.12338| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96366 | loss: 1.13836| constrast_loss: 4.48573| div_loss: 0.67716| %_mask_idx: 0.3703| ppl: 206.61972| %_neg_is_pos: 0.00144| lr: 0.0| temp: 1.96365 | loss: 1.13558| constrast_loss: 4.47488| div_loss: 0.67446| %_mask_idx: 0.44987| ppl: 208.34564| %_neg_is_pos: 0.00163| lr: 0.0| temp: 1.96365 | loss: 1.13636| constrast_loss: 4.47861| div_loss: 0.66813| %_mask_idx: 0.40006| ppl: 212.39868| %_neg_is_pos: 0.00119| lr: 0.0| temp: 1.96363| loss: 1.12147| constrast_loss: 4.41764| div_loss: 0.68247| %_mask_idx: 0.36842| ppl: 203.22226| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.96363 | loss: 1.13109| constrast_loss: 4.45714| div_loss: 0.67204| %_mask_idx: 0.41792| ppl: 209.89197| %_neg_is_pos: 0.00329| lr: 0.0| temp: 1.96362 | loss: 1.14395| constrast_loss: 4.50891| div_loss: 0.66872| %_mask_idx: 0.35072| ppl: 
212.02026| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.96362 | loss: 1.13013| constrast_loss: 4.45321| div_loss: 0.67307| %_mask_idx: 0.36623| ppl: 209.2355| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96361 | loss: 1.14123| constrast_loss: 4.4978| div_loss: 0.67113| %_mask_idx: 0.40883| ppl: 210.47433| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96361 | loss: 1.13434| constrast_loss: 4.46887| div_loss: 0.68485| %_mask_idx: 0.32143| ppl: 201.69389| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.9636 | loss: 1.13294| constrast_loss: 4.46444| div_loss: 0.67333| %_mask_idx: 0.40508| ppl: 209.07108| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.9636 | loss: 1.13615| constrast_loss: 4.47779| div_loss: 0.66798| %_mask_idx: 0.39333| ppl: 212.4939| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96359 | loss: 1.13599| constrast_loss: 4.47572| div_loss: 0.68245| %_mask_idx: 0.36701| ppl: 203.23393| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96359 | loss: 1.14327| constrast_loss: 4.50647| div_loss: 0.66613| %_mask_idx: 0.38988| ppl: 213.67883| %_neg_is_pos: 0.00174| lr: 0.0| temp: 1.96358 | loss: 1.14134| constrast_loss: 4.49884| div_loss: 0.66499| %_mask_idx: 0.38643| ppl: 214.40495| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.96358 | loss: 1.13805| constrast_loss: 4.48428| div_loss: 0.67928| %_mask_idx: 0.38158| ppl: 205.26062| %_neg_is_pos: 0.00166| lr: 0.0| temp: 1.96356 | loss: 1.11972| constrast_loss: 4.41115| div_loss: 0.67713| %_mask_idx: 0.35589| ppl: 206.63364| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.96356 | loss: 1.13481| constrast_loss: 4.47218| div_loss: 0.67056| %_mask_idx: 0.42199| ppl: 210.83957| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96355 | loss: 1.1362| constrast_loss: 4.47813| div_loss: 0.66671| %_mask_idx: 0.37046| ppl: 213.30447| %_neg_is_pos: 0.00393| lr: 0.0| temp: 1.96355 | loss: 1.14111| constrast_loss: 4.49805| div_loss: 0.66372| %_mask_idx: 0.39834| ppl: 215.22119| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.96354 | loss: 1.12681| constrast_loss: 4.43983| div_loss: 0.67414| %_mask_idx: 
0.36623| ppl: 208.5529| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96354 | loss: 1.12811| constrast_loss: 4.4442| div_loss: 0.68257| %_mask_idx: 0.4093| ppl: 203.15811| %_neg_is_pos: 0.00367| lr: 0.0| temp: 1.96353 | loss: 1.12596| constrast_loss: 4.43597| div_loss: 0.67854| %_mask_idx: 0.41118| ppl: 205.73381| %_neg_is_pos: 0.00334| lr: 0.0| temp: 1.96353 | loss: 1.13574| constrast_loss: 4.47548| div_loss: 0.67477| %_mask_idx: 0.36216| ppl: 208.14836| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.96351 | loss: 1.13312| constrast_loss: 4.46611| div_loss: 0.66355| %_mask_idx: 0.38283| ppl: 215.32776| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96351 | loss: 1.13339| constrast_loss: 4.46634| div_loss: 0.67229| %_mask_idx: 0.39583| ppl: 209.73721| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.9635 | loss: 1.12619| constrast_loss: 4.43692| div_loss: 0.67845| %_mask_idx: 0.4021| ppl: 205.79243| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.9635 | loss: 1.13008| constrast_loss: 4.45342| div_loss: 0.66912| %_mask_idx: 0.40789| ppl: 211.76413| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.96349 | loss: 1.13282| constrast_loss: 4.46372| div_loss: 0.67577| %_mask_idx: 0.39254| ppl: 207.50853| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96349 | loss: 1.13964| constrast_loss: 4.49094| div_loss: 0.67633| %_mask_idx: 0.42732| ppl: 207.15115| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96348 | loss: 1.14453| constrast_loss: 4.51188| div_loss: 0.6622| %_mask_idx: 0.41604| ppl: 216.19038| %_neg_is_pos: 0.00145| lr: 0.0| temp: 1.96348 | loss: 1.12702| constrast_loss: 4.43995| div_loss: 0.68124| %_mask_idx: 0.37782| ppl: 204.00723| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96346 | loss: 1.13258| constrast_loss: 4.46245| div_loss: 0.67875| %_mask_idx: 0.4093| ppl: 205.59918| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96346 | loss: 1.13361| constrast_loss: 4.46716| div_loss: 0.67276| %_mask_idx: 0.36231| ppl: 209.43259| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96345 | loss: 1.1307| constrast_loss: 4.45511| div_loss: 0.67695| 
%_mask_idx: 0.38017| ppl: 206.75119| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.96345 | loss: 1.13128| constrast_loss: 4.45781| div_loss: 0.673| %_mask_idx: 0.37798| ppl: 209.28113| %_neg_is_pos: 0.00198| lr: 0.0| temp: 1.96343 | loss: 1.13986| constrast_loss: 4.49284| div_loss: 0.66596| %_mask_idx: 0.40727| ppl: 213.78583| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96343 | loss: 1.12965| constrast_loss: 4.4507| div_loss: 0.679| %_mask_idx: 0.39239| ppl: 205.43857| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.96342 | loss: 1.13103| constrast_loss: 4.4574| div_loss: 0.66725| %_mask_idx: 0.3963| ppl: 212.96313| %_neg_is_pos: 0.00198| lr: 0.0| temp: 1.96342 | loss: 1.12291| constrast_loss: 4.42324| div_loss: 0.68404| %_mask_idx: 0.35808| ppl: 202.21146| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.96341 | loss: 1.14036| constrast_loss: 4.49345| div_loss: 0.68004| %_mask_idx: 0.42168| ppl: 204.77371| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.96341 | loss: 1.1381| constrast_loss: 4.48553| div_loss: 0.6685| %_mask_idx: 0.40774| ppl: 212.15941| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.9634 | loss: 1.13481| constrast_loss: 4.47213| div_loss: 0.67116| %_mask_idx: 0.36607| ppl: 210.45776| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.9634 | loss: 1.13656| constrast_loss: 4.47906| div_loss: 0.67187| %_mask_idx: 0.38503| ppl: 210.00499| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.96338 | loss: 1.13028| constrast_loss: 4.45344| div_loss: 0.67677| %_mask_idx: 0.35652| ppl: 206.867| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96338 | loss: 1.1264| constrast_loss: 4.43859| div_loss: 0.67004| %_mask_idx: 0.3985| ppl: 211.17419| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.96337 | loss: 1.13007| constrast_loss: 4.45235| div_loss: 0.67927| %_mask_idx: 0.39568| ppl: 205.26875| %_neg_is_pos: 0.00467| lr: 0.0| temp: 1.96337 | loss: 1.13339| constrast_loss: 4.46702| div_loss: 0.66528| %_mask_idx: 0.41197| ppl: 214.22102| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.96336 | loss: 1.13477| constrast_loss: 4.47165| div_loss: 0.6743| 
%_mask_idx: 0.35103| ppl: 208.44604| %_neg_is_pos: 0.00492| lr: 0.0| temp: 1.96336 | loss: 1.1391| constrast_loss: 4.48987| div_loss: 0.66517| %_mask_idx: 0.37688| ppl: 214.2887| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.96335 | loss: 1.13547| constrast_loss: 4.47376| div_loss: 0.681| %_mask_idx: 0.36623| ppl: 204.16266| %_neg_is_pos: 0.0043| lr: 0.0| temp: 1.96335 | loss: 1.13862| constrast_loss: 4.48745| div_loss: 0.67033| %_mask_idx: 0.40241| ppl: 210.98694| %_neg_is_pos: 0.00205| lr: 0.0| temp: 1.96333 | loss: 1.13857| constrast_loss: 4.48661| div_loss: 0.67659| %_mask_idx: 0.42935| ppl: 206.98209| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96333 | loss: 1.13511| constrast_loss: 4.47368| div_loss: 0.66757| %_mask_idx: 0.35573| ppl: 212.75702| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.96332 | loss: 1.1427| constrast_loss: 4.50417| div_loss: 0.66651| %_mask_idx: 0.40539| ppl: 213.43324| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96332 | loss: 1.13424| constrast_loss: 4.46842| div_loss: 0.68547| %_mask_idx: 0.37108| ppl: 201.29906| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96331 | loss: 1.13867| constrast_loss: 4.4867| div_loss: 0.67964| %_mask_idx: 0.35902| ppl: 205.03171| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.96331 | loss: 1.13181| constrast_loss: 4.46085| div_loss: 0.66379| %_mask_idx: 0.38675| ppl: 215.17285| %_neg_is_pos: 0.00139| lr: 0.0| temp: 1.9633 | loss: 1.14008| constrast_loss: 4.49262| div_loss: 0.67721| %_mask_idx: 0.41557| ppl: 206.58511| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.9633 | loss: 1.13522| constrast_loss: 4.47336| div_loss: 0.67526| %_mask_idx: 0.40711| ppl: 207.83238| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96328 | loss: 1.1267| constrast_loss: 4.43794| div_loss: 0.68859| %_mask_idx: 0.39192| ppl: 199.30414| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.96328 | loss: 1.13042| constrast_loss: 4.45392| div_loss: 0.67773| %_mask_idx: 0.41103| ppl: 206.25076| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96327 | loss: 1.12996| constrast_loss: 4.45088| div_loss: 
0.6896| %_mask_idx: 0.40492| ppl: 198.65436| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.96327 | loss: 1.13031| constrast_loss: 4.45283| div_loss: 0.68399| %_mask_idx: 0.34837| ppl: 202.24429| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.96325 | loss: 1.13017| constrast_loss: 4.45356| div_loss: 0.67119| %_mask_idx: 0.37218| ppl: 210.43591| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96325 | loss: 1.12938| constrast_loss: 4.44988| div_loss: 0.67623| %_mask_idx: 0.36576| ppl: 207.21368| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96324 | loss: 1.14197| constrast_loss: 4.50088| div_loss: 0.66981| %_mask_idx: 0.40586| ppl: 211.32401| %_neg_is_pos: 0.00116| lr: 0.0| temp: 1.96324 | loss: 1.12841| constrast_loss: 4.44672| div_loss: 0.66934| %_mask_idx: 0.33427| ppl: 211.62106| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.96323 | loss: 1.145| constrast_loss: 4.51283| div_loss: 0.67182| %_mask_idx: 0.40539| ppl: 210.03552| %_neg_is_pos: 0.00141| lr: 0.0| temp: 1.96323 | loss: 1.14294| constrast_loss: 4.50467| div_loss: 0.67084| %_mask_idx: 0.39583| ppl: 210.66409| %_neg_is_pos: 0.00137| lr: 0.0| temp: 1.96322 | loss: 1.13668| constrast_loss: 4.47927| div_loss: 0.67437| %_mask_idx: 0.41729| ppl: 208.40256| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96322 | loss: 1.12792| constrast_loss: 4.44366| div_loss: 0.68011| %_mask_idx: 0.36905| ppl: 204.73065| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.9632 | loss: 1.13765| constrast_loss: 4.48389| div_loss: 0.66723| %_mask_idx: 0.38158| ppl: 212.97247| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.9632 | loss: 1.14668| constrast_loss: 4.51937| div_loss: 0.67358| %_mask_idx: 0.41698| ppl: 208.91193| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96319 | loss: 1.13886| constrast_loss: 4.48885| div_loss: 0.66585| %_mask_idx: 0.37766| ppl: 213.85376| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.96319 | loss: 1.14461| constrast_loss: 4.51258| div_loss: 0.65843| %_mask_idx: 0.42246| ppl: 218.60591| %_neg_is_pos: 0.00087| lr: 0.0| temp: 1.96318 | loss: 1.12938| constrast_loss: 4.44954| 
div_loss: 0.67992| %_mask_idx: 0.38221| ppl: 204.85199| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96318 | loss: 1.12357| constrast_loss: 4.42593| div_loss: 0.6836| %_mask_idx: 0.3515| ppl: 202.4955| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96317 | loss: 1.13567| constrast_loss: 4.47513| div_loss: 0.67563| %_mask_idx: 0.42967| ppl: 207.59531| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96317 | loss: 1.13962| constrast_loss: 4.49157| div_loss: 0.66903| %_mask_idx: 0.40147| ppl: 211.8201| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.96315 | loss: 1.13028| constrast_loss: 4.45208| div_loss: 0.69049| %_mask_idx: 0.38596| ppl: 198.08739| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.96315 | loss: 1.133| constrast_loss: 4.46455| div_loss: 0.67453| %_mask_idx: 0.37798| ppl: 208.3027| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96314 | loss: 1.133| constrast_loss: 4.4646| div_loss: 0.67394| %_mask_idx: 0.40085| ppl: 208.67609| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96314 | loss: 1.1326| constrast_loss: 4.46316| div_loss: 0.67254| %_mask_idx: 0.42199| ppl: 209.57437| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.96313 | loss: 1.12845| constrast_loss: 4.44429| div_loss: 0.69494| %_mask_idx: 0.33192| ppl: 195.24083| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.96313 | loss: 1.13536| constrast_loss: 4.47433| div_loss: 0.67095| %_mask_idx: 0.44878| ppl: 210.59058| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.96312 | loss: 1.14191| constrast_loss: 4.50067| div_loss: 0.66979| %_mask_idx: 0.37845| ppl: 211.3374| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96312 | loss: 1.13857| constrast_loss: 4.4872| div_loss: 0.67062| %_mask_idx: 0.40946| ppl: 210.80243| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.9631 | loss: 1.1171| constrast_loss: 4.39946| div_loss: 0.68922| %_mask_idx: 0.41087| ppl: 198.90149| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.9631 | loss: 1.13893| constrast_loss: 4.489| div_loss: 0.66706| %_mask_idx: 0.37547| ppl: 213.08331| %_neg_is_pos: 0.00109| lr: 0.0| temp: 1.96309 | loss: 1.13234| constrast_loss: 4.4611| 
div_loss: 0.68262| %_mask_idx: 0.39975| ppl: 203.12622| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.96309 [2021-09-02 05:15:48,940] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:15:48,940] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14241| constrast_loss: 4.50216| div_loss: 0.67482| %_mask_idx: 0.39959| ppl: 208.11513| %_neg_is_pos: 0.00164| lr: 0.0| temp: 1.96308 | loss: 1.14576| constrast_loss: 4.5165| div_loss: 0.66528| %_mask_idx: 0.41447| ppl: 214.22372| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96308 | loss: 1.13874| constrast_loss: 4.48796| div_loss: 0.67007| %_mask_idx: 0.4422| ppl: 211.15277| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.96307 | loss: 1.13634| constrast_loss: 4.47901| div_loss: 0.66368| %_mask_idx: 0.43405| ppl: 215.24788| %_neg_is_pos: 0.00148| lr: 0.0| temp: 1.96307 | loss: 1.13154| constrast_loss: 4.45821| div_loss: 0.67946| %_mask_idx: 0.35558| ppl: 205.14343| %_neg_is_pos: 0.00142| lr: 0.0| temp: 1.96306 | loss: 1.13035| constrast_loss: 4.45338| div_loss: 0.6802| %_mask_idx: 0.32346| ppl: 204.67178| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96306 | loss: 1.1347| constrast_loss: 4.47135| div_loss: 0.67465| %_mask_idx: 0.35652| ppl: 208.22339| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.96305 | loss: 1.13417| constrast_loss: 4.46906| div_loss: 0.67635| %_mask_idx: 0.34508| ppl: 207.13493| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.96305 | loss: 1.13925| constrast_loss: 4.4891| div_loss: 0.67904| %_mask_idx: 0.38737| ppl: 205.41214| %_neg_is_pos: 0.00413| lr: 0.0| temp: 1.96303 | loss: 1.14061| constrast_loss: 4.49382| div_loss: 0.68609| %_mask_idx: 0.40883| ppl: 200.90303| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.96303 | loss: 1.13476| constrast_loss: 4.47255| div_loss: 0.66506| %_mask_idx: 0.41181| ppl: 214.36105| %_neg_is_pos: 0.00129| lr: 0.0| temp: 
1.96302 | loss: 1.14233| constrast_loss: 4.50171| div_loss: 0.67625| %_mask_idx: 0.41635| ppl: 207.19833| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.96302 | loss: 1.12641| constrast_loss: 4.43652| div_loss: 0.69104| %_mask_idx: 0.34978| ppl: 197.73486| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.96301 | loss: 1.13429| constrast_loss: 4.46969| div_loss: 0.67482| %_mask_idx: 0.39192| ppl: 208.11717| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96301 | loss: 1.12881| constrast_loss: 4.44614| div_loss: 0.69087| %_mask_idx: 0.40648| ppl: 197.84035| %_neg_is_pos: 0.00538| lr: 0.0| temp: 1.963 | loss: 1.13996| constrast_loss: 4.49284| div_loss: 0.67006| %_mask_idx: 0.38064| ppl: 211.16333| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.963 | loss: 1.12444| constrast_loss: 4.42898| div_loss: 0.68761| %_mask_idx: 0.34336| ppl: 199.92897| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96298 | loss: 1.13329| constrast_loss: 4.46562| div_loss: 0.67537| %_mask_idx: 0.39991| ppl: 207.76413| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96298 | loss: 1.13253| constrast_loss: 4.46202| div_loss: 0.68095| %_mask_idx: 0.42481| ppl: 204.18929| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.96297 | loss: 1.1456| constrast_loss: 4.51474| div_loss: 0.6765| %_mask_idx: 0.39395| ppl: 207.03966| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.96297 | loss: 1.13125| constrast_loss: 4.4574| div_loss: 0.6758| %_mask_idx: 0.42105| ppl: 207.48586| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.96296 | loss: 1.13669| constrast_loss: 4.47777| div_loss: 0.68989| %_mask_idx: 0.37719| ppl: 198.46999| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96296 | loss: 1.12975| constrast_loss: 4.44888| div_loss: 0.70117| %_mask_idx: 0.41291| ppl: 191.25308| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96295 | loss: 1.1328| constrast_loss: 4.46214| div_loss: 0.69036| %_mask_idx: 0.40868| ppl: 198.16667| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.96295 | loss: 1.13352| constrast_loss: 4.46402| div_loss: 0.70042| %_mask_idx: 0.35558| ppl: 191.73094| %_neg_is_pos: 0.00254| lr: 
0.0| temp: 1.96293 | loss: 1.12976| constrast_loss: 4.44969| div_loss: 0.69366| %_mask_idx: 0.4104| ppl: 196.05905| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.96293 | loss: 1.1384| constrast_loss: 4.48485| div_loss: 0.68751| %_mask_idx: 0.39333| ppl: 199.99667| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.96292 | loss: 1.12418| constrast_loss: 4.42743| div_loss: 0.69291| %_mask_idx: 0.41197| ppl: 196.53775| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96292 | loss: 1.13591| constrast_loss: 4.47457| div_loss: 0.69053| %_mask_idx: 0.36341| ppl: 198.06041| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.9629 | loss: 1.1423| constrast_loss: 4.50044| div_loss: 0.68756| %_mask_idx: 0.3786| ppl: 199.96425| %_neg_is_pos: 0.00155| lr: 0.0| temp: 1.9629 | loss: 1.12984| constrast_loss: 4.44921| div_loss: 0.70155| %_mask_idx: 0.375| ppl: 191.00595| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96289 | loss: 1.13308| constrast_loss: 4.46393| div_loss: 0.68386| %_mask_idx: 0.40085| ppl: 202.32657| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.96289 | loss: 1.13658| constrast_loss: 4.47794| div_loss: 0.68391| %_mask_idx: 0.39004| ppl: 202.29984| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.96288 | loss: 1.12454| constrast_loss: 4.42947| div_loss: 0.68683| %_mask_idx: 0.3302| ppl: 200.42674| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.96288 | loss: 1.13566| constrast_loss: 4.47459| div_loss: 0.68043| %_mask_idx: 0.39145| ppl: 204.52365| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.96287 | loss: 1.13535| constrast_loss: 4.47281| div_loss: 0.68605| %_mask_idx: 0.41714| ppl: 200.93044| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.96287 | loss: 1.14868| constrast_loss: 4.52668| div_loss: 0.68046| %_mask_idx: 0.3938| ppl: 204.50352| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.96285 | loss: 1.13466| constrast_loss: 4.46999| div_loss: 0.68645| %_mask_idx: 0.36732| ppl: 200.67111| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96285 | loss: 1.13009| constrast_loss: 4.45194| div_loss: 0.68425| %_mask_idx: 0.40038| ppl: 202.07721| %_neg_is_pos: 
0.00235| lr: 0.0| temp: 1.96284 | loss: 1.14057| constrast_loss: 4.4939| div_loss: 0.6836| %_mask_idx: 0.37375| ppl: 202.49303| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.96284 | loss: 1.13306| constrast_loss: 4.46318| div_loss: 0.69047| %_mask_idx: 0.40899| ppl: 198.09821| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96283 | loss: 1.14113| constrast_loss: 4.49675| div_loss: 0.67792| %_mask_idx: 0.40695| ppl: 206.12988| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.96283 | loss: 1.13859| constrast_loss: 4.48657| div_loss: 0.67785| %_mask_idx: 0.42011| ppl: 206.17593| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.96282 | loss: 1.12835| constrast_loss: 4.44367| div_loss: 0.69726| %_mask_idx: 0.37249| ppl: 193.75612| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96282 | loss: 1.12701| constrast_loss: 4.43738| div_loss: 0.70678| %_mask_idx: 0.39113| ppl: 187.65825| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.9628 | loss: 1.13578| constrast_loss: 4.47435| div_loss: 0.68776| %_mask_idx: 0.37234| ppl: 199.83055| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.9628 | loss: 1.14344| constrast_loss: 4.50535| div_loss: 0.68402| %_mask_idx: 0.41228| ppl: 202.22531| %_neg_is_pos: 0.00137| lr: 0.0| temp: 1.96279 | loss: 1.13778| constrast_loss: 4.48242| div_loss: 0.68688| %_mask_idx: 0.40257| ppl: 200.39787| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.96279 | loss: 1.13603| constrast_loss: 4.475| div_loss: 0.691| %_mask_idx: 0.38737| ppl: 197.75873| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96278 | loss: 1.13374| constrast_loss: 4.46634| div_loss: 0.68621| %_mask_idx: 0.3432| ppl: 200.82248| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96278 | loss: 1.13211| constrast_loss: 4.45979| div_loss: 0.6865| %_mask_idx: 0.41855| ppl: 200.64185| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96277 | loss: 1.14075| constrast_loss: 4.49501| div_loss: 0.67978| %_mask_idx: 0.33114| ppl: 204.94083| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.96277 | loss: 1.15108| constrast_loss: 4.53633| div_loss: 0.67981| %_mask_idx: 0.40695| ppl: 204.92329| 
%_neg_is_pos: 0.00166| lr: 0.0| temp: 1.96275 | loss: 1.13685| constrast_loss: 4.47846| div_loss: 0.68941| %_mask_idx: 0.38518| ppl: 198.77763| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96275 | loss: 1.14349| constrast_loss: 4.50553| div_loss: 0.68421| %_mask_idx: 0.41573| ppl: 202.10571| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.96274 | loss: 1.13293| constrast_loss: 4.46237| div_loss: 0.69346| %_mask_idx: 0.42779| ppl: 196.18478| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96274 | loss: 1.12167| constrast_loss: 4.41674| div_loss: 0.69947| %_mask_idx: 0.30639| ppl: 192.34134| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.96272 | loss: 1.14037| constrast_loss: 4.49296| div_loss: 0.6852| %_mask_idx: 0.33145| ppl: 201.47061| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96272 | loss: 1.13727| constrast_loss: 4.48077| div_loss: 0.68291| %_mask_idx: 0.40179| ppl: 202.93935| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96271 | loss: 1.13888| constrast_loss: 4.48687| div_loss: 0.68666| %_mask_idx: 0.41494| ppl: 200.53769| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.96271 | loss: 1.13996| constrast_loss: 4.49155| div_loss: 0.6827| %_mask_idx: 0.39035| ppl: 203.07339| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.9627 | loss: 1.14263| constrast_loss: 4.50212| div_loss: 0.68393| %_mask_idx: 0.35213| ppl: 202.28491| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.9627 | loss: 1.12798| constrast_loss: 4.44308| div_loss: 0.68854| %_mask_idx: 0.39474| ppl: 199.33623| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.96269 | loss: 1.13867| constrast_loss: 4.48668| div_loss: 0.68005| %_mask_idx: 0.35385| ppl: 204.771| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.96269 | loss: 1.13465| constrast_loss: 4.46999| div_loss: 0.68607| %_mask_idx: 0.40022| ppl: 200.91341| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.96267 | loss: 1.12921| constrast_loss: 4.44662| div_loss: 0.70234| %_mask_idx: 0.35636| ppl: 190.50116| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96267 | loss: 1.13522| constrast_loss: 4.47267| div_loss: 0.6823| %_mask_idx: 0.37829| ppl: 
203.32672| %_neg_is_pos: 0.00198| lr: 0.0| temp: 1.96266 | loss: 1.12778| constrast_loss: 4.44214| div_loss: 0.68975| %_mask_idx: 0.35354| ppl: 198.55719| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96266 | loss: 1.12603| constrast_loss: 4.43517| div_loss: 0.6894| %_mask_idx: 0.31344| ppl: 198.78581| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.96265 | loss: 1.13859| constrast_loss: 4.48688| div_loss: 0.67476| %_mask_idx: 0.39677| ppl: 208.15491| %_neg_is_pos: 0.00155| lr: 0.0| temp: 1.96265 | loss: 1.14572| constrast_loss: 4.51447| div_loss: 0.68422| %_mask_idx: 0.33427| ppl: 202.10146| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.96264 | loss: 1.13459| constrast_loss: 4.46949| div_loss: 0.68857| %_mask_idx: 0.41667| ppl: 199.31749| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96264 | loss: 1.12953| constrast_loss: 4.44819| div_loss: 0.69909| %_mask_idx: 0.38957| ppl: 192.58221| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.96262 | loss: 1.13209| constrast_loss: 4.45959| div_loss: 0.68765| %_mask_idx: 0.37077| ppl: 199.90375| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.96262 | loss: 1.12325| constrast_loss: 4.42446| div_loss: 0.68529| %_mask_idx: 0.3291| ppl: 201.41287| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96261 | loss: 1.1382| constrast_loss: 4.48408| div_loss: 0.68731| %_mask_idx: 0.38769| ppl: 200.11917| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.96261 | loss: 1.12644| constrast_loss: 4.43714| div_loss: 0.68628| %_mask_idx: 0.38017| ppl: 200.7813| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.9626 | loss: 1.1359| constrast_loss: 4.47547| div_loss: 0.68137| %_mask_idx: 0.38048| ppl: 203.92607| %_neg_is_pos: 0.00221| lr: 0.0| temp: 1.9626 | loss: 1.13798| constrast_loss: 4.4844| div_loss: 0.67538| %_mask_idx: 0.38628| ppl: 207.75978| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.96259 | loss: 1.14226| constrast_loss: 4.5017| div_loss: 0.67343| %_mask_idx: 0.33098| ppl: 209.00192| %_neg_is_pos: 0.00107| lr: 0.0| temp: 1.96259 | loss: 1.13342| constrast_loss: 4.46573| div_loss: 0.67937| %_mask_idx: 
0.38972| ppl: 205.20557| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96257 | loss: 1.13631| constrast_loss: 4.4761| div_loss: 0.69127| %_mask_idx: 0.35573| ppl: 197.58632| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.96257 | loss: 1.1383| constrast_loss: 4.4851| div_loss: 0.68111| %_mask_idx: 0.4256| ppl: 204.09114| %_neg_is_pos: 0.00148| lr: 0.0| temp: 1.96256 | loss: 1.1423| constrast_loss: 4.50059| div_loss: 0.68606| %_mask_idx: 0.36231| ppl: 200.92386| %_neg_is_pos: 0.00299| lr: 0.0| temp: 1.96256 | loss: 1.12499| constrast_loss: 4.43075| div_loss: 0.69219| %_mask_idx: 0.40836| ppl: 196.9964| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96255 | loss: 1.12967| constrast_loss: 4.45067| div_loss: 0.68007| %_mask_idx: 0.41244| ppl: 204.75528| %_neg_is_pos: 0.00176| lr: 0.0| temp: 1.96255 | loss: 1.14334| constrast_loss: 4.50526| div_loss: 0.68101| %_mask_idx: 0.39975| ppl: 204.15253| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.96254 | loss: 1.13585| constrast_loss: 4.4754| div_loss: 0.67987| %_mask_idx: 0.39944| ppl: 204.88312| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96254 | loss: 1.12468| constrast_loss: 4.42959| div_loss: 0.69154| %_mask_idx: 0.36795| ppl: 197.41235| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96253 | loss: 1.12887| constrast_loss: 4.44735| div_loss: 0.68144| %_mask_idx: 0.41056| ppl: 203.87694| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96253 | loss: 1.12379| constrast_loss: 4.42493| div_loss: 0.70215| %_mask_idx: 0.36529| ppl: 190.62213| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.96252 | loss: 1.13623| constrast_loss: 4.47616| div_loss: 0.68746| %_mask_idx: 0.38972| ppl: 200.02519| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.96252 | loss: 1.12308| constrast_loss: 4.42308| div_loss: 0.69227| %_mask_idx: 0.35103| ppl: 196.94626| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.9625 | loss: 1.13451| constrast_loss: 4.46818| div_loss: 0.69847| %_mask_idx: 0.36184| ppl: 192.98198| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.9625 | loss: 1.14316| constrast_loss: 4.50501| div_loss: 0.67605| 
%_mask_idx: 0.37986| ppl: 207.32578| %_neg_is_pos: 0.00148| lr: 0.0| temp: 1.96249 | loss: 1.13272| constrast_loss: 4.46166| div_loss: 0.69232| %_mask_idx: 0.36451| ppl: 196.91318| %_neg_is_pos: 0.00205| lr: 0.0| temp: 1.96249 | loss: 1.13139| constrast_loss: 4.45653| div_loss: 0.69027| %_mask_idx: 0.38283| ppl: 198.22693| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96248 | loss: 1.12716| constrast_loss: 4.43821| div_loss: 0.70435| %_mask_idx: 0.35103| ppl: 189.21799| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.96248 | loss: 1.13442| constrast_loss: 4.46835| div_loss: 0.69333| %_mask_idx: 0.35072| ppl: 196.26651| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96247 | loss: 1.13011| constrast_loss: 4.45211| div_loss: 0.68319| %_mask_idx: 0.39442| ppl: 202.75632| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96247 | loss: 1.13334| constrast_loss: 4.46451| div_loss: 0.68837| %_mask_idx: 0.41416| ppl: 199.44038| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.96245 | loss: 1.12511| constrast_loss: 4.43172| div_loss: 0.68712| %_mask_idx: 0.41024| ppl: 200.24136| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96245 | loss: 1.12964| constrast_loss: 4.44869| div_loss: 0.69891| %_mask_idx: 0.40038| ppl: 192.70032| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.96244 | loss: 1.14007| constrast_loss: 4.49143| div_loss: 0.68838| %_mask_idx: 0.34821| ppl: 199.43407| %_neg_is_pos: 0.00183| lr: 0.0| temp: 1.96244 | loss: 1.13893| constrast_loss: 4.48776| div_loss: 0.67938| %_mask_idx: 0.43484| ppl: 205.19429| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96243 | loss: 1.13262| constrast_loss: 4.46202| div_loss: 0.68449| %_mask_idx: 0.39568| ppl: 201.92563| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96243 | loss: 1.13375| constrast_loss: 4.46666| div_loss: 0.68333| %_mask_idx: 0.36685| ppl: 202.66599| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.96242 | loss: 1.13094| constrast_loss: 4.4549| div_loss: 0.68872| %_mask_idx: 0.38628| ppl: 199.21622| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.96242 | loss: 1.13339| constrast_loss: 4.46447| 
div_loss: 0.69074| %_mask_idx: 0.36701| ppl: 197.9258| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.9624 | loss: 1.13871| constrast_loss: 4.48647| div_loss: 0.68388| %_mask_idx: 0.39004| ppl: 202.31976| %_neg_is_pos: 0.00151| lr: 0.0| temp: 1.9624 | loss: 1.12275| constrast_loss: 4.4221| div_loss: 0.68891| %_mask_idx: 0.39286| ppl: 199.09563| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96239 | loss: 1.13389| constrast_loss: 4.46696| div_loss: 0.68598| %_mask_idx: 0.3537| ppl: 200.97208| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.96239 [2021-09-02 05:25:02,693] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:25:02,693] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13161| constrast_loss: 4.45712| div_loss: 0.6933| %_mask_idx: 0.41259| ppl: 196.28748| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.96237 | loss: 1.13294| constrast_loss: 4.46236| div_loss: 0.69406| %_mask_idx: 0.36341| ppl: 195.80336| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.96237 | loss: 1.12822| constrast_loss: 4.4438| div_loss: 0.69079| %_mask_idx: 0.41902| ppl: 197.89755| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96236 | loss: 1.12979| constrast_loss: 4.45079| div_loss: 0.68361| %_mask_idx: 0.40304| ppl: 202.49274| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96236 | loss: 1.12728| constrast_loss: 4.44108| div_loss: 0.68039| %_mask_idx: 0.35009| ppl: 204.54929| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96235 | loss: 1.1364| constrast_loss: 4.47759| div_loss: 0.68007| %_mask_idx: 0.3916| ppl: 204.75211| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96235 | loss: 1.13934| constrast_loss: 4.48802| div_loss: 0.6934| %_mask_idx: 0.40836| ppl: 196.22209| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96234 | loss: 1.13292| constrast_loss: 4.46329| div_loss: 0.68389| %_mask_idx: 0.36231| ppl: 202.30853| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96234 
| loss: 1.12871| constrast_loss: 4.44595| div_loss: 0.68902| %_mask_idx: 0.37939| ppl: 199.02975| %_neg_is_pos: 0.00464| lr: 0.0| temp: 1.96232 | loss: 1.14475| constrast_loss: 4.51127| div_loss: 0.67723| %_mask_idx: 0.40414| ppl: 206.56993| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.96232 | loss: 1.14343| constrast_loss: 4.50619| div_loss: 0.67549| %_mask_idx: 0.39787| ppl: 207.68713| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.96231 | loss: 1.13057| constrast_loss: 4.45388| div_loss: 0.68399| %_mask_idx: 0.37218| ppl: 202.24339| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.96231 | loss: 1.12931| constrast_loss: 4.44752| div_loss: 0.69714| %_mask_idx: 0.36216| ppl: 193.83331| %_neg_is_pos: 0.0055| lr: 0.0| temp: 1.9623 | loss: 1.12823| constrast_loss: 4.44429| div_loss: 0.68628| %_mask_idx: 0.42074| ppl: 200.77771| %_neg_is_pos: 0.00321| lr: 0.0| temp: 1.9623 | loss: 1.12453| constrast_loss: 4.42793| div_loss: 0.70199| %_mask_idx: 0.34492| ppl: 190.72531| %_neg_is_pos: 0.00546| lr: 0.0| temp: 1.96229 | loss: 1.11829| constrast_loss: 4.40371| div_loss: 0.69451| %_mask_idx: 0.36028| ppl: 195.51093| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96229 | loss: 1.1334| constrast_loss: 4.46526| div_loss: 0.68352| %_mask_idx: 0.38596| ppl: 202.54404| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96227| loss: 1.12857| constrast_loss: 4.44525| div_loss: 0.69038| %_mask_idx: 0.38596| ppl: 198.15613| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96227 | loss: 1.13381| constrast_loss: 4.46699| div_loss: 0.68257| %_mask_idx: 0.44674| ppl: 203.15562| %_neg_is_pos: 0.00343| lr: 0.0| temp: 1.96226 | loss: 1.12341| constrast_loss: 4.4257| div_loss: 0.67935| %_mask_idx: 0.38847| ppl: 205.21625| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96226 | loss: 1.12805| constrast_loss: 4.44371| div_loss: 0.68481| %_mask_idx: 0.4317| ppl: 201.72276| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.96225 | loss: 1.1415| constrast_loss: 4.4982| div_loss: 0.67796| %_mask_idx: 0.39489| ppl: 206.10805| %_neg_is_pos: 0.00263| lr: 0.0| temp: 
1.96225 | loss: 1.13437| constrast_loss: 4.46847| div_loss: 0.69012| %_mask_idx: 0.38174| ppl: 198.32607| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.96224 | loss: 1.12268| constrast_loss: 4.42079| div_loss: 0.69936| %_mask_idx: 0.39082| ppl: 192.41211| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96224 | loss: 1.1296| constrast_loss: 4.44948| div_loss: 0.689| %_mask_idx: 0.38941| ppl: 199.04262| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96222 | loss: 1.13108| constrast_loss: 4.45514| div_loss: 0.6917| %_mask_idx: 0.36983| ppl: 197.31207| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96222 | loss: 1.13673| constrast_loss: 4.4768| div_loss: 0.70104| %_mask_idx: 0.42262| ppl: 191.33554| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96221 | loss: 1.1327| constrast_loss: 4.46262| div_loss: 0.6817| %_mask_idx: 0.37563| ppl: 203.71495| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96221 | loss: 1.13251| constrast_loss: 4.46208| div_loss: 0.67974| %_mask_idx: 0.41729| ppl: 204.96556| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.96219 | loss: 1.13244| constrast_loss: 4.46112| div_loss: 0.68642| %_mask_idx: 0.3797| ppl: 200.69209| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.96219 | loss: 1.13166| constrast_loss: 4.4589| div_loss: 0.67745| %_mask_idx: 0.39223| ppl: 206.43454| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.96218 | loss: 1.1372| constrast_loss: 4.48131| div_loss: 0.67492| %_mask_idx: 0.38894| ppl: 208.04913| %_neg_is_pos: 0.00176| lr: 0.0| temp: 1.96218 | loss: 1.13656| constrast_loss: 4.47724| div_loss: 0.68985| %_mask_idx: 0.40915| ppl: 198.49303| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.96217 | loss: 1.13273| constrast_loss: 4.46241| div_loss: 0.68507| %_mask_idx: 0.3927| ppl: 201.55524| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96217 | loss: 1.13422| constrast_loss: 4.46909| div_loss: 0.67787| %_mask_idx: 0.37939| ppl: 206.16011| %_neg_is_pos: 0.00145| lr: 0.0| temp: 1.96216 | loss: 1.14046| constrast_loss: 4.49441| div_loss: 0.67419| %_mask_idx: 0.38784| ppl: 208.52112| %_neg_is_pos: 0.00124| lr: 0.0| 
temp: 1.96216 | loss: 1.13145| constrast_loss: 4.45708| div_loss: 0.68712| %_mask_idx: 0.40194| ppl: 200.24414| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.96214 | loss: 1.14624| constrast_loss: 4.51688| div_loss: 0.68073| %_mask_idx: 0.37453| ppl: 204.33215| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.96214 | loss: 1.1259| constrast_loss: 4.43542| div_loss: 0.68184| %_mask_idx: 0.39756| ppl: 203.62021| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.96213 | loss: 1.13652| constrast_loss: 4.47751| div_loss: 0.68577| %_mask_idx: 0.40868| ppl: 201.10468| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96213 | loss: 1.13489| constrast_loss: 4.47203| div_loss: 0.67515| %_mask_idx: 0.39505| ppl: 207.90367| %_neg_is_pos: 0.00164| lr: 0.0| temp: 1.96212 | loss: 1.1302| constrast_loss: 4.45332| div_loss: 0.6746| %_mask_idx: 0.35793| ppl: 208.25577| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.96212 | loss: 1.13342| constrast_loss: 4.46557| div_loss: 0.68115| %_mask_idx: 0.375| ppl: 204.06641| %_neg_is_pos: 0.00156| lr: 0.0| temp: 1.96211 | loss: 1.14083| constrast_loss: 4.49521| div_loss: 0.68108| %_mask_idx: 0.34994| ppl: 204.10721| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.96211 | loss: 1.13947| constrast_loss: 4.48963| div_loss: 0.68265| %_mask_idx: 0.35526| ppl: 203.10516| %_neg_is_pos: 0.002| lr: 0.0| temp: 1.96209 | loss: 1.13656| constrast_loss: 4.47841| div_loss: 0.67823| %_mask_idx: 0.41792| ppl: 205.93085| %_neg_is_pos: 0.00128| lr: 0.0| temp: 1.96209 | loss: 1.13522| constrast_loss: 4.47287| div_loss: 0.6801| %_mask_idx: 0.36607| ppl: 204.73767| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96208 | loss: 1.14121| constrast_loss: 4.49602| div_loss: 0.6882| %_mask_idx: 0.41479| ppl: 199.55167| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96208 | loss: 1.13233| constrast_loss: 4.46042| div_loss: 0.68912| %_mask_idx: 0.37469| ppl: 198.9657| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96207 | loss: 1.13685| constrast_loss: 4.47835| div_loss: 0.69051| %_mask_idx: 0.41557| ppl: 198.07254| %_neg_is_pos: 0.00183| 
lr: 0.0| temp: 1.96207 | loss: 1.13041| constrast_loss: 4.45316| div_loss: 0.68492| %_mask_idx: 0.39693| ppl: 201.6496| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96206 | loss: 1.13845| constrast_loss: 4.48582| div_loss: 0.67994| %_mask_idx: 0.40241| ppl: 204.84082| %_neg_is_pos: 0.00138| lr: 0.0| temp: 1.96206 | loss: 1.14545| constrast_loss: 4.51402| div_loss: 0.67768| %_mask_idx: 0.40602| ppl: 206.28406| %_neg_is_pos: 0.00152| lr: 0.0| temp: 1.96204 | loss: 1.12799| constrast_loss: 4.44249| div_loss: 0.6949| %_mask_idx: 0.31657| ppl: 195.26704| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.96204 | loss: 1.14456| constrast_loss: 4.51067| div_loss: 0.67559| %_mask_idx: 0.41212| ppl: 207.62064| %_neg_is_pos: 0.00117| lr: 0.0| temp: 1.96203 | loss: 1.14612| constrast_loss: 4.51687| div_loss: 0.67606| %_mask_idx: 0.36513| ppl: 207.3223| %_neg_is_pos: 0.00136| lr: 0.0| temp: 1.96203 | loss: 1.13484| constrast_loss: 4.471| div_loss: 0.68375| %_mask_idx: 0.37046| ppl: 202.39946| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96202 | loss: 1.13219| constrast_loss: 4.46028| div_loss: 0.68478| %_mask_idx: 0.37312| ppl: 201.73904| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.96202 | loss: 1.1412| constrast_loss: 4.496| div_loss: 0.68819| %_mask_idx: 0.44721| ppl: 199.55775| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.96201 | loss: 1.13936| constrast_loss: 4.48917| div_loss: 0.68275| %_mask_idx: 0.37328| ppl: 203.03833| %_neg_is_pos: 0.00145| lr: 0.0| temp: 1.96201 | loss: 1.13963| constrast_loss: 4.49074| div_loss: 0.6779| %_mask_idx: 0.37046| ppl: 206.14557| %_neg_is_pos: 0.00128| lr: 0.0| temp: 1.962 | loss: 1.13659| constrast_loss: 4.47776| div_loss: 0.68585| %_mask_idx: 0.38894| ppl: 201.05618| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.962 | loss: 1.1431| constrast_loss: 4.50413| div_loss: 0.68269| %_mask_idx: 0.4057| ppl: 203.07965| %_neg_is_pos: 0.00146| lr: 0.0| temp: 1.96199 | loss: 1.13476| constrast_loss: 4.47035| div_loss: 0.68707| %_mask_idx: 0.35072| ppl: 200.27463| %_neg_is_pos: 0.00292| 
lr: 0.0| temp: 1.96199 | loss: 1.13438| constrast_loss: 4.46955| div_loss: 0.67952| %_mask_idx: 0.35417| ppl: 205.10788| %_neg_is_pos: 0.00132| lr: 0.0| temp: 1.96197 | loss: 1.12984| constrast_loss: 4.45008| div_loss: 0.69299| %_mask_idx: 0.35526| ppl: 196.48723| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.96197 | loss: 1.13418| constrast_loss: 4.46786| div_loss: 0.68857| %_mask_idx: 0.3786| ppl: 199.31317| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.96196 | loss: 1.13644| constrast_loss: 4.47667| div_loss: 0.69072| %_mask_idx: 0.37829| ppl: 197.93771| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96196 | loss: 1.1347| constrast_loss: 4.47151| div_loss: 0.6729| %_mask_idx: 0.40977| ppl: 209.34586| %_neg_is_pos: 0.00162| lr: 0.0| temp: 1.96195 | loss: 1.13082| constrast_loss: 4.45454| div_loss: 0.68737| %_mask_idx: 0.37563| ppl: 200.08023| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96195 | loss: 1.1246| constrast_loss: 4.4298| div_loss: 0.68585| %_mask_idx: 0.37954| ppl: 201.05602| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.96194 | loss: 1.13494| constrast_loss: 4.47152| div_loss: 0.68236| %_mask_idx: 0.38753| ppl: 203.28865| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.96194 | loss: 1.1326| constrast_loss: 4.46306| div_loss: 0.67345| %_mask_idx: 0.3714| ppl: 208.99298| %_neg_is_pos: 0.00166| lr: 0.0| temp: 1.96192 | loss: 1.13582| constrast_loss: 4.47586| div_loss: 0.67428| %_mask_idx: 0.40429| ppl: 208.45993| %_neg_is_pos: 0.00123| lr: 0.0| temp: 1.96192 | loss: 1.13584| constrast_loss: 4.47549| div_loss: 0.67876| %_mask_idx: 0.36169| ppl: 205.59061| %_neg_is_pos: 0.00143| lr: 0.0| temp: 1.96191 | loss: 1.12444| constrast_loss: 4.42864| div_loss: 0.69127| %_mask_idx: 0.36889| ppl: 197.58746| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.96191 | loss: 1.13251| constrast_loss: 4.46161| div_loss: 0.68413| %_mask_idx: 0.41494| ppl: 202.15961| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.9619 | loss: 1.11983| constrast_loss: 4.41021| div_loss: 0.69107| %_mask_idx: 0.36873| ppl: 197.71622| %_neg_is_pos: 
0.00357| lr: 0.0| temp: 1.9619 | loss: 1.13099| constrast_loss: 4.455| div_loss: 0.68977| %_mask_idx: 0.39693| ppl: 198.54997| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.96189 | loss: 1.13089| constrast_loss: 4.45463| div_loss: 0.68947| %_mask_idx: 0.33835| ppl: 198.74162| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.96189 | loss: 1.14303| constrast_loss: 4.50386| div_loss: 0.68261| %_mask_idx: 0.41024| ppl: 203.13184| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.96187 | loss: 1.13093| constrast_loss: 4.45486| div_loss: 0.68882| %_mask_idx: 0.3808| ppl: 199.15317| %_neg_is_pos: 0.00133| lr: 0.0| temp: 1.96187 | loss: 1.13832| constrast_loss: 4.48576| div_loss: 0.6752| %_mask_idx: 0.39301| ppl: 207.87109| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.96186 | loss: 1.13183| constrast_loss: 4.45941| div_loss: 0.67925| %_mask_idx: 0.36764| ppl: 205.28188| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96186 | loss: 1.13379| constrast_loss: 4.46709| div_loss: 0.68075| %_mask_idx: 0.42199| ppl: 204.3194| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.96184 | loss: 1.12979| constrast_loss: 4.45073| div_loss: 0.68427| %_mask_idx: 0.37234| ppl: 202.06964| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.96184 | loss: 1.12691| constrast_loss: 4.43951| div_loss: 0.68142| %_mask_idx: 0.41447| ppl: 203.89371| %_neg_is_pos: 0.00141| lr: 0.0| temp: 1.96183 | loss: 1.13573| constrast_loss: 4.47485| div_loss: 0.68078| %_mask_idx: 0.35683| ppl: 204.30025| %_neg_is_pos: 0.00187| lr: 0.0| temp: 1.96183 | loss: 1.13467| constrast_loss: 4.47057| div_loss: 0.68113| %_mask_idx: 0.43766| ppl: 204.07422| %_neg_is_pos: 0.0016| lr: 0.0| temp: 1.96182 | loss: 1.13892| constrast_loss: 4.48774| div_loss: 0.67941| %_mask_idx: 0.39035| ppl: 205.17563| %_neg_is_pos: 0.00163| lr: 0.0| temp: 1.96182 | loss: 1.13568| constrast_loss: 4.47466| div_loss: 0.68051| %_mask_idx: 0.39787| ppl: 204.47064| %_neg_is_pos: 0.00231| lr: 0.0| temp: 1.96181 | loss: 1.13748| constrast_loss: 4.48116| div_loss: 0.68752| %_mask_idx: 0.39317| ppl: 199.98486| 
%_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96181 | loss: 1.13228| constrast_loss: 4.46059| div_loss: 0.68526| %_mask_idx: 0.39192| ppl: 201.43552| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96179 | loss: 1.13254| constrast_loss: 4.46144| div_loss: 0.68734| %_mask_idx: 0.36732| ppl: 200.10455| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.96179 | loss: 1.13707| constrast_loss: 4.47974| div_loss: 0.68562| %_mask_idx: 0.40273| ppl: 201.20078| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.96178 | loss: 1.12872| constrast_loss: 4.44589| div_loss: 0.69| %_mask_idx: 0.41964| ppl: 198.39818| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.96178 | loss: 1.13711| constrast_loss: 4.4806| div_loss: 0.67848| %_mask_idx: 0.38299| ppl: 205.76999| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.96177 | loss: 1.12493| constrast_loss: 4.43164| div_loss: 0.68075| %_mask_idx: 0.42528| ppl: 204.31827| %_neg_is_pos: 0.00161| lr: 0.0| temp: 1.96177 | loss: 1.12487| constrast_loss: 4.43033| div_loss: 0.6914| %_mask_idx: 0.36842| ppl: 197.50456| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.96176 | loss: 1.15114| constrast_loss: 4.53657| div_loss: 0.68008| %_mask_idx: 0.39944| ppl: 204.75159| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.96176 | loss: 1.13826| constrast_loss: 4.4851| div_loss: 0.67962| %_mask_idx: 0.34508| ppl: 205.0441| %_neg_is_pos: 0.00143| lr: 0.0| temp: 1.96174 | loss: 1.13202| constrast_loss: 4.45939| div_loss: 0.68669| %_mask_idx: 0.35996| ppl: 200.51611| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.96174 | loss: 1.13981| constrast_loss: 4.49037| div_loss: 0.68852| %_mask_idx: 0.37234| ppl: 199.34627| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96173 | loss: 1.13259| constrast_loss: 4.46252| div_loss: 0.67842| %_mask_idx: 0.37672| ppl: 205.8116| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.96173 | loss: 1.14217| constrast_loss: 4.50031| div_loss: 0.68349| %_mask_idx: 0.40727| ppl: 202.56842| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.96172 | loss: 1.13892| constrast_loss: 4.48749| div_loss: 0.68167| %_mask_idx: 0.39051| ppl: 
203.73114| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.96172 | loss: 1.1302| constrast_loss: 4.45202| div_loss: 0.68778| %_mask_idx: 0.40602| ppl: 199.82132| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.96171 | loss: 1.1385| constrast_loss: 4.48651| div_loss: 0.67487| %_mask_idx: 0.38722| ppl: 208.08386| %_neg_is_pos: 0.00093| lr: 0.0| temp: 1.96171 | loss: 1.13468| constrast_loss: 4.46968| div_loss: 0.69046| %_mask_idx: 0.39113| ppl: 198.10269| %_neg_is_pos: 0.00141| lr: 0.0| temp: 1.96169 | loss: 1.12671| constrast_loss: 4.43776| div_loss: 0.69081| %_mask_idx: 0.41275| ppl: 197.88345| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96169 | loss: 1.13641| constrast_loss: 4.47711| div_loss: 0.6854| %_mask_idx: 0.42857| ppl: 201.34558| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96168 | loss: 1.14216| constrast_loss: 4.50065| div_loss: 0.67985| %_mask_idx: 0.44189| ppl: 204.89587| %_neg_is_pos: 0.00125| lr: 0.0| temp: 1.96168 [2021-09-02 05:34:16,627] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:34:16,627] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.13209| constrast_loss: 4.46038| div_loss: 0.67994| %_mask_idx: 0.35761| ppl: 204.83615| %_neg_is_pos: 0.00133| lr: 0.0| temp: 1.96166 | loss: 1.13828| constrast_loss: 4.48403| div_loss: 0.69094| %_mask_idx: 0.41682| ppl: 197.79654| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96166 | loss: 1.13293| constrast_loss: 4.46375| div_loss: 0.67957| %_mask_idx: 0.37986| ppl: 205.07541| %_neg_is_pos: 0.00129| lr: 0.0| temp: 1.96165 | loss: 1.13724| constrast_loss: 4.48104| div_loss: 0.67923| %_mask_idx: 0.42215| ppl: 205.29053| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.96165 | loss: 1.12674| constrast_loss: 4.43845| div_loss: 0.68526| %_mask_idx: 0.38784| ppl: 201.43129| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.96164 | loss: 1.14585| constrast_loss: 4.51659| div_loss: 0.66796| %_mask_idx: 0.40727| ppl: 212.50757| %_neg_is_pos: 0.00138| lr: 0.0| temp: 1.96164 | loss: 1.13767| constrast_loss: 4.4819| div_loss: 0.68775| %_mask_idx: 0.44314| ppl: 199.84041| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.96163 | loss: 1.1148| constrast_loss: 4.38962| div_loss: 0.69574| %_mask_idx: 0.32315| ppl: 194.72519| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.96163 | loss: 1.13746| constrast_loss: 4.48223| div_loss: 0.676| %_mask_idx: 0.43954| ppl: 207.36017| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96161 | loss: 1.13515| constrast_loss: 4.47174| div_loss: 0.6888| %_mask_idx: 0.37093| ppl: 199.17058| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.96161 | loss: 1.14463| constrast_loss: 4.51034| div_loss: 0.68174| %_mask_idx: 0.41447| ppl: 203.68553| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.9616 | loss: 1.11972| constrast_loss: 4.40944| div_loss: 0.6945| %_mask_idx: 0.37265| ppl: 195.51797| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.9616 | loss: 1.11956| constrast_loss: 4.4062| div_loss: 0.72037| %_mask_idx: 0.3266| ppl: 178.96152| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.96159 | loss: 1.12912| constrast_loss: 4.44765| div_loss: 0.68841| %_mask_idx: 0.39474| ppl: 199.41898| 
%_neg_is_pos: 0.00167| lr: 0.0| temp: 1.96159 | loss: 1.12698| constrast_loss: 4.43878| div_loss: 0.69155| %_mask_idx: 0.36357| ppl: 197.40735| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96158 | loss: 1.13259| constrast_loss: 4.46285| div_loss: 0.67519| %_mask_idx: 0.35981| ppl: 207.87668| %_neg_is_pos: 0.00214| lr: 0.0| temp: 1.96158 | loss: 1.13928| constrast_loss: 4.4883| div_loss: 0.68816| %_mask_idx: 0.38033| ppl: 199.57658| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.96156 | loss: 1.13638| constrast_loss: 4.47727| div_loss: 0.6824| %_mask_idx: 0.4317| ppl: 203.26672| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.96156 | loss: 1.13934| constrast_loss: 4.48914| div_loss: 0.68234| %_mask_idx: 0.4209| ppl: 203.30338| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96155 | loss: 1.14203| constrast_loss: 4.50027| div_loss: 0.67853| %_mask_idx: 0.46131| ppl: 205.73874| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96155 | loss: 1.136| constrast_loss: 4.47418| div_loss: 0.69811| %_mask_idx: 0.37688| ppl: 193.21268| %_neg_is_pos: 0.00483| lr: 0.0| temp: 1.96154 | loss: 1.13265| constrast_loss: 4.46159| div_loss: 0.6901| %_mask_idx: 0.37751| ppl: 198.33511| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96154 | loss: 1.13738| constrast_loss: 4.48129| div_loss: 0.68211| %_mask_idx: 0.42826| ppl: 203.44978| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96153 | loss: 1.1241| constrast_loss: 4.42686| div_loss: 0.69558| %_mask_idx: 0.41103| ppl: 194.82573| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.96153 | loss: 1.13435| constrast_loss: 4.46877| div_loss: 0.68616| %_mask_idx: 0.36779| ppl: 200.85582| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.96151 | loss: 1.13186| constrast_loss: 4.45903| div_loss: 0.68402| %_mask_idx: 0.36513| ppl: 202.2291| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.96151 | loss: 1.14259| constrast_loss: 4.503| div_loss: 0.67357| %_mask_idx: 0.42747| ppl: 208.91254| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96151 | loss: 1.12805| constrast_loss: 4.44291| div_loss: 0.69305| %_mask_idx: 0.38643| ppl: 
196.44751| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.96151 | loss: 1.12278| constrast_loss: 4.4205| div_loss: 0.70633| %_mask_idx: 0.36216| ppl: 187.95059| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.96149 | loss: 1.12936| constrast_loss: 4.44814| div_loss: 0.69299| %_mask_idx: 0.38753| ppl: 196.48767| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.96149 | loss: 1.12314| constrast_loss: 4.42296| div_loss: 0.69586| %_mask_idx: 0.38675| ppl: 194.64978| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.96148 | loss: 1.12776| constrast_loss: 4.44286| div_loss: 0.68171| %_mask_idx: 0.37469| ppl: 203.70709| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.96148 | loss: 1.13307| constrast_loss: 4.46228| div_loss: 0.7002| %_mask_idx: 0.36372| ppl: 191.87202| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.96147 | loss: 1.12546| constrast_loss: 4.43228| div_loss: 0.69551| %_mask_idx: 0.36466| ppl: 194.87679| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.96147 | loss: 1.15404| constrast_loss: 4.54704| div_loss: 0.69106| %_mask_idx: 0.36732| ppl: 197.71953| %_neg_is_pos: 0.00259| lr: 0.0| temp: 1.96146 | loss: 1.13805| constrast_loss: 4.48408| div_loss: 0.68102| %_mask_idx: 0.40899| ppl: 204.1474| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.96146 | loss: 1.14135| constrast_loss: 4.49615| div_loss: 0.69238| %_mask_idx: 0.40962| ppl: 196.87584| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96144 | loss: 1.12261| constrast_loss: 4.42043| div_loss: 0.70023| %_mask_idx: 0.36012| ppl: 191.85138| %_neg_is_pos: 0.00436| lr: 0.0| temp: 1.96144 | loss: 1.11849| constrast_loss: 4.40394| div_loss: 0.70004| %_mask_idx: 0.36357| ppl: 191.97339| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.96143 | loss: 1.12504| constrast_loss: 4.43014| div_loss: 0.70016| %_mask_idx: 0.41259| ppl: 191.89833| %_neg_is_pos: 0.00423| lr: 0.0| temp: 1.96143 | loss: 1.14119| constrast_loss: 4.49616| div_loss: 0.68593| %_mask_idx: 0.38643| ppl: 201.00798| %_neg_is_pos: 0.00463| lr: 0.0| temp: 1.96142 | loss: 1.13368| constrast_loss: 4.46615| div_loss: 0.68585| %_mask_idx: 
0.46084| ppl: 201.05441| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96142 | loss: 1.12978| constrast_loss: 4.45012| div_loss: 0.69008| %_mask_idx: 0.37218| ppl: 198.34659| %_neg_is_pos: 0.00523| lr: 0.0| temp: 1.96141 | loss: 1.13281| constrast_loss: 4.46219| div_loss: 0.69037| %_mask_idx: 0.38988| ppl: 198.16499| %_neg_is_pos: 0.00303| lr: 0.0| temp: 1.96141 | loss: 1.14328| constrast_loss: 4.50498| div_loss: 0.68126| %_mask_idx: 0.42732| ppl: 203.99475| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.96139 | loss: 1.12775| constrast_loss: 4.44192| div_loss: 0.69101| %_mask_idx: 0.37218| ppl: 197.75552| %_neg_is_pos: 0.005| lr: 0.0| temp: 1.96139 | loss: 1.13421| constrast_loss: 4.46852| div_loss: 0.68336| %_mask_idx: 0.4032| ppl: 202.65204| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.96138 | loss: 1.13854| constrast_loss: 4.48545| div_loss: 0.68687| %_mask_idx: 0.44189| ppl: 200.40195| %_neg_is_pos: 0.00251| lr: 0.0| temp: 1.96138 | loss: 1.12441| constrast_loss: 4.42851| div_loss: 0.69127| %_mask_idx: 0.40648| ppl: 197.58778| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.96137 | loss: 1.1253| constrast_loss: 4.43055| div_loss: 0.70636| %_mask_idx: 0.35996| ppl: 187.9274| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96137 | loss: 1.12681| constrast_loss: 4.43894| div_loss: 0.68304| %_mask_idx: 0.36952| ppl: 202.8555| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.96136 | loss: 1.13014| constrast_loss: 4.45169| div_loss: 0.68875| %_mask_idx: 0.43045| ppl: 199.19733| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96136 | loss: 1.12355| constrast_loss: 4.42446| div_loss: 0.69726| %_mask_idx: 0.33302| ppl: 193.75101| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.96134 | loss: 1.13677| constrast_loss: 4.47949| div_loss: 0.67588| %_mask_idx: 0.36043| ppl: 207.43694| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96134 | loss: 1.13454| constrast_loss: 4.46871| div_loss: 0.69455| %_mask_idx: 0.35385| ppl: 195.4886| %_neg_is_pos: 0.00495| lr: 0.0| temp: 1.96133 | loss: 1.13337| constrast_loss: 4.46497| div_loss: 0.6852| 
%_mask_idx: 0.40461| ppl: 201.4697| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.96133 | loss: 1.12696| constrast_loss: 4.43867| div_loss: 0.69173| %_mask_idx: 0.3631| ppl: 197.29504| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96131 | loss: 1.12722| constrast_loss: 4.43909| div_loss: 0.69773| %_mask_idx: 0.33083| ppl: 193.45358| %_neg_is_pos: 0.00611| lr: 0.0| temp: 1.96131 | loss: 1.14095| constrast_loss: 4.49521| div_loss: 0.68576| %_mask_idx: 0.39474| ppl: 201.11325| %_neg_is_pos: 0.002| lr: 0.0| temp: 1.9613 | loss: 1.12738| constrast_loss: 4.44099| div_loss: 0.68517| %_mask_idx: 0.36404| ppl: 201.49237| %_neg_is_pos: 0.00464| lr: 0.0| temp: 1.9613 | loss: 1.13178| constrast_loss: 4.45865| div_loss: 0.68467| %_mask_idx: 0.3609| ppl: 201.8136| %_neg_is_pos: 0.00223| lr: 0.0| temp: 1.96129 | loss: 1.1379| constrast_loss: 4.48359| div_loss: 0.68003| %_mask_idx: 0.43029| ppl: 204.78238| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.96129 | loss: 1.13868| constrast_loss: 4.48668| div_loss: 0.6804| %_mask_idx: 0.41604| ppl: 204.54155| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96128 | loss: 1.13428| constrast_loss: 4.46841| div_loss: 0.68723| %_mask_idx: 0.38409| ppl: 200.17288| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.96128 | loss: 1.13905| constrast_loss: 4.48878| div_loss: 0.67418| %_mask_idx: 0.41463| ppl: 208.52621| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.96126 | loss: 1.1334| constrast_loss: 4.46513| div_loss: 0.68471| %_mask_idx: 0.38111| ppl: 201.78787| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96126 | loss: 1.13156| constrast_loss: 4.45764| div_loss: 0.68595| %_mask_idx: 0.41056| ppl: 200.9906| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.96125 | loss: 1.13213| constrast_loss: 4.45886| div_loss: 0.69651| %_mask_idx: 0.36137| ppl: 194.23615| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.96125 | loss: 1.13536| constrast_loss: 4.47128| div_loss: 0.70144| %_mask_idx: 0.42325| ppl: 191.08144| %_neg_is_pos: 0.00469| lr: 0.0| temp: 1.96124 | loss: 1.12822| constrast_loss: 4.44328| div_loss: 
0.69597| %_mask_idx: 0.3891| ppl: 194.5769| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.96124 | loss: 1.13592| constrast_loss: 4.47506| div_loss: 0.68613| %_mask_idx: 0.42419| ppl: 200.87361| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96123 | loss: 1.13565| constrast_loss: 4.47375| div_loss: 0.68852| %_mask_idx: 0.38205| ppl: 199.34734| %_neg_is_pos: 0.00413| lr: 0.0| temp: 1.96123 | loss: 1.13395| constrast_loss: 4.46761| div_loss: 0.68201| %_mask_idx: 0.41353| ppl: 203.51289| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.96121 | loss: 1.12898| constrast_loss: 4.44683| div_loss: 0.69087| %_mask_idx: 0.34586| ppl: 197.84117| %_neg_is_pos: 0.00307| lr: 0.0| temp: 1.96121 | loss: 1.12484| constrast_loss: 4.43029| div_loss: 0.69063| %_mask_idx: 0.36247| ppl: 197.995| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.9612 | loss: 1.12254| constrast_loss: 4.42124| div_loss: 0.68903| %_mask_idx: 0.38048| ppl: 199.01805| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.9612 | loss: 1.12598| constrast_loss: 4.43432| div_loss: 0.69604| %_mask_idx: 0.35338| ppl: 194.53665| %_neg_is_pos: 0.00482| lr: 0.0| temp: 1.96119 | loss: 1.12704| constrast_loss: 4.43872| div_loss: 0.69458| %_mask_idx: 0.39035| ppl: 195.47119| %_neg_is_pos: 0.00387| lr: 0.0| temp: 1.96119 | loss: 1.12615| constrast_loss: 4.43512| div_loss: 0.69461| %_mask_idx: 0.39646| ppl: 195.44827| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.96118 | loss: 1.13048| constrast_loss: 4.45339| div_loss: 0.6853| %_mask_idx: 0.34931| ppl: 201.40807| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.96118 | loss: 1.13165| constrast_loss: 4.45778| div_loss: 0.68806| %_mask_idx: 0.42325| ppl: 199.64377| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.96116 | loss: 1.13529| constrast_loss: 4.47337| div_loss: 0.67776| %_mask_idx: 0.414| ppl: 206.2345| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.96116 | loss: 1.13142| constrast_loss: 4.45713| div_loss: 0.68544| %_mask_idx: 0.39254| ppl: 201.31613| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.96115 | loss: 1.12399| constrast_loss: 4.42645| 
div_loss: 0.69494| %_mask_idx: 0.35683| ppl: 195.23624| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.96115 | loss: 1.13329| constrast_loss: 4.46439| div_loss: 0.68779| %_mask_idx: 0.34727| ppl: 199.81378| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96113 | loss: 1.12067| constrast_loss: 4.41336| div_loss: 0.69307| %_mask_idx: 0.40962| ppl: 196.43515| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.96113 | loss: 1.13507| constrast_loss: 4.47066| div_loss: 0.69631| %_mask_idx: 0.39505| ppl: 194.36267| %_neg_is_pos: 0.00464| lr: 0.0| temp: 1.96112 | loss: 1.13204| constrast_loss: 4.45964| div_loss: 0.68508| %_mask_idx: 0.41134| ppl: 201.55093| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.96112 | loss: 1.12674| constrast_loss: 4.43822| div_loss: 0.6875| %_mask_idx: 0.39207| ppl: 200.00237| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96111 | loss: 1.13546| constrast_loss: 4.47243| div_loss: 0.69394| %_mask_idx: 0.36685| ppl: 195.8773| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.96111 | loss: 1.13414| constrast_loss: 4.46827| div_loss: 0.68298| %_mask_idx: 0.41087| ppl: 202.89099| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.9611 | loss: 1.12329| constrast_loss: 4.42345| div_loss: 0.69698| %_mask_idx: 0.37829| ppl: 193.93196| %_neg_is_pos: 0.0042| lr: 0.0| temp: 1.9611 | loss: 1.13317| constrast_loss: 4.46446| div_loss: 0.68215| %_mask_idx: 0.38064| ppl: 203.4256| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96108 | loss: 1.14587| constrast_loss: 4.51537| div_loss: 0.68117| %_mask_idx: 0.38894| ppl: 204.04855| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.96108 | loss: 1.13495| constrast_loss: 4.47124| div_loss: 0.6855| %_mask_idx: 0.36137| ppl: 201.27878| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96107 | loss: 1.13758| constrast_loss: 4.48189| div_loss: 0.68435| %_mask_idx: 0.37343| ppl: 202.01312| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.96107 | loss: 1.14194| constrast_loss: 4.50002| div_loss: 0.67744| %_mask_idx: 0.41949| ppl: 206.44019| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96106 | loss: 1.13759| constrast_loss: 
4.48231| div_loss: 0.68042| %_mask_idx: 0.41714| ppl: 204.53145| %_neg_is_pos: 0.00187| lr: 0.0| temp: 1.96106 | loss: 1.13028| constrast_loss: 4.45234| div_loss: 0.6879| %_mask_idx: 0.34884| ppl: 199.74252| %_neg_is_pos: 0.0027| lr: 0.0| temp: 1.96105 | loss: 1.13203| constrast_loss: 4.45925| div_loss: 0.68854| %_mask_idx: 0.4162| ppl: 199.33653| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.96105 | loss: 1.12552| constrast_loss: 4.43266| div_loss: 0.69425| %_mask_idx: 0.39521| ppl: 195.67841| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96103 | loss: 1.13447| constrast_loss: 4.46815| div_loss: 0.69739| %_mask_idx: 0.38863| ppl: 193.67102| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.96103 | loss: 1.13673| constrast_loss: 4.47903| div_loss: 0.67889| %_mask_idx: 0.38471| ppl: 205.51239| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.96102 | loss: 1.13611| constrast_loss: 4.47569| div_loss: 0.6873| %_mask_idx: 0.43233| ppl: 200.13055| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.96102 | loss: 1.13101| constrast_loss: 4.45479| div_loss: 0.69257| %_mask_idx: 0.36889| ppl: 196.75565| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.96101 | loss: 1.13835| constrast_loss: 4.48451| div_loss: 0.68896| %_mask_idx: 0.4162| ppl: 199.06693| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.96101 | loss: 1.13868| constrast_loss: 4.48602| div_loss: 0.68713| %_mask_idx: 0.40523| ppl: 200.2339| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.96101 | loss: 1.12871| constrast_loss: 4.44598| div_loss: 0.68864| %_mask_idx: 0.3974| ppl: 199.26997| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.96101 | loss: 1.13219| constrast_loss: 4.45934| div_loss: 0.69424| %_mask_idx: 0.35761| ppl: 195.68425| %_neg_is_pos: 0.00379| lr: 0.0| temp: 1.96099 | loss: 1.12539| constrast_loss: 4.43302| div_loss: 0.68546| %_mask_idx: 0.42246| ppl: 201.30627| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.96099 | loss: 1.14383| constrast_loss: 4.50714| div_loss: 0.68176| %_mask_idx: 0.40492| ppl: 203.67514| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.96098 | loss: 1.12502| 
constrast_loss: 4.43109| div_loss: 0.69| %_mask_idx: 0.388| ppl: 198.40115| %_neg_is_pos: 0.00457| lr: 0.0| temp: 1.96098 [2021-09-02 05:43:29,968] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:43:29,968] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13793| constrast_loss: 4.48315| div_loss: 0.68562| %_mask_idx: 0.35511| ppl: 201.20172| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.96096 | loss: 1.13697| constrast_loss: 4.4792| div_loss: 0.68705| %_mask_idx: 0.41338| ppl: 200.28946| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96096 | loss: 1.13004| constrast_loss: 4.45035| div_loss: 0.69805| %_mask_idx: 0.4115| ppl: 193.24586| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.96095 | loss: 1.13088| constrast_loss: 4.45378| div_loss: 0.69764| %_mask_idx: 0.40429| ppl: 193.51263| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.96095 | loss: 1.13312| constrast_loss: 4.46241| div_loss: 0.70082| %_mask_idx: 0.33521| ppl: 191.47264| %_neg_is_pos: 0.00405| lr: 0.0| temp: 1.96094 | loss: 1.13255| constrast_loss: 4.46156| div_loss: 0.68655| %_mask_idx: 0.36341| ppl: 200.6062| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.96094 | loss: 1.13231| constrast_loss: 4.46054| div_loss: 0.68689| %_mask_idx: 0.4281| ppl: 200.3891| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.96093 | loss: 1.14331| constrast_loss: 4.50544| div_loss: 0.67815| %_mask_idx: 0.43813| ppl: 205.98526| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.96093 | loss: 1.13924| constrast_loss: 4.48636| div_loss: 0.7058| %_mask_idx: 0.35934| ppl: 188.29097| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96091 | loss: 1.12775| constrast_loss: 4.44112| div_loss: 0.69871| %_mask_idx: 0.43468| ppl: 192.82332| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96091 | loss: 1.13857| constrast_loss: 4.48539| div_loss: 0.68899| %_mask_idx: 0.43954| ppl: 199.04369| %_neg_is_pos: 0.00145| 
lr: 0.0| temp: 1.9609 | loss: 1.14016| constrast_loss: 4.49113| div_loss: 0.69504| %_mask_idx: 0.38816| ppl: 195.17392| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.9609 | loss: 1.1348| constrast_loss: 4.46965| div_loss: 0.69548| %_mask_idx: 0.40836| ppl: 194.8938| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.96089 | loss: 1.13771| constrast_loss: 4.4823| div_loss: 0.68544| %_mask_idx: 0.39113| ppl: 201.31763| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.96089 | loss: 1.12564| constrast_loss: 4.43296| div_loss: 0.69615| %_mask_idx: 0.39176| ppl: 194.46371| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.96088 | loss: 1.13171| constrast_loss: 4.45751| div_loss: 0.69327| %_mask_idx: 0.35056| ppl: 196.30855| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96088 | loss: 1.13412| constrast_loss: 4.46685| div_loss: 0.6962| %_mask_idx: 0.40523| ppl: 194.4348| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.96086 | loss: 1.13593| constrast_loss: 4.47477| div_loss: 0.68962| %_mask_idx: 0.35542| ppl: 198.64438| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.96086 | loss: 1.13071| constrast_loss: 4.45369| div_loss: 0.69161| %_mask_idx: 0.31751| ppl: 197.36755| %_neg_is_pos: 0.00456| lr: 0.0| temp: 1.96085 | loss: 1.13838| constrast_loss: 4.48472| div_loss: 0.68805| %_mask_idx: 0.42434| ppl: 199.65009| %_neg_is_pos: 0.00265| lr: 0.0| temp: 1.96085 | loss: 1.14188| constrast_loss: 4.49736| div_loss: 0.70155| %_mask_idx: 0.388| ppl: 191.00836| %_neg_is_pos: 0.00421| lr: 0.0| temp: 1.96084 | loss: 1.13481| constrast_loss: 4.47032| div_loss: 0.68919| %_mask_idx: 0.38252| ppl: 198.91586| %_neg_is_pos: 0.00536| lr: 0.0| temp: 1.96084 | loss: 1.12142| constrast_loss: 4.41472| div_loss: 0.70979| %_mask_idx: 0.34289| ppl: 185.73499| %_neg_is_pos: 0.00558| lr: 0.0| temp: 1.96083 | loss: 1.12942| constrast_loss: 4.4475| div_loss: 0.70185| %_mask_idx: 0.37218| ppl: 190.81474| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.96083 | loss: 1.12623| constrast_loss: 4.43627| div_loss: 0.6865| %_mask_idx: 0.35135| ppl: 200.64034| %_neg_is_pos: 
0.00297| lr: 0.0| temp: 1.96081 | loss: 1.12913| constrast_loss: 4.44739| div_loss: 0.69145| %_mask_idx: 0.42935| ppl: 197.47046| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.96081 | loss: 1.12421| constrast_loss: 4.42706| div_loss: 0.69765| %_mask_idx: 0.41432| ppl: 193.5034| %_neg_is_pos: 0.00343| lr: 0.0| temp: 1.9608 | loss: 1.12975| constrast_loss: 4.44908| div_loss: 0.69921| %_mask_idx: 0.42544| ppl: 192.50867| %_neg_is_pos: 0.00386| lr: 0.0| temp: 1.9608 | loss: 1.13272| constrast_loss: 4.46175| div_loss: 0.69121| %_mask_idx: 0.40257| ppl: 197.62805| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.96078 | loss: 1.13486| constrast_loss: 4.4707| div_loss: 0.68751| %_mask_idx: 0.36952| ppl: 199.99268| %_neg_is_pos: 0.00472| lr: 0.0| temp: 1.96078 | loss: 1.11936| constrast_loss: 4.40683| div_loss: 0.70616| %_mask_idx: 0.37813| ppl: 188.05756| %_neg_is_pos: 0.00432| lr: 0.0| temp: 1.96077 | loss: 1.13863| constrast_loss: 4.48536| div_loss: 0.69157| %_mask_idx: 0.34211| ppl: 197.3981| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.96077 | loss: 1.13441| constrast_loss: 4.46789| div_loss: 0.69734| %_mask_idx: 0.42262| ppl: 193.70071| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.96076 | loss: 1.11664| constrast_loss: 4.39433| div_loss: 0.72207| %_mask_idx: 0.36451| ppl: 177.87332| %_neg_is_pos: 0.00725| lr: 0.0| temp: 1.96076 | loss: 1.13377| constrast_loss: 4.46592| div_loss: 0.6916| %_mask_idx: 0.39082| ppl: 197.37898| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.96075 | loss: 1.13697| constrast_loss: 4.47727| div_loss: 0.70593| %_mask_idx: 0.43061| ppl: 188.20467| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.96075 | loss: 1.12855| constrast_loss: 4.44536| div_loss: 0.68823| %_mask_idx: 0.39301| ppl: 199.53487| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96073 | loss: 1.13088| constrast_loss: 4.45528| div_loss: 0.68253| %_mask_idx: 0.36278| ppl: 203.17955| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96073 | loss: 1.12881| constrast_loss: 4.4468| div_loss: 0.68423| %_mask_idx: 0.37343| ppl: 202.0899| 
%_neg_is_pos: 0.00384| lr: 0.0| temp: 1.96072 | loss: 1.14682| constrast_loss: 4.51888| div_loss: 0.68385| %_mask_idx: 0.44408| ppl: 202.33531| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.96072 | loss: 1.13916| constrast_loss: 4.48783| div_loss: 0.68821| %_mask_idx: 0.43797| ppl: 199.5434| %_neg_is_pos: 0.00185| lr: 0.0| temp: 1.96071 | loss: 1.12955| constrast_loss: 4.4482| div_loss: 0.69986| %_mask_idx: 0.41463| ppl: 192.08875| %_neg_is_pos: 0.00276| lr: 0.0| temp: 1.96071 | loss: 1.12371| constrast_loss: 4.425| div_loss: 0.69837| %_mask_idx: 0.39427| ppl: 193.04041| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.9607 | loss: 1.13326| constrast_loss: 4.46346| div_loss: 0.69582| %_mask_idx: 0.40695| ppl: 194.67538| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.9607 | loss: 1.1239| constrast_loss: 4.42509| div_loss: 0.70506| %_mask_idx: 0.38299| ppl: 188.76003| %_neg_is_pos: 0.00483| lr: 0.0| temp: 1.96068 | loss: 1.13341| constrast_loss: 4.4648| div_loss: 0.68844| %_mask_idx: 0.43139| ppl: 199.39612| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.96068 | loss: 1.13008| constrast_loss: 4.44997| div_loss: 0.70347| %_mask_idx: 0.32769| ppl: 189.77686| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.96067 | loss: 1.14428| constrast_loss: 4.50839| div_loss: 0.68749| %_mask_idx: 0.45269| ppl: 200.009| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96067 | loss: 1.12089| constrast_loss: 4.41381| div_loss: 0.69733| %_mask_idx: 0.36952| ppl: 193.71136| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.96066 | loss: 1.13389| constrast_loss: 4.46663| div_loss: 0.68941| %_mask_idx: 0.32581| ppl: 198.77945| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.96066 | loss: 1.13766| constrast_loss: 4.48179| div_loss: 0.68853| %_mask_idx: 0.35965| ppl: 199.34282| %_neg_is_pos: 0.00436| lr: 0.0| temp: 1.96065 | loss: 1.13293| constrast_loss: 4.46243| div_loss: 0.69301| %_mask_idx: 0.40085| ppl: 196.47493| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.96065 | loss: 1.14058| constrast_loss: 4.49345| div_loss: 0.68857| %_mask_idx: 0.40022| ppl: 
199.31824| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.96063 | loss: 1.12765| constrast_loss: 4.44072| div_loss: 0.6988| %_mask_idx: 0.33443| ppl: 192.76862| %_neg_is_pos: 0.00479| lr: 0.0| temp: 1.96063 | loss: 1.12584| constrast_loss: 4.43323| div_loss: 0.70135| %_mask_idx: 0.38315| ppl: 191.13541| %_neg_is_pos: 0.00398| lr: 0.0| temp: 1.96062 | loss: 1.13184| constrast_loss: 4.45884| div_loss: 0.68505| %_mask_idx: 0.38503| ppl: 201.56955| %_neg_is_pos: 0.00298| lr: 0.0| temp: 1.96062 | loss: 1.12022| constrast_loss: 4.41004| div_loss: 0.70856| %_mask_idx: 0.41573| ppl: 186.52045| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.9606 | loss: 1.14279| constrast_loss: 4.50281| div_loss: 0.68347| %_mask_idx: 0.38753| ppl: 202.5795| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.9606 | loss: 1.13076| constrast_loss: 4.45462| div_loss: 0.68406| %_mask_idx: 0.401| ppl: 202.19968| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.96059 | loss: 1.13051| constrast_loss: 4.45213| div_loss: 0.69907| %_mask_idx: 0.42826| ppl: 192.59218| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.96059 | loss: 1.11891| constrast_loss: 4.40519| div_loss: 0.70454| %_mask_idx: 0.38221| ppl: 189.09433| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.96058 | loss: 1.13116| constrast_loss: 4.45511| div_loss: 0.69518| %_mask_idx: 0.43264| ppl: 195.0872| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96058 | loss: 1.11813| constrast_loss: 4.40272| div_loss: 0.69782| %_mask_idx: 0.3714| ppl: 193.3965| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.96057 | loss: 1.13151| constrast_loss: 4.45604| div_loss: 0.6998| %_mask_idx: 0.39709| ppl: 192.13112| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.96057 | loss: 1.12928| constrast_loss: 4.44737| div_loss: 0.69763| %_mask_idx: 0.3161| ppl: 193.51831| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.96055 | loss: 1.14035| constrast_loss: 4.49263| div_loss: 0.6876| %_mask_idx: 0.44204| ppl: 199.93443| %_neg_is_pos: 0.00189| lr: 0.0| temp: 1.96055 | loss: 1.13362| constrast_loss: 4.46554| div_loss: 0.68943| %_mask_idx: 0.39912| 
ppl: 198.76672| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.96054 | loss: 1.13288| constrast_loss: 4.46181| div_loss: 0.69707| %_mask_idx: 0.375| ppl: 193.87836| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.96054 | loss: 1.13905| constrast_loss: 4.48757| div_loss: 0.68624| %_mask_idx: 0.42011| ppl: 200.80692| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.96053 | loss: 1.13226| constrast_loss: 4.45948| div_loss: 0.69551| %_mask_idx: 0.38612| ppl: 194.87666| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.96053 | loss: 1.12781| constrast_loss: 4.4412| div_loss: 0.70028| %_mask_idx: 0.35448| ppl: 191.82285| %_neg_is_pos: 0.00435| lr: 0.0| temp: 1.96052 | loss: 1.13014| constrast_loss: 4.45165| div_loss: 0.68895| %_mask_idx: 0.36779| ppl: 199.07056| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.96052 | loss: 1.13924| constrast_loss: 4.48732| div_loss: 0.69624| %_mask_idx: 0.39897| ppl: 194.40714| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.96051 | loss: 1.14568| constrast_loss: 4.51462| div_loss: 0.68114| %_mask_idx: 0.38753| ppl: 204.0723| %_neg_is_pos: 0.00205| lr: 0.0| temp: 1.96051 | loss: 1.1332| constrast_loss: 4.46277| div_loss: 0.70022| %_mask_idx: 0.36122| ppl: 191.86185| %_neg_is_pos: 0.00536| lr: 0.0| temp: 1.9605 | loss: 1.12603| constrast_loss: 4.4346| div_loss: 0.69515| %_mask_idx: 0.41118| ppl: 195.10434| %_neg_is_pos: 0.00369| lr: 0.0| temp: 1.9605 | loss: 1.12711| constrast_loss: 4.43958| div_loss: 0.68867| %_mask_idx: 0.35025| ppl: 199.25142| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.96049 | loss: 1.1334| constrast_loss: 4.46484| div_loss: 0.68754| %_mask_idx: 0.37641| ppl: 199.97571| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.96049 | loss: 1.1389| constrast_loss: 4.48694| div_loss: 0.68656| %_mask_idx: 0.38581| ppl: 200.60074| %_neg_is_pos: 0.00337| lr: 0.0| temp: 1.96048 | loss: 1.12932| constrast_loss: 4.44748| div_loss: 0.69792| %_mask_idx: 0.41118| ppl: 193.33423| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.96048 | loss: 1.13618| constrast_loss: 4.47531| div_loss: 0.69429| %_mask_idx: 
0.3786| ppl: 195.65761| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.96046 | loss: 1.12967| constrast_loss: 4.45005| div_loss: 0.6865| %_mask_idx: 0.35855| ppl: 200.64137| %_neg_is_pos: 0.00336| lr: 0.0| temp: 1.96046 | loss: 1.13115| constrast_loss: 4.45516| div_loss: 0.69432| %_mask_idx: 0.38409| ppl: 195.63757| %_neg_is_pos: 0.00457| lr: 0.0| temp: 1.96045 | loss: 1.12641| constrast_loss: 4.43548| div_loss: 0.70171| %_mask_idx: 0.40351| ppl: 190.90843| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.96045 | loss: 1.12517| constrast_loss: 4.43133| div_loss: 0.6935| %_mask_idx: 0.38048| ppl: 196.16003| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.96043 | loss: 1.13933| constrast_loss: 4.48937| div_loss: 0.6795| %_mask_idx: 0.38894| ppl: 205.11786| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.96043 | loss: 1.12961| constrast_loss: 4.44858| div_loss: 0.69851| %_mask_idx: 0.40602| ppl: 192.95476| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.96042 | loss: 1.13538| constrast_loss: 4.47218| div_loss: 0.69341| %_mask_idx: 0.40335| ppl: 196.21992| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.96042 | loss: 1.12528| constrast_loss: 4.43167| div_loss: 0.69443| %_mask_idx: 0.37061| ppl: 195.56528| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.96041 | loss: 1.12633| constrast_loss: 4.43483| div_loss: 0.70483| %_mask_idx: 0.401| ppl: 188.90967| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.96041 | loss: 1.1278| constrast_loss: 4.44159| div_loss: 0.69632| %_mask_idx: 0.38017| ppl: 194.35403| %_neg_is_pos: 0.00301| lr: 0.0| temp: 1.9604 | loss: 1.1322| constrast_loss: 4.45908| div_loss: 0.69703| %_mask_idx: 0.39411| ppl: 193.89816| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.9604 | loss: 1.13717| constrast_loss: 4.47984| div_loss: 0.68826| %_mask_idx: 0.41761| ppl: 199.51398| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.96038 | loss: 1.12508| constrast_loss: 4.43002| div_loss: 0.703| %_mask_idx: 0.37813| ppl: 190.08202| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.96038 | loss: 1.13522| constrast_loss: 4.47242| div_loss: 0.68463| 
%_mask_idx: 0.38017| ppl: 201.83902| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96037 | loss: 1.12599| constrast_loss: 4.4348| div_loss: 0.69163| %_mask_idx: 0.45818| ppl: 197.35974| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.96037 | loss: 1.13136| constrast_loss: 4.456| div_loss: 0.69429| %_mask_idx: 0.35652| ppl: 195.6517| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.96036 | loss: 1.13284| constrast_loss: 4.46205| div_loss: 0.693| %_mask_idx: 0.42168| ppl: 196.48099| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.96036 | loss: 1.13293| constrast_loss: 4.46223| div_loss: 0.69476| %_mask_idx: 0.39944| ppl: 195.3566| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.96035 | loss: 1.1394| constrast_loss: 4.48901| div_loss: 0.68586| %_mask_idx: 0.4256| ppl: 201.05113| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.96035 | loss: 1.12627| constrast_loss: 4.43567| div_loss: 0.69415| %_mask_idx: 0.44471| ppl: 195.74504| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.96033 | loss: 1.12835| constrast_loss: 4.44369| div_loss: 0.69699| %_mask_idx: 0.36169| ppl: 193.92331| %_neg_is_pos: 0.00462| lr: 0.0| temp: 1.96033 | loss: 1.11416| constrast_loss: 4.38525| div_loss: 0.7139| %_mask_idx: 0.35636| ppl: 183.10632| %_neg_is_pos: 0.00576| lr: 0.0| temp: 1.96032 | loss: 1.13553| constrast_loss: 4.47263| div_loss: 0.69499| %_mask_idx: 0.35103| ppl: 195.20352| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.96032 | loss: 1.12911| constrast_loss: 4.44687| div_loss: 0.69575| %_mask_idx: 0.30843| ppl: 194.72282| %_neg_is_pos: 0.00488| lr: 0.0| temp: 1.96031 | loss: 1.13404| constrast_loss: 4.46636| div_loss: 0.69818| %_mask_idx: 0.40774| ppl: 193.16428| %_neg_is_pos: 0.00435| lr: 0.0| temp: 1.96031 | loss: 1.13823| constrast_loss: 4.48449| div_loss: 0.6844| %_mask_idx: 0.39019| ppl: 201.9823| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.9603 | loss: 1.12951| constrast_loss: 4.44878| div_loss: 0.69267| %_mask_idx: 0.39035| ppl: 196.69031| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.9603 | loss: 1.1289| constrast_loss: 4.4469| div_loss: 0.68682| 
%_mask_idx: 0.36106| ppl: 200.43756| %_neg_is_pos: 0.00377| lr: 0.0| temp: 1.96028 | loss: 1.11455| constrast_loss: 4.3874| div_loss: 0.70796| %_mask_idx: 0.33756| ppl: 186.90652| %_neg_is_pos: 0.00363| lr: 0.0| temp: 1.96028 | loss: 1.13441| constrast_loss: 4.46835| div_loss: 0.69308| %_mask_idx: 0.39709| ppl: 196.43118| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.96027 | loss: 1.12354| constrast_loss: 4.42466| div_loss: 0.69487| %_mask_idx: 0.36216| ppl: 195.28391| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.96027 [2021-09-02 05:52:43,166] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 05:52:43,166] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12898| constrast_loss: 4.44643| div_loss: 0.6947| %_mask_idx: 0.36028| ppl: 195.38965| %_neg_is_pos: 0.00478| lr: 0.0| temp: 1.96025 | loss: 1.13061| constrast_loss: 4.45283| div_loss: 0.69622| %_mask_idx: 0.39223| ppl: 194.4189| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.96025 | loss: 1.11821| constrast_loss: 4.40248| div_loss: 0.70368| %_mask_idx: 0.34884| ppl: 189.64581| %_neg_is_pos: 0.00571| lr: 0.0| temp: 1.96024 | loss: 1.13888| constrast_loss: 4.48627| div_loss: 0.69236| %_mask_idx: 0.40304| ppl: 196.88873| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.96024 | loss: 1.13234| constrast_loss: 4.45961| div_loss: 0.69754| %_mask_idx: 0.38863| ppl: 193.5766| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.96023 | loss: 1.1277| constrast_loss: 4.44149| div_loss: 0.69317| %_mask_idx: 0.41134| ppl: 196.37117| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.96023 | loss: 1.13129| constrast_loss: 4.45477| div_loss: 0.70383| %_mask_idx: 0.38831| ppl: 189.54694| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.96022 | loss: 1.13183| constrast_loss: 4.45813| div_loss: 0.69193| %_mask_idx: 0.39865| ppl: 197.16235| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.96022 | loss: 
1.13493| constrast_loss: 4.47076| div_loss: 0.6895| %_mask_idx: 0.38769| ppl: 198.71841| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.9602 | loss: 1.13429| constrast_loss: 4.46807| div_loss: 0.69078| %_mask_idx: 0.39145| ppl: 197.89964| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.9602 | loss: 1.13285| constrast_loss: 4.46126| div_loss: 0.70134| %_mask_idx: 0.43609| ppl: 191.14503| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.96019 | loss: 1.13325| constrast_loss: 4.46462| div_loss: 0.68371| %_mask_idx: 0.4422| ppl: 202.42558| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.96019 | loss: 1.13847| constrast_loss: 4.48548| div_loss: 0.68414| %_mask_idx: 0.34915| ppl: 202.15295| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.96018 | loss: 1.14002| constrast_loss: 4.4908| div_loss: 0.69267| %_mask_idx: 0.42152| ppl: 196.69238| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.96018 | loss: 1.13653| constrast_loss: 4.47643| div_loss: 0.69674| %_mask_idx: 0.40147| ppl: 194.08501| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.96017 | loss: 1.13464| constrast_loss: 4.46941| div_loss: 0.69157| %_mask_idx: 0.37672| ppl: 197.39545| %_neg_is_pos: 0.00263| lr: 0.0| temp: 1.96017 | loss: 1.12839| constrast_loss: 4.44383| div_loss: 0.69724| %_mask_idx: 0.36795| ppl: 193.76636| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.96015 | loss: 1.12804| constrast_loss: 4.44252| div_loss: 0.69635| %_mask_idx: 0.34665| ppl: 194.33307| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.96015 | loss: 1.13504| constrast_loss: 4.47066| div_loss: 0.69513| %_mask_idx: 0.39677| ppl: 195.11417| %_neg_is_pos: 0.00352| lr: 0.0| temp: 1.96014 | loss: 1.12499| constrast_loss: 4.43029| div_loss: 0.69648| %_mask_idx: 0.38174| ppl: 194.25107| %_neg_is_pos: 0.00415| lr: 0.0| temp: 1.96014 | loss: 1.12239| constrast_loss: 4.4193| div_loss: 0.70262| %_mask_idx: 0.35636| ppl: 190.32085| %_neg_is_pos: 0.00487| lr: 0.0| temp: 1.96013 | loss: 1.13063| constrast_loss: 4.45214| div_loss: 0.70387| %_mask_idx: 0.39489| ppl: 189.52454| %_neg_is_pos: 0.00529| lr: 0.0| temp: 
1.96013 | loss: 1.13611| constrast_loss: 4.47528| div_loss: 0.69162| %_mask_idx: 0.3443| ppl: 197.366| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.96012 | loss: 1.12632| constrast_loss: 4.43568| div_loss: 0.6961| %_mask_idx: 0.39051| ppl: 194.4967| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.96012 | loss: 1.139| constrast_loss: 4.4868| div_loss: 0.69186| %_mask_idx: 0.42747| ppl: 197.21198| %_neg_is_pos: 0.0015| lr: 0.0| temp: 1.9601 | loss: 1.13489| constrast_loss: 4.46977| div_loss: 0.69807| %_mask_idx: 0.40868| ppl: 193.23804| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.9601 | loss: 1.13529| constrast_loss: 4.47058| div_loss: 0.70559| %_mask_idx: 0.39677| ppl: 188.4245| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.96009 | loss: 1.12442| constrast_loss: 4.42834| div_loss: 0.69358| %_mask_idx: 0.33678| ppl: 196.10818| %_neg_is_pos: 0.00365| lr: 0.0| temp: 1.96009 | loss: 1.13258| constrast_loss: 4.46012| div_loss: 0.70194| %_mask_idx: 0.39944| ppl: 190.75562| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.96007 | loss: 1.13361| constrast_loss: 4.46474| div_loss: 0.69684| %_mask_idx: 0.37202| ppl: 194.02025| %_neg_is_pos: 0.00274| lr: 0.0| temp: 1.96007 | loss: 1.12102| constrast_loss: 4.41304| div_loss: 0.71043| %_mask_idx: 0.46977| ppl: 185.32477| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.96006 | loss: 1.13298| constrast_loss: 4.46128| div_loss: 0.70618| %_mask_idx: 0.40445| ppl: 188.04709| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.96006 | loss: 1.12021| constrast_loss: 4.41007| div_loss: 0.70747| %_mask_idx: 0.38503| ppl: 187.22104| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.96005 | loss: 1.13345| constrast_loss: 4.4649| div_loss: 0.68892| %_mask_idx: 0.37798| ppl: 199.08865| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.96005 | loss: 1.13761| constrast_loss: 4.48068| div_loss: 0.69763| %_mask_idx: 0.45019| ppl: 193.51555| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.96004 | loss: 1.1228| constrast_loss: 4.42061| div_loss: 0.70607| %_mask_idx: 0.375| ppl: 188.11824| %_neg_is_pos: 0.00336| lr: 0.0| temp: 
1.96004 | loss: 1.12919| constrast_loss: 4.44646| div_loss: 0.70282| %_mask_idx: 0.388| ppl: 190.19232| %_neg_is_pos: 0.00322| lr: 0.0| temp: 1.96002 | loss: 1.14033| constrast_loss: 4.49156| div_loss: 0.69761| %_mask_idx: 0.42528| ppl: 193.52896| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.96002 | loss: 1.14177| constrast_loss: 4.49795| div_loss: 0.69129| %_mask_idx: 0.40789| ppl: 197.5741| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.96002 | loss: 1.13296| constrast_loss: 4.46091| div_loss: 0.70928| %_mask_idx: 0.39348| ppl: 186.05789| %_neg_is_pos: 0.00486| lr: 0.0| temp: 1.96002 | loss: 1.13122| constrast_loss: 4.45531| div_loss: 0.69588| %_mask_idx: 0.39975| ppl: 194.63667| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.96001 | loss: 1.13777| constrast_loss: 4.48073| div_loss: 0.70339| %_mask_idx: 0.41839| ppl: 189.83093| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.96001 | loss: 1.1399| constrast_loss: 4.48998| div_loss: 0.696| %_mask_idx: 0.37657| ppl: 194.56108| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.96 | loss: 1.13204| constrast_loss: 4.45914| div_loss: 0.69006| %_mask_idx: 0.36905| ppl: 198.36372| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.96 | loss: 1.13766| constrast_loss: 4.4817| div_loss: 0.68921| %_mask_idx: 0.40085| ppl: 198.90778| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.95998 | loss: 1.13404| constrast_loss: 4.46592| div_loss: 0.70252| %_mask_idx: 0.41087| ppl: 190.38783| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.95998 | loss: 1.13076| constrast_loss: 4.45311| div_loss: 0.69916| %_mask_idx: 0.4093| ppl: 192.53561| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.95997 | loss: 1.13735| constrast_loss: 4.47937| div_loss: 0.70036| %_mask_idx: 0.40555| ppl: 191.7666| %_neg_is_pos: 0.00266| lr: 0.0| temp: 1.95997 | loss: 1.13465| constrast_loss: 4.46851| div_loss: 0.70097| %_mask_idx: 0.40695| ppl: 191.38013| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.95996 | loss: 1.14184| constrast_loss: 4.49849| div_loss: 0.68854| %_mask_idx: 0.39458| ppl: 199.33698| %_neg_is_pos: 0.00307| lr: 0.0| temp: 
1.95996 | loss: 1.12803| constrast_loss: 4.44173| div_loss: 0.70373| %_mask_idx: 0.39098| ppl: 189.6102| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.95995 | loss: 1.12984| constrast_loss: 4.44847| div_loss: 0.709| %_mask_idx: 0.39677| ppl: 186.23706| %_neg_is_pos: 0.00344| lr: 0.0| temp: 1.95995 | loss: 1.12853| constrast_loss: 4.44368| div_loss: 0.7046| %_mask_idx: 0.36983| ppl: 189.05724| %_neg_is_pos: 0.00537| lr: 0.0| temp: 1.95993 | loss: 1.13722| constrast_loss: 4.48026| div_loss: 0.68598| %_mask_idx: 0.42011| ppl: 200.97009| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.95993 | loss: 1.12417| constrast_loss: 4.42681| div_loss: 0.69865| %_mask_idx: 0.3526| ppl: 192.8632| %_neg_is_pos: 0.00335| lr: 0.0| temp: 1.95992 | loss: 1.13226| constrast_loss: 4.45923| div_loss: 0.69808| %_mask_idx: 0.32816| ppl: 193.23058| %_neg_is_pos: 0.00646| lr: 0.0| temp: 1.95992 | loss: 1.13448| constrast_loss: 4.46837| div_loss: 0.69546| %_mask_idx: 0.42074| ppl: 194.90619| %_neg_is_pos: 0.00514| lr: 0.0| temp: 1.9599 | loss: 1.13447| constrast_loss: 4.46854| div_loss: 0.6934| %_mask_idx: 0.38174| ppl: 196.22655| %_neg_is_pos: 0.00394| lr: 0.0| temp: 1.9599 | loss: 1.13241| constrast_loss: 4.46061| div_loss: 0.69041| %_mask_idx: 0.41024| ppl: 198.13931| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.95989 | loss: 1.13431| constrast_loss: 4.46753| div_loss: 0.69694| %_mask_idx: 0.4151| ppl: 193.95987| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.95989 | loss: 1.1405| constrast_loss: 4.49251| div_loss: 0.69472| %_mask_idx: 0.37375| ppl: 195.37619| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.95988 | loss: 1.1332| constrast_loss: 4.46245| div_loss: 0.70328| %_mask_idx: 0.42669| ppl: 189.90268| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.95988 | loss: 1.13486| constrast_loss: 4.47015| div_loss: 0.69274| %_mask_idx: 0.42137| ppl: 196.64378| %_neg_is_pos: 0.00403| lr: 0.0| temp: 1.95987 | loss: 1.13818| constrast_loss: 4.4839| div_loss: 0.6883| %_mask_idx: 0.41588| ppl: 199.48767| %_neg_is_pos: 0.00264| lr: 0.0| 
temp: 1.95987 | loss: 1.14143| constrast_loss: 4.49699| div_loss: 0.68754| %_mask_idx: 0.42998| ppl: 199.97679| %_neg_is_pos: 0.00151| lr: 0.0| temp: 1.95985 | loss: 1.12637| constrast_loss: 4.43497| div_loss: 0.70518| %_mask_idx: 0.39239| ppl: 188.68594| %_neg_is_pos: 0.00462| lr: 0.0| temp: 1.95985 | loss: 1.14956| constrast_loss: 4.52941| div_loss: 0.68837| %_mask_idx: 0.43123| ppl: 199.44193| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.95984 | loss: 1.13567| constrast_loss: 4.47346| div_loss: 0.69215| %_mask_idx: 0.40774| ppl: 197.02399| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.95984 | loss: 1.13202| constrast_loss: 4.45754| div_loss: 0.70562| %_mask_idx: 0.39474| ppl: 188.40021| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.95983 | loss: 1.13501| constrast_loss: 4.47012| div_loss: 0.69905| %_mask_idx: 0.41369| ppl: 192.605| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.95983 | loss: 1.1251| constrast_loss: 4.42996| div_loss: 0.70418| %_mask_idx: 0.36325| ppl: 189.32306| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.95982 | loss: 1.13168| constrast_loss: 4.4576| div_loss: 0.6911| %_mask_idx: 0.35401| ppl: 197.69699| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.95982 | loss: 1.13519| constrast_loss: 4.47171| div_loss: 0.69047| %_mask_idx: 0.41682| ppl: 198.10234| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.9598 | loss: 1.13316| constrast_loss: 4.46333| div_loss: 0.69301| %_mask_idx: 0.37688| ppl: 196.47208| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.9598 | loss: 1.12254| constrast_loss: 4.42028| div_loss: 0.69868| %_mask_idx: 0.39803| ppl: 192.84244| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.95979 | loss: 1.12127| constrast_loss: 4.41336| div_loss: 0.71719| %_mask_idx: 0.40116| ppl: 181.00006| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.95979 | loss: 1.12633| constrast_loss: 4.43574| div_loss: 0.69568| %_mask_idx: 0.35526| ppl: 194.76627| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.95978 | loss: 1.13426| constrast_loss: 4.46699| div_loss: 0.70072| %_mask_idx: 0.3891| ppl: 191.54181| %_neg_is_pos: 0.00333| 
lr: 0.0| temp: 1.95978 | loss: 1.12638| constrast_loss: 4.4341| div_loss: 0.71405| %_mask_idx: 0.3974| ppl: 183.00574| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.95977 | loss: 1.12148| constrast_loss: 4.41549| div_loss: 0.70434| %_mask_idx: 0.43938| ppl: 189.22452| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.95977 | loss: 1.13419| constrast_loss: 4.46773| div_loss: 0.69031| %_mask_idx: 0.3916| ppl: 198.20003| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.95975 | loss: 1.13713| constrast_loss: 4.47878| div_loss: 0.69748| %_mask_idx: 0.38659| ppl: 193.60982| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.95975 | loss: 1.13772| constrast_loss: 4.48199| div_loss: 0.68881| %_mask_idx: 0.41698| ppl: 199.15976| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.95974 | loss: 1.12594| constrast_loss: 4.4332| div_loss: 0.70573| %_mask_idx: 0.35605| ppl: 188.3342| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.95974 | loss: 1.12595| constrast_loss: 4.43319| div_loss: 0.70628| %_mask_idx: 0.35824| ppl: 187.98065| %_neg_is_pos: 0.00441| lr: 0.0| temp: 1.95972 | loss: 1.14414| constrast_loss: 4.50716| div_loss: 0.69413| %_mask_idx: 0.38111| ppl: 195.75793| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.95972 | loss: 1.1329| constrast_loss: 4.46139| div_loss: 0.70214| %_mask_idx: 0.38033| ppl: 190.62746| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.95971 | loss: 1.12034| constrast_loss: 4.41062| div_loss: 0.7074| %_mask_idx: 0.37939| ppl: 187.26143| %_neg_is_pos: 0.00558| lr: 0.0| temp: 1.95971 | loss: 1.13139| constrast_loss: 4.45565| div_loss: 0.6993| %_mask_idx: 0.35542| ppl: 192.44864| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.9597 | loss: 1.12979| constrast_loss: 4.44846| div_loss: 0.70694| %_mask_idx: 0.34414| ppl: 187.55867| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.9597 | loss: 1.13481| constrast_loss: 4.46906| div_loss: 0.70188| %_mask_idx: 0.42982| ppl: 190.79836| %_neg_is_pos: 0.00371| lr: 0.0| temp: 1.95969 | loss: 1.13204| constrast_loss: 4.45782| div_loss: 0.70319| %_mask_idx: 0.3573| ppl: 189.95825| %_neg_is_pos: 
0.00488| lr: 0.0| temp: 1.95969 | loss: 1.13818| constrast_loss: 4.48337| div_loss: 0.6937| %_mask_idx: 0.36732| ppl: 196.03497| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.95967 | loss: 1.12564| constrast_loss: 4.43198| div_loss: 0.70582| %_mask_idx: 0.40821| ppl: 188.27412| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.95967 | loss: 1.13228| constrast_loss: 4.45875| div_loss: 0.7037| %_mask_idx: 0.39787| ppl: 189.63513| %_neg_is_pos: 0.00373| lr: 0.0| temp: 1.95966 | loss: 1.13511| constrast_loss: 4.47057| div_loss: 0.69891| %_mask_idx: 0.37657| ppl: 192.69669| %_neg_is_pos: 0.00368| lr: 0.0| temp: 1.95966 | loss: 1.1327| constrast_loss: 4.46095| div_loss: 0.69852| %_mask_idx: 0.36169| ppl: 192.94843| %_neg_is_pos: 0.003| lr: 0.0| temp: 1.95965 | loss: 1.13951| constrast_loss: 4.48886| div_loss: 0.69176| %_mask_idx: 0.42513| ppl: 197.27567| %_neg_is_pos: 0.00238| lr: 0.0| temp: 1.95965 | loss: 1.12367| constrast_loss: 4.42466| div_loss: 0.70011| %_mask_idx: 0.34633| ppl: 191.93222| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.95964 | loss: 1.12191| constrast_loss: 4.41623| div_loss: 0.71401| %_mask_idx: 0.39646| ppl: 183.0332| %_neg_is_pos: 0.00545| lr: 0.0| temp: 1.95964 | loss: 1.12328| constrast_loss: 4.42279| div_loss: 0.70329| %_mask_idx: 0.38393| ppl: 189.89325| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.95962 | loss: 1.12313| constrast_loss: 4.42298| div_loss: 0.69542| %_mask_idx: 0.38017| ppl: 194.9292| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.95962 | loss: 1.13108| constrast_loss: 4.45465| div_loss: 0.69669| %_mask_idx: 0.39427| ppl: 194.11658| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.95961 | loss: 1.13277| constrast_loss: 4.46144| div_loss: 0.69646| %_mask_idx: 0.41917| ppl: 194.2652| %_neg_is_pos: 0.0033| lr: 0.0| temp: 1.95961 | loss: 1.12183| constrast_loss: 4.41675| div_loss: 0.70574| %_mask_idx: 0.4093| ppl: 188.3289| %_neg_is_pos: 0.00392| lr: 0.0| temp: 1.9596 | loss: 1.12168| constrast_loss: 4.41598| div_loss: 0.7073| %_mask_idx: 0.39427| ppl: 187.32642| 
%_neg_is_pos: 0.00569| lr: 0.0| temp: 1.9596 | loss: 1.13146| constrast_loss: 4.45637| div_loss: 0.69474| %_mask_idx: 0.39724| ppl: 195.36572| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.95959 | loss: 1.13007| constrast_loss: 4.45098| div_loss: 0.69299| %_mask_idx: 0.4032| ppl: 196.48651| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.95959 | loss: 1.11668| constrast_loss: 4.39531| div_loss: 0.71428| %_mask_idx: 0.41259| ppl: 182.85895| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.95957 | loss: 1.13799| constrast_loss: 4.48214| div_loss: 0.69824| %_mask_idx: 0.36701| ppl: 193.12332| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.95957 | loss: 1.12536| constrast_loss: 4.4324| div_loss: 0.69031| %_mask_idx: 0.35385| ppl: 198.20477| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.95956 | loss: 1.12554| constrast_loss: 4.43084| div_loss: 0.71305| %_mask_idx: 0.37845| ppl: 183.64787| %_neg_is_pos: 0.00471| lr: 0.0| temp: 1.95956 [2021-09-02 06:01:57,211] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:01:57,211] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.12959| constrast_loss: 4.44789| div_loss: 0.70457| %_mask_idx: 0.36372| ppl: 189.07291| %_neg_is_pos: 0.00506| lr: 0.0| temp: 1.95954 | loss: 1.13131| constrast_loss: 4.45642| div_loss: 0.6883| %_mask_idx: 0.36294| ppl: 199.48488| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.95954 | loss: 1.14749| constrast_loss: 4.52083| div_loss: 0.69149| %_mask_idx: 0.41479| ppl: 197.44937| %_neg_is_pos: 0.0019| lr: 0.0| temp: 1.95953 | loss: 1.1452| constrast_loss: 4.51207| div_loss: 0.68744| %_mask_idx: 0.38753| ppl: 200.04074| %_neg_is_pos: 0.00241| lr: 0.0| temp: 1.95953 | loss: 1.12997| constrast_loss: 4.45025| div_loss: 0.69646| %_mask_idx: 0.38142| ppl: 194.26767| %_neg_is_pos: 0.00362| lr: 0.0| temp: 1.95953 | loss: 1.12449| constrast_loss: 4.42659| div_loss: 0.71352| %_mask_idx: 0.34258| ppl: 183.34494| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.95953 | loss: 1.13757| constrast_loss: 4.48099| div_loss: 0.69281| %_mask_idx: 0.36341| ppl: 196.60318| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95952 | loss: 1.1264| constrast_loss: 4.43593| div_loss: 0.69666| %_mask_idx: 0.38972| ppl: 194.13936| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.95952 | loss: 1.13463| constrast_loss: 4.46919| div_loss: 0.69351| %_mask_idx: 0.41526| ppl: 196.15587| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.9595| loss: 1.12992| constrast_loss: 4.45066| div_loss: 0.69005| %_mask_idx: 0.35808| ppl: 198.36754| %_neg_is_pos: 0.00292| lr: 0.0| temp: 1.9595 | loss: 1.12793| constrast_loss: 4.44246| div_loss: 0.69266| %_mask_idx: 0.32926| ppl: 196.69557| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.95949 | loss: 1.12864| constrast_loss: 4.44545| div_loss: 0.69132| %_mask_idx: 0.35793| ppl: 197.55754| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.95949 | loss: 1.13012| constrast_loss: 4.45137| div_loss: 0.69121| %_mask_idx: 0.39364| ppl: 197.62662| %_neg_is_pos: 0.0036| lr: 0.0| temp: 1.95948 | loss: 1.12624| constrast_loss: 4.4351| div_loss: 0.69853| %_mask_idx: 0.33647| ppl: 192.94115| 
%_neg_is_pos: 0.00205| lr: 0.0| temp: 1.95948 | loss: 1.14054| constrast_loss: 4.49256| div_loss: 0.6959| %_mask_idx: 0.38925| ppl: 194.62326| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.95947 | loss: 1.13705| constrast_loss: 4.47879| div_loss: 0.69398| %_mask_idx: 0.38988| ppl: 195.84962| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.95947 | loss: 1.14061| constrast_loss: 4.49309| div_loss: 0.69357| %_mask_idx: 0.3844| ppl: 196.11765| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95945| loss: 1.14641| constrast_loss: 4.51539| div_loss: 0.70241| %_mask_idx: 0.41808| ppl: 190.46048| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.95945 | loss: 1.13426| constrast_loss: 4.46705| div_loss: 0.69971| %_mask_idx: 0.33976| ppl: 192.18733| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.95944 | loss: 1.13052| constrast_loss: 4.45111| div_loss: 0.70989| %_mask_idx: 0.34665| ppl: 185.66982| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.95944 | loss: 1.12602| constrast_loss: 4.43224| div_loss: 0.7184| %_mask_idx: 0.34586| ppl: 180.22615| %_neg_is_pos: 0.00408| lr: 0.0| temp: 1.95943 | loss: 1.13927| constrast_loss: 4.4861| div_loss: 0.70973| %_mask_idx: 0.40915| ppl: 185.77478| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.95943 | loss: 1.13715| constrast_loss: 4.47709| div_loss: 0.71508| %_mask_idx: 0.41353| ppl: 182.34628| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.95942 | loss: 1.13683| constrast_loss: 4.47641| div_loss: 0.70893| %_mask_idx: 0.36388| ppl: 186.28445| %_neg_is_pos: 0.00395| lr: 0.0| temp: 1.95942 | loss: 1.12616| constrast_loss: 4.43341| div_loss: 0.71223| %_mask_idx: 0.38189| ppl: 184.17288| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.9594 | loss: 1.13618| constrast_loss: 4.4741| div_loss: 0.7063| %_mask_idx: 0.42888| ppl: 187.96944| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.9594 | loss: 1.13042| constrast_loss: 4.45188| div_loss: 0.69788| %_mask_idx: 0.34211| ppl: 193.35513| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.95939 | loss: 1.1272| constrast_loss: 4.43894| div_loss: 0.69877| %_mask_idx: 0.34962| ppl: 
192.78687| %_neg_is_pos: 0.00374| lr: 0.0| temp: 1.95939 | loss: 1.13433| constrast_loss: 4.4675| div_loss: 0.69827| %_mask_idx: 0.38596| ppl: 193.10498| %_neg_is_pos: 0.00561| lr: 0.0| temp: 1.95937 | loss: 1.13034| constrast_loss: 4.4493| div_loss: 0.72065| %_mask_idx: 0.35824| ppl: 178.78236| %_neg_is_pos: 0.00512| lr: 0.0| temp: 1.95937 | loss: 1.13734| constrast_loss: 4.47924| div_loss: 0.70116| %_mask_idx: 0.41902| ppl: 191.25635| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.95936 | loss: 1.13693| constrast_loss: 4.47786| div_loss: 0.69851| %_mask_idx: 0.39662| ppl: 192.95535| %_neg_is_pos: 0.00355| lr: 0.0| temp: 1.95936 | loss: 1.138| constrast_loss: 4.48233| div_loss: 0.69691| %_mask_idx: 0.40304| ppl: 193.97931| %_neg_is_pos: 0.00271| lr: 0.0| temp: 1.95935 | loss: 1.12676| constrast_loss: 4.43609| div_loss: 0.70949| %_mask_idx: 0.40304| ppl: 185.92572| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.95935 | loss: 1.13363| constrast_loss: 4.46386| div_loss: 0.70652| %_mask_idx: 0.38189| ppl: 187.82938| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.95934 | loss: 1.14508| constrast_loss: 4.51018| div_loss: 0.70153| %_mask_idx: 0.40993| ppl: 191.01985| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.95934 | loss: 1.13225| constrast_loss: 4.45809| div_loss: 0.70909| %_mask_idx: 0.38972| ppl: 186.18425| %_neg_is_pos: 0.00552| lr: 0.0| temp: 1.95932 | loss: 1.13834| constrast_loss: 4.48364| div_loss: 0.69705| %_mask_idx: 0.3963| ppl: 193.88924| %_neg_is_pos: 0.00405| lr: 0.0| temp: 1.95932 | loss: 1.13388| constrast_loss: 4.46472| div_loss: 0.70791| %_mask_idx: 0.40633| ppl: 186.93813| %_neg_is_pos: 0.00565| lr: 0.0| temp: 1.95931 | loss: 1.13474| constrast_loss: 4.46842| div_loss: 0.70541| %_mask_idx: 0.39239| ppl: 188.5388| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.95931 | loss: 1.13195| constrast_loss: 4.45734| div_loss: 0.70459| %_mask_idx: 0.36779| ppl: 189.05933| %_neg_is_pos: 0.00358| lr: 0.0| temp: 1.9593 | loss: 1.12824| constrast_loss: 4.442| div_loss: 0.70951| %_mask_idx: 
0.40586| ppl: 185.91577| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.9593 | loss: 1.13587| constrast_loss: 4.47369| div_loss: 0.6981| %_mask_idx: 0.41635| ppl: 193.21487| %_neg_is_pos: 0.00317| lr: 0.0| temp: 1.95929 | loss: 1.13647| constrast_loss: 4.47614| div_loss: 0.69747| %_mask_idx: 0.38409| ppl: 193.62111| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.95929 | loss: 1.13345| constrast_loss: 4.4641| div_loss: 0.69679| %_mask_idx: 0.43311| ppl: 194.05286| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.95927 | loss: 1.13638| constrast_loss: 4.47527| div_loss: 0.70258| %_mask_idx: 0.38471| ppl: 190.34813| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.95927 | loss: 1.12505| constrast_loss: 4.42916| div_loss: 0.71034| %_mask_idx: 0.34915| ppl: 185.38094| %_neg_is_pos: 0.00548| lr: 0.0| temp: 1.95926 | loss: 1.1265| constrast_loss: 4.43555| div_loss: 0.70436| %_mask_idx: 0.39129| ppl: 189.21008| %_neg_is_pos: 0.00324| lr: 0.0| temp: 1.95926 | loss: 1.13386| constrast_loss: 4.46515| div_loss: 0.70313| %_mask_idx: 0.40132| ppl: 189.99611| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.95925 | loss: 1.13896| constrast_loss: 4.48536| div_loss: 0.70467| %_mask_idx: 0.42638| ppl: 189.01297| %_neg_is_pos: 0.0034| lr: 0.0| temp: 1.95925 | loss: 1.13193| constrast_loss: 4.45738| div_loss: 0.70326| %_mask_idx: 0.36748| ppl: 189.91318| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.95924 | loss: 1.137| constrast_loss: 4.4772| div_loss: 0.70794| %_mask_idx: 0.40852| ppl: 186.9156| %_neg_is_pos: 0.00402| lr: 0.0| temp: 1.95924 | loss: 1.14406| constrast_loss: 4.50616| div_loss: 0.70075| %_mask_idx: 0.39959| ppl: 191.51797| %_neg_is_pos: 0.00431| lr: 0.0| temp: 1.95922 | loss: 1.12692| constrast_loss: 4.43732| div_loss: 0.70362| %_mask_idx: 0.43264| ppl: 189.68402| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.95922 | loss: 1.12377| constrast_loss: 4.42464| div_loss: 0.70449| %_mask_idx: 0.36795| ppl: 189.12456| %_neg_is_pos: 0.00503| lr: 0.0| temp: 1.95921 | loss: 1.12725| constrast_loss: 4.43889| div_loss: 0.70113| 
%_mask_idx: 0.34398| ppl: 191.27861| %_neg_is_pos: 0.00399| lr: 0.0| temp: 1.95921 | loss: 1.13282| constrast_loss: 4.46151| div_loss: 0.69787| %_mask_idx: 0.41776| ppl: 193.36107| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.95919 | loss: 1.14243| constrast_loss: 4.50055| div_loss: 0.69189| %_mask_idx: 0.42293| ppl: 197.19238| %_neg_is_pos: 0.00163| lr: 0.0| temp: 1.95919 | loss: 1.12655| constrast_loss: 4.43507| div_loss: 0.71129| %_mask_idx: 0.3916| ppl: 184.77136| %_neg_is_pos: 0.00418| lr: 0.0| temp: 1.95918 | loss: 1.13981| constrast_loss: 4.48896| div_loss: 0.70288| %_mask_idx: 0.34821| ppl: 190.15974| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.95918 | loss: 1.12902| constrast_loss: 4.44564| div_loss: 0.70439| %_mask_idx: 0.41291| ppl: 189.18774| %_neg_is_pos: 0.00267| lr: 0.0| temp: 1.95917 | loss: 1.13978| constrast_loss: 4.48938| div_loss: 0.69764| %_mask_idx: 0.3927| ppl: 193.51096| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.95917 | loss: 1.13532| constrast_loss: 4.47049| div_loss: 0.70798| %_mask_idx: 0.37578| ppl: 186.89294| %_neg_is_pos: 0.00489| lr: 0.0| temp: 1.95916 | loss: 1.15175| constrast_loss: 4.53738| div_loss: 0.6961| %_mask_idx: 0.39333| ppl: 194.4942| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.95916 | loss: 1.12539| constrast_loss: 4.43105| div_loss: 0.70488| %_mask_idx: 0.36451| ppl: 188.87778| %_neg_is_pos: 0.00558| lr: 0.0| temp: 1.95914 | loss: 1.12913| constrast_loss: 4.446| div_loss: 0.70515| %_mask_idx: 0.3844| ppl: 188.70288| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.95914 | loss: 1.14351| constrast_loss: 4.50475| div_loss: 0.693| %_mask_idx: 0.40288| ppl: 196.48218| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.95913 | loss: 1.13143| constrast_loss: 4.4551| div_loss: 0.7063| %_mask_idx: 0.35808| ppl: 187.96536| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.95913 | loss: 1.13757| constrast_loss: 4.48043| div_loss: 0.69829| %_mask_idx: 0.4032| ppl: 193.09341| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.95912 | loss: 1.13389| constrast_loss: 4.46574| div_loss: 
0.69809| %_mask_idx: 0.40962| ppl: 193.22539| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.95912 | loss: 1.12557| constrast_loss: 4.43102| div_loss: 0.71277| %_mask_idx: 0.34931| ppl: 183.82571| %_neg_is_pos: 0.00524| lr: 0.0| temp: 1.95911 | loss: 1.13575| constrast_loss: 4.47235| div_loss: 0.70637| %_mask_idx: 0.34853| ppl: 187.92145| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.95911 | loss: 1.12476| constrast_loss: 4.42778| div_loss: 0.71249| %_mask_idx: 0.3761| ppl: 184.00684| %_neg_is_pos: 0.00578| lr: 0.0| temp: 1.95909 | loss: 1.13861| constrast_loss: 4.48415| div_loss: 0.70305| %_mask_idx: 0.37422| ppl: 190.04503| %_neg_is_pos: 0.00645| lr: 0.0| temp: 1.95909 | loss: 1.13852| constrast_loss: 4.48386| div_loss: 0.70209| %_mask_idx: 0.4281| ppl: 190.66393| %_neg_is_pos: 0.00411| lr: 0.0| temp: 1.95908 | loss: 1.126| constrast_loss: 4.43209| div_loss: 0.71898| %_mask_idx: 0.41604| ppl: 179.85559| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.95908 | loss: 1.12707| constrast_loss: 4.43768| div_loss: 0.70615| %_mask_idx: 0.3916| ppl: 188.06126| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.95907 | loss: 1.12148| constrast_loss: 4.41526| div_loss: 0.70673| %_mask_idx: 0.37672| ppl: 187.69601| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.95907 | loss: 1.14008| constrast_loss: 4.48991| div_loss: 0.70411| %_mask_idx: 0.3833| ppl: 189.37225| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.95906 | loss: 1.14093| constrast_loss: 4.495| div_loss: 0.68729| %_mask_idx: 0.40742| ppl: 200.13133| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.95906 | loss: 1.12782| constrast_loss: 4.4404| div_loss: 0.70887| %_mask_idx: 0.34602| ppl: 186.32411| %_neg_is_pos: 0.00704| lr: 0.0| temp: 1.95905 | loss: 1.13734| constrast_loss: 4.47858| div_loss: 0.70787| %_mask_idx: 0.35448| ppl: 186.96544| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.95905 | loss: 1.14217| constrast_loss: 4.49904| div_loss: 0.69637| %_mask_idx: 0.41886| ppl: 194.32065| %_neg_is_pos: 0.00341| lr: 0.0| temp: 1.95904 | loss: 1.12753| constrast_loss: 4.43941| 
div_loss: 0.7072| %_mask_idx: 0.36576| ppl: 187.39069| %_neg_is_pos: 0.00493| lr: 0.0| temp: 1.95904 | loss: 1.13022| constrast_loss: 4.45033| div_loss: 0.70563| %_mask_idx: 0.39333| ppl: 188.39761| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.95902 | loss: 1.13024| constrast_loss: 4.45148| div_loss: 0.69497| %_mask_idx: 0.40226| ppl: 195.21744| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.95902 | loss: 1.13166| constrast_loss: 4.45584| div_loss: 0.70793| %_mask_idx: 0.43092| ppl: 186.92279| %_neg_is_pos: 0.00464| lr: 0.0| temp: 1.95901 | loss: 1.13016| constrast_loss: 4.44944| div_loss: 0.71193| %_mask_idx: 0.41526| ppl: 184.36395| %_neg_is_pos: 0.00333| lr: 0.0| temp: 1.95901 | loss: 1.12604| constrast_loss: 4.4336| div_loss: 0.70568| %_mask_idx: 0.40461| ppl: 188.36661| %_neg_is_pos: 0.00604| lr: 0.0| temp: 1.959 | loss: 1.14532| constrast_loss: 4.51159| div_loss: 0.697| %_mask_idx: 0.43797| ppl: 193.9202| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.959 | loss: 1.13444| constrast_loss: 4.46781| div_loss: 0.6994| %_mask_idx: 0.37923| ppl: 192.38538| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.95899 | loss: 1.13526| constrast_loss: 4.47072| div_loss: 0.70324| %_mask_idx: 0.39348| ppl: 189.92346| %_neg_is_pos: 0.00531| lr: 0.0| temp: 1.95899 | loss: 1.13638| constrast_loss: 4.4743| div_loss: 0.71212| %_mask_idx: 0.39693| ppl: 184.24149| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.95897 | loss: 1.13124| constrast_loss: 4.4547| div_loss: 0.70255| %_mask_idx: 0.45426| ppl: 190.36703| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.95897 | loss: 1.12911| constrast_loss: 4.44612| div_loss: 0.70299| %_mask_idx: 0.37986| ppl: 190.08458| %_neg_is_pos: 0.00412| lr: 0.0| temp: 1.95896 | loss: 1.14837| constrast_loss: 4.52445| div_loss: 0.69049| %_mask_idx: 0.4021| ppl: 198.08745| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.95896 | loss: 1.13751| constrast_loss: 4.48044| div_loss: 0.69596| %_mask_idx: 0.39113| ppl: 194.5881| %_neg_is_pos: 0.00351| lr: 0.0| temp: 1.95895 | loss: 1.12825| constrast_loss: 4.44245| 
div_loss: 0.70534| %_mask_idx: 0.39803| ppl: 188.58118| %_neg_is_pos: 0.00427| lr: 0.0| temp: 1.95895 | loss: 1.131| constrast_loss: 4.45305| div_loss: 0.70951| %_mask_idx: 0.37907| ppl: 185.91055| %_neg_is_pos: 0.0031| lr: 0.0| temp: 1.95894 | loss: 1.14225| constrast_loss: 4.50034| div_loss: 0.68661| %_mask_idx: 0.35511| ppl: 200.57196| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.95894 | loss: 1.13473| constrast_loss: 4.46775| div_loss: 0.71163| %_mask_idx: 0.36654| ppl: 184.55701| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.95892 | loss: 1.13777| constrast_loss: 4.48127| div_loss: 0.69794| %_mask_idx: 0.43264| ppl: 193.32039| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.95892 | loss: 1.13915| constrast_loss: 4.48755| div_loss: 0.69055| %_mask_idx: 0.39348| ppl: 198.04678| %_neg_is_pos: 0.00302| lr: 0.0| temp: 1.95891 | loss: 1.13938| constrast_loss: 4.48731| div_loss: 0.70215| %_mask_idx: 0.41056| ppl: 190.62448| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.95891 | loss: 1.12191| constrast_loss: 4.41551| div_loss: 0.72141| %_mask_idx: 0.37594| ppl: 178.29921| %_neg_is_pos: 0.00698| lr: 0.0| temp: 1.9589 | loss: 1.13199| constrast_loss: 4.45771| div_loss: 0.70269| %_mask_idx: 0.32581| ppl: 190.27832| %_neg_is_pos: 0.00632| lr: 0.0| temp: 1.9589 | loss: 1.12836| constrast_loss: 4.44253| div_loss: 0.70904| %_mask_idx: 0.33208| ppl: 186.21393| %_neg_is_pos: 0.00493| lr: 0.0| temp: 1.95889 | loss: 1.13129| constrast_loss: 4.45459| div_loss: 0.70552| %_mask_idx: 0.35025| ppl: 188.46854| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.95889 | loss: 1.12636| constrast_loss: 4.43499| div_loss: 0.7044| %_mask_idx: 0.39928| ppl: 189.18236| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.95887 | loss: 1.13603| constrast_loss: 4.47414| div_loss: 0.69975| %_mask_idx: 0.38643| ppl: 192.15869| %_neg_is_pos: 0.00423| lr: 0.0| temp: 1.95887 | loss: 1.14925| constrast_loss: 4.5274| div_loss: 0.696| %_mask_idx: 0.38518| ppl: 194.55852| %_neg_is_pos: 0.00258| lr: 0.0| temp: 1.95886 | loss: 1.13604| constrast_loss: 
4.47353| div_loss: 0.70638| %_mask_idx: 0.40664| ppl: 187.91621| %_neg_is_pos: 0.00711| lr: 0.0| temp: 1.95886 [2021-09-02 06:11:10,324] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:11:10,324] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.12309| constrast_loss: 4.42052| div_loss: 0.7185| %_mask_idx: 0.41212| ppl: 180.16028| %_neg_is_pos: 0.00477| lr: 0.0| temp: 1.95884 | loss: 1.12573| constrast_loss: 4.4311| div_loss: 0.718| %_mask_idx: 0.38471| ppl: 180.48024| %_neg_is_pos: 0.0061| lr: 0.0| temp: 1.95884 | loss: 1.12864| constrast_loss: 4.44417| div_loss: 0.70384| %_mask_idx: 0.41776| ppl: 189.53952| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.95883 | loss: 1.12121| constrast_loss: 4.41413| div_loss: 0.70702| %_mask_idx: 0.3844| ppl: 187.50696| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.95883 | loss: 1.11966| constrast_loss: 4.40664| div_loss: 0.71997| %_mask_idx: 0.42654| ppl: 179.22098| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.95882 | loss: 1.13096| constrast_loss: 4.45341| div_loss: 0.7041| %_mask_idx: 0.34649| ppl: 189.37784| %_neg_is_pos: 0.00436| lr: 0.0| temp: 1.95882 | loss: 1.13453| constrast_loss: 4.46678| div_loss: 0.71346| %_mask_idx: 0.36294| ppl: 183.38525| %_neg_is_pos: 0.00667| lr: 0.0| temp: 1.95881 | loss: 1.13878| constrast_loss: 4.48569| div_loss: 0.69447| %_mask_idx: 0.38487| ppl: 195.53697| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.95881 | loss: 1.13167| constrast_loss: 4.45546| div_loss: 0.71211| %_mask_idx: 0.37265| ppl: 184.24759| %_neg_is_pos: 0.00326| lr: 0.0| temp: 1.95879| loss: 1.13252| constrast_loss: 4.45891| div_loss: 0.71175| %_mask_idx: 0.35197| ppl: 184.48007| %_neg_is_pos: 0.00306| lr: 0.0| temp: 1.95879 | loss: 1.1363| constrast_loss: 4.47496| div_loss: 0.70221| %_mask_idx: 0.4256| ppl: 190.58621| %_neg_is_pos: 0.00203| lr: 0.0| temp: 
1.95878 | loss: 1.12774| constrast_loss: 4.43985| div_loss: 0.71126| %_mask_idx: 0.35088| ppl: 184.79268| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.95878 | loss: 1.1391| constrast_loss: 4.48611| div_loss: 0.70282| %_mask_idx: 0.36059| ppl: 190.19421| %_neg_is_pos: 0.00171| lr: 0.0| temp: 1.95877 | loss: 1.13584| constrast_loss: 4.47286| div_loss: 0.70504| %_mask_idx: 0.39897| ppl: 188.77122| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.95877 | loss: 1.13198| constrast_loss: 4.45764| div_loss: 0.70281| %_mask_idx: 0.39411| ppl: 190.20074| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.95876 | loss: 1.13392| constrast_loss: 4.46486| div_loss: 0.70824| %_mask_idx: 0.44799| ppl: 186.72559| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.95876 | loss: 1.13964| constrast_loss: 4.48831| div_loss: 0.70248| %_mask_idx: 0.42497| ppl: 190.41086| %_neg_is_pos: 0.00236| lr: 0.0| temp: 1.95874 | loss: 1.14501| constrast_loss: 4.51015| div_loss: 0.69893| %_mask_idx: 0.4375| ppl: 192.68262| %_neg_is_pos: 0.00117| lr: 0.0| temp: 1.95874 | loss: 1.13975| constrast_loss: 4.48842| div_loss: 0.70594| %_mask_idx: 0.37359| ppl: 188.20078| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.95873 | loss: 1.14378| constrast_loss: 4.50494| div_loss: 0.70157| %_mask_idx: 0.39975| ppl: 190.99255| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.95873 | loss: 1.14187| constrast_loss: 4.49659| div_loss: 0.70887| %_mask_idx: 0.41933| ppl: 186.32254| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.95872 | loss: 1.13918| constrast_loss: 4.48649| div_loss: 0.7022| %_mask_idx: 0.37704| ppl: 190.59384| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.95872 | loss: 1.12243| constrast_loss: 4.41753| div_loss: 0.72203| %_mask_idx: 0.35025| ppl: 177.90036| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.95871 | loss: 1.12381| constrast_loss: 4.42346| div_loss: 0.71793| %_mask_idx: 0.39881| ppl: 180.52197| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.95871 | loss: 1.14044| constrast_loss: 4.49126| div_loss: 0.70499| %_mask_idx: 0.35448| ppl: 188.8055| %_neg_is_pos: 0.00232| lr: 
0.0| temp: 1.95869 | loss: 1.13785| constrast_loss: 4.48022| div_loss: 0.71173| %_mask_idx: 0.42168| ppl: 184.49561| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.95869 | loss: 1.13163| constrast_loss: 4.45454| div_loss: 0.71992| %_mask_idx: 0.31751| ppl: 179.25317| %_neg_is_pos: 0.00416| lr: 0.0| temp: 1.95868 | loss: 1.14725| constrast_loss: 4.51863| div_loss: 0.70372| %_mask_idx: 0.37046| ppl: 189.61934| %_neg_is_pos: 0.00162| lr: 0.0| temp: 1.95868 | loss: 1.13877| constrast_loss: 4.48426| div_loss: 0.70823| %_mask_idx: 0.44173| ppl: 186.73233| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.95866 | loss: 1.14224| constrast_loss: 4.49836| div_loss: 0.70603| %_mask_idx: 0.38612| ppl: 188.13899| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.95866 | loss: 1.14463| constrast_loss: 4.50774| div_loss: 0.70757| %_mask_idx: 0.36591| ppl: 187.15602| %_neg_is_pos: 0.002| lr: 0.0| temp: 1.95865 | loss: 1.13065| constrast_loss: 4.45079| div_loss: 0.71817| %_mask_idx: 0.36184| ppl: 180.37041| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.95865 | loss: 1.13318| constrast_loss: 4.46214| div_loss: 0.70577| %_mask_idx: 0.42152| ppl: 188.30927| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.95864 | loss: 1.13309| constrast_loss: 4.46104| div_loss: 0.71324| %_mask_idx: 0.40335| ppl: 183.52423| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.95864 | loss: 1.13183| constrast_loss: 4.45574| div_loss: 0.71566| %_mask_idx: 0.36529| ppl: 181.97983| %_neg_is_pos: 0.00346| lr: 0.0| temp: 1.95863 | loss: 1.13495| constrast_loss: 4.46826| div_loss: 0.71541| %_mask_idx: 0.35761| ppl: 182.13577| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.95863 | loss: 1.13197| constrast_loss: 4.45563| div_loss: 0.72246| %_mask_idx: 0.38158| ppl: 177.62756| %_neg_is_pos: 0.00397| lr: 0.0| temp: 1.95861 | loss: 1.13302| constrast_loss: 4.46132| div_loss: 0.7076| %_mask_idx: 0.42121| ppl: 187.13376| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.95861 | loss: 1.12455| constrast_loss: 4.42643| div_loss: 0.71787| %_mask_idx: 0.34179| ppl: 180.56342| %_neg_is_pos: 
0.00324| lr: 0.0| temp: 1.9586 | loss: 1.13105| constrast_loss: 4.45275| div_loss: 0.71454| %_mask_idx: 0.39254| ppl: 182.69681| %_neg_is_pos: 0.00301| lr: 0.0| temp: 1.9586 | loss: 1.13593| constrast_loss: 4.47306| div_loss: 0.70649| %_mask_idx: 0.4187| ppl: 187.84589| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.95859 | loss: 1.13843| constrast_loss: 4.48187| div_loss: 0.71832| %_mask_idx: 0.38221| ppl: 180.27275| %_neg_is_pos: 0.00143| lr: 0.0| temp: 1.95859 | loss: 1.12363| constrast_loss: 4.42328| div_loss: 0.71225| %_mask_idx: 0.38487| ppl: 184.15695| %_neg_is_pos: 0.00328| lr: 0.0| temp: 1.95858 | loss: 1.13118| constrast_loss: 4.45275| div_loss: 0.71946| %_mask_idx: 0.36169| ppl: 179.54303| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.95858 | loss: 1.13514| constrast_loss: 4.46914| div_loss: 0.71415| %_mask_idx: 0.39928| ppl: 182.94208| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.95857 | loss: 1.12595| constrast_loss: 4.43249| div_loss: 0.71306| %_mask_idx: 0.36466| ppl: 183.63873| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.95857 | loss: 1.13139| constrast_loss: 4.45395| div_loss: 0.71614| %_mask_idx: 0.41604| ppl: 181.67021| %_neg_is_pos: 0.00294| lr: 0.0| temp: 1.95856 | loss: 1.14| constrast_loss: 4.48944| div_loss: 0.7056| %_mask_idx: 0.41479| ppl: 188.41795| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.95856 | loss: 1.13631| constrast_loss: 4.47403| div_loss: 0.71201| %_mask_idx: 0.33239| ppl: 184.31183| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.95855 | loss: 1.13041| constrast_loss: 4.45019| div_loss: 0.7146| %_mask_idx: 0.39599| ppl: 182.65646| %_neg_is_pos: 0.00235| lr: 0.0| temp: 1.95855 | loss: 1.12563| constrast_loss: 4.43087| div_loss: 0.71665| %_mask_idx: 0.39552| ppl: 181.34526| %_neg_is_pos: 0.00244| lr: 0.0| temp: 1.95854 | loss: 1.12508| constrast_loss: 4.42873| div_loss: 0.71569| %_mask_idx: 0.33553| ppl: 181.95786| %_neg_is_pos: 0.00606| lr: 0.0| temp: 1.95854 | loss: 1.14335| constrast_loss: 4.50243| div_loss: 0.70954| %_mask_idx: 0.42105| ppl: 185.89493| 
%_neg_is_pos: 0.00309| lr: 0.0| temp: 1.95852 | loss: 1.1302| constrast_loss: 4.44997| div_loss: 0.70829| %_mask_idx: 0.39098| ppl: 186.69595| %_neg_is_pos: 0.00429| lr: 0.0| temp: 1.95852 | loss: 1.13486| constrast_loss: 4.46802| div_loss: 0.71415| %_mask_idx: 0.38863| ppl: 182.94095| %_neg_is_pos: 0.00434| lr: 0.0| temp: 1.95851 | loss: 1.12952| constrast_loss: 4.44771| div_loss: 0.70378| %_mask_idx: 0.40179| ppl: 189.57962| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.95851 | loss: 1.13788| constrast_loss: 4.48109| div_loss: 0.7044| %_mask_idx: 0.38487| ppl: 189.18404| %_neg_is_pos: 0.00389| lr: 0.0| temp: 1.95849 | loss: 1.12319| constrast_loss: 4.42057| div_loss: 0.72181| %_mask_idx: 0.32816| ppl: 178.04189| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.95849 | loss: 1.14449| constrast_loss: 4.50724| div_loss: 0.70722| %_mask_idx: 0.36544| ppl: 187.38008| %_neg_is_pos: 0.0032| lr: 0.0| temp: 1.95848 | loss: 1.13719| constrast_loss: 4.47851| div_loss: 0.7024| %_mask_idx: 0.40836| ppl: 190.46167| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.95848 | loss: 1.14288| constrast_loss: 4.50105| div_loss: 0.70486| %_mask_idx: 0.42701| ppl: 188.89175| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.95847 | loss: 1.13367| constrast_loss: 4.46302| div_loss: 0.71671| %_mask_idx: 0.4151| ppl: 181.30798| %_neg_is_pos: 0.00453| lr: 0.0| temp: 1.95847 | loss: 1.13823| constrast_loss: 4.48083| div_loss: 0.72099| %_mask_idx: 0.37751| ppl: 178.56938| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.95846 | loss: 1.13336| constrast_loss: 4.46237| div_loss: 0.71052| %_mask_idx: 0.39944| ppl: 185.27025| %_neg_is_pos: 0.00288| lr: 0.0| temp: 1.95846 | loss: 1.14058| constrast_loss: 4.49084| div_loss: 0.71471| %_mask_idx: 0.42387| ppl: 182.58713| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.95844 | loss: 1.13781| constrast_loss: 4.47983| div_loss: 0.71405| %_mask_idx: 0.38346| ppl: 183.00856| %_neg_is_pos: 0.00331| lr: 0.0| temp: 1.95844 | loss: 1.13236| constrast_loss: 4.45787| div_loss: 0.71558| %_mask_idx: 0.40492| ppl: 
182.03056| %_neg_is_pos: 0.00228| lr: 0.0| temp: 1.95843 | loss: 1.13823| constrast_loss: 4.4821| div_loss: 0.7083| %_mask_idx: 0.3833| ppl: 186.68996| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.95843 | loss: 1.11999| constrast_loss: 4.40871| div_loss: 0.71261| %_mask_idx: 0.32315| ppl: 183.92734| %_neg_is_pos: 0.00443| lr: 0.0| temp: 1.95842 | loss: 1.13718| constrast_loss: 4.4776| div_loss: 0.71117| %_mask_idx: 0.42168| ppl: 184.85085| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.95842 | loss: 1.13402| constrast_loss: 4.46502| div_loss: 0.71047| %_mask_idx: 0.37343| ppl: 185.29987| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.95841 | loss: 1.13692| constrast_loss: 4.47674| div_loss: 0.70955| %_mask_idx: 0.43202| ppl: 185.88885| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.95841 | loss: 1.13962| constrast_loss: 4.48701| div_loss: 0.71478| %_mask_idx: 0.42669| ppl: 182.53835| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.95839 | loss: 1.13572| constrast_loss: 4.47234| div_loss: 0.70547| %_mask_idx: 0.40883| ppl: 188.49815| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.95839 | loss: 1.14502| constrast_loss: 4.50971| div_loss: 0.70376| %_mask_idx: 0.42419| ppl: 189.59277| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.95838 | loss: 1.13754| constrast_loss: 4.4789| div_loss: 0.71246| %_mask_idx: 0.40539| ppl: 184.02824| %_neg_is_pos: 0.0018| lr: 0.0| temp: 1.95838 | loss: 1.13828| constrast_loss: 4.48241| div_loss: 0.70731| %_mask_idx: 0.4068| ppl: 187.32166| %_neg_is_pos: 0.0041| lr: 0.0| temp: 1.95837 | loss: 1.13742| constrast_loss: 4.47781| div_loss: 0.71853| %_mask_idx: 0.40038| ppl: 180.14055| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.95837 | loss: 1.13345| constrast_loss: 4.46252| div_loss: 0.71294| %_mask_idx: 0.39286| ppl: 183.71579| %_neg_is_pos: 0.00347| lr: 0.0| temp: 1.95836 | loss: 1.13581| constrast_loss: 4.47206| div_loss: 0.71177| %_mask_idx: 0.36811| ppl: 184.46941| %_neg_is_pos: 0.00357| lr: 0.0| temp: 1.95836 | loss: 1.14273| constrast_loss: 4.50069| div_loss: 0.7024| %_mask_idx: 0.38424| 
ppl: 190.46707| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.95834 | loss: 1.12868| constrast_loss: 4.44333| div_loss: 0.71399| %_mask_idx: 0.39207| ppl: 183.04738| %_neg_is_pos: 0.00297| lr: 0.0| temp: 1.95834 | loss: 1.12752| constrast_loss: 4.43866| div_loss: 0.71439| %_mask_idx: 0.39771| ppl: 182.79097| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.95833 | loss: 1.13113| constrast_loss: 4.45301| div_loss: 0.71499| %_mask_idx: 0.40633| ppl: 182.40396| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.95833 | loss: 1.1241| constrast_loss: 4.42483| div_loss: 0.71553| %_mask_idx: 0.3313| ppl: 182.05878| %_neg_is_pos: 0.00439| lr: 0.0| temp: 1.95831 | loss: 1.1369| constrast_loss: 4.47643| div_loss: 0.71159| %_mask_idx: 0.39724| ppl: 184.58301| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.95831 | loss: 1.1351| constrast_loss: 4.46915| div_loss: 0.71272| %_mask_idx: 0.37547| ppl: 183.86127| %_neg_is_pos: 0.00295| lr: 0.0| temp: 1.9583 | loss: 1.14233| constrast_loss: 4.4985| div_loss: 0.70812| %_mask_idx: 0.39991| ppl: 186.8063| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.9583 | loss: 1.13405| constrast_loss: 4.4649| div_loss: 0.71305| %_mask_idx: 0.41526| ppl: 183.64781| %_neg_is_pos: 0.00153| lr: 0.0| temp: 1.95829 | loss: 1.1298| constrast_loss: 4.44769| div_loss: 0.71492| %_mask_idx: 0.35417| ppl: 182.45433| %_neg_is_pos: 0.00485| lr: 0.0| temp: 1.95829 | loss: 1.13252| constrast_loss: 4.45837| div_loss: 0.71727| %_mask_idx: 0.38878| ppl: 180.94647| %_neg_is_pos: 0.00191| lr: 0.0| temp: 1.95828 | loss: 1.13252| constrast_loss: 4.45951| div_loss: 0.70582| %_mask_idx: 0.37437| ppl: 188.27411| %_neg_is_pos: 0.00458| lr: 0.0| temp: 1.95828 | loss: 1.13026| constrast_loss: 4.4492| div_loss: 0.71849| %_mask_idx: 0.38628| ppl: 180.16486| %_neg_is_pos: 0.00314| lr: 0.0| temp: 1.95826 | loss: 1.14419| constrast_loss: 4.50616| div_loss: 0.70607| %_mask_idx: 0.40053| ppl: 188.11386| %_neg_is_pos: 0.00124| lr: 0.0| temp: 1.95826 | loss: 1.13081| constrast_loss: 4.4522| div_loss: 0.71034| %_mask_idx: 
0.37798| ppl: 185.38268| %_neg_is_pos: 0.00375| lr: 0.0| temp: 1.95825 | loss: 1.13663| constrast_loss: 4.47605| div_loss: 0.70454| %_mask_idx: 0.36607| ppl: 189.09337| %_neg_is_pos: 0.0024| lr: 0.0| temp: 1.95825 | loss: 1.13031| constrast_loss: 4.44922| div_loss: 0.72004| %_mask_idx: 0.35636| ppl: 179.17415| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.95824 | loss: 1.13876| constrast_loss: 4.48421| div_loss: 0.70841| %_mask_idx: 0.41557| ppl: 186.62| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.95824 | loss: 1.14278| constrast_loss: 4.50008| div_loss: 0.71017| %_mask_idx: 0.43014| ppl: 185.49043| %_neg_is_pos: 0.00173| lr: 0.0| temp: 1.95823 | loss: 1.13114| constrast_loss: 4.45277| div_loss: 0.71804| %_mask_idx: 0.39098| ppl: 180.45428| %_neg_is_pos: 0.00437| lr: 0.0| temp: 1.95823 | loss: 1.13961| constrast_loss: 4.48693| div_loss: 0.715| %_mask_idx: 0.40351| ppl: 182.39844| %_neg_is_pos: 0.00232| lr: 0.0| temp: 1.95821 | loss: 1.14201| constrast_loss: 4.49723| div_loss: 0.70826| %_mask_idx: 0.36263| ppl: 186.7155| %_neg_is_pos: 0.00213| lr: 0.0| temp: 1.95821 | loss: 1.13228| constrast_loss: 4.45799| div_loss: 0.71114| %_mask_idx: 0.43546| ppl: 184.86737| %_neg_is_pos: 0.00247| lr: 0.0| temp: 1.9582 | loss: 1.13075| constrast_loss: 4.45232| div_loss: 0.70682| %_mask_idx: 0.36497| ppl: 187.63725| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.9582 | loss: 1.12165| constrast_loss: 4.41436| div_loss: 0.72229| %_mask_idx: 0.35276| ppl: 177.73721| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.95819 | loss: 1.13449| constrast_loss: 4.46673| div_loss: 0.71226| %_mask_idx: 0.38158| ppl: 184.15594| %_neg_is_pos: 0.00332| lr: 0.0| temp: 1.95819 | loss: 1.14418| constrast_loss: 4.50629| div_loss: 0.70443| %_mask_idx: 0.42278| ppl: 189.16422| %_neg_is_pos: 0.00151| lr: 0.0| temp: 1.95818 | loss: 1.13063| constrast_loss: 4.45166| div_loss: 0.70851| %_mask_idx: 0.4234| ppl: 186.55533| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.95818 | loss: 1.13249| constrast_loss: 4.4584| div_loss: 0.71566| 
%_mask_idx: 0.35808| ppl: 181.97682| %_neg_is_pos: 0.00159| lr: 0.0| temp: 1.95816 | loss: 1.13232| constrast_loss: 4.4586| div_loss: 0.70681| %_mask_idx: 0.37813| ppl: 187.6427| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.95816 | loss: 1.11818| constrast_loss: 4.40108| div_loss: 0.71652| %_mask_idx: 0.31751| ppl: 181.42905| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.95815 | loss: 1.13804| constrast_loss: 4.48165| div_loss: 0.7052| %_mask_idx: 0.43687| ppl: 188.67052| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.95815 [2021-09-02 06:20:25,536] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:20:25,536] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13483| constrast_loss: 4.46884| div_loss: 0.70466| %_mask_idx: 0.42716| ppl: 189.02042| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.95813 | loss: 1.1293| constrast_loss: 4.44509| div_loss: 0.72128| %_mask_idx: 0.45097| ppl: 178.38362| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.95813 | loss: 1.1324| constrast_loss: 4.45845| div_loss: 0.71169| %_mask_idx: 0.39317| ppl: 184.51973| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.95812 | loss: 1.12417| constrast_loss: 4.42439| div_loss: 0.72289| %_mask_idx: 0.35965| ppl: 177.35187| %_neg_is_pos: 0.00651| lr: 0.0| temp: 1.95812 | loss: 1.13767| constrast_loss: 4.47876| div_loss: 0.71902| %_mask_idx: 0.33506| ppl: 179.8291| %_neg_is_pos: 0.0056| lr: 0.0| temp: 1.95811 | loss: 1.1382| constrast_loss: 4.48163| div_loss: 0.71182| %_mask_idx: 0.414| ppl: 184.43564| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.95811 | loss: 1.13788| constrast_loss: 4.47976| div_loss: 0.71761| %_mask_idx: 0.34962| ppl: 180.73062| %_neg_is_pos: 0.00312| lr: 0.0| temp: 1.9581 | loss: 1.12838| constrast_loss: 4.44255| div_loss: 0.70976| %_mask_idx: 0.4187| ppl: 185.75275| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.9581 | loss: 1.13217| 
constrast_loss: 4.45716| div_loss: 0.71502| %_mask_idx: 0.40476| ppl: 182.38901| %_neg_is_pos: 0.00305| lr: 0.0| temp: 1.95809 | loss: 1.13842| constrast_loss: 4.48215| div_loss: 0.71534| %_mask_idx: 0.33004| ppl: 182.18512| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.95809 | loss: 1.12912| constrast_loss: 4.44355| div_loss: 0.7292| %_mask_idx: 0.35573| ppl: 173.31| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.95808 | loss: 1.1387| constrast_loss: 4.48306| div_loss: 0.71739| %_mask_idx: 0.38941| ppl: 180.87207| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95808 | loss: 1.1286| constrast_loss: 4.44212| div_loss: 0.72276| %_mask_idx: 0.42497| ppl: 177.43475| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95807 | loss: 1.14447| constrast_loss: 4.50687| div_loss: 0.71026| %_mask_idx: 0.44298| ppl: 185.43314| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.95807 | loss: 1.14359| constrast_loss: 4.50258| div_loss: 0.71764| %_mask_idx: 0.40899| ppl: 180.70805| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.95806 | loss: 1.12729| constrast_loss: 4.4376| div_loss: 0.71549| %_mask_idx: 0.35887| ppl: 182.08475| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95806 | loss: 1.13767| constrast_loss: 4.47861| div_loss: 0.72056| %_mask_idx: 0.35056| ppl: 178.84061| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.95804 | loss: 1.13904| constrast_loss: 4.48453| div_loss: 0.71612| %_mask_idx: 0.37281| ppl: 181.68507| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.95804 | loss: 1.13692| constrast_loss: 4.47576| div_loss: 0.71911| %_mask_idx: 0.4104| ppl: 179.77246| %_neg_is_pos: 0.0019| lr: 0.0| temp: 1.95803 | loss: 1.13166| constrast_loss: 4.45457| div_loss: 0.72057| %_mask_idx: 0.38503| ppl: 178.83644| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.95803 | loss: 1.1347| constrast_loss: 4.46691| div_loss: 0.71877| %_mask_idx: 0.43828| ppl: 179.99004| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.95802 | loss: 1.1254| constrast_loss: 4.42826| div_loss: 0.73343| %_mask_idx: 0.3714| ppl: 170.6062| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.95802 | loss: 
1.13331| constrast_loss: 4.46137| div_loss: 0.71865| %_mask_idx: 0.39113| ppl: 180.0641| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.95801 | loss: 1.13526| constrast_loss: 4.46998| div_loss: 0.71056| %_mask_idx: 0.38189| ppl: 185.24341| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.95801 | loss: 1.14988| constrast_loss: 4.52809| div_loss: 0.71433| %_mask_idx: 0.39207| ppl: 182.82629| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.95799 | loss: 1.14378| constrast_loss: 4.50217| div_loss: 0.72926| %_mask_idx: 0.3739| ppl: 173.27057| %_neg_is_pos: 0.00181| lr: 0.0| temp: 1.95799 | loss: 1.14534| constrast_loss: 4.50976| div_loss: 0.71608| %_mask_idx: 0.37704| ppl: 181.71115| %_neg_is_pos: 0.00149| lr: 0.0| temp: 1.95798 | loss: 1.12067| constrast_loss: 4.41029| div_loss: 0.72383| %_mask_idx: 0.39865| ppl: 176.7494| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.95798 | loss: 1.13567| constrast_loss: 4.47058| div_loss: 0.72092| %_mask_idx: 0.4093| ppl: 178.61229| %_neg_is_pos: 0.00289| lr: 0.0| temp: 1.95796 | loss: 1.14924| constrast_loss: 4.52552| div_loss: 0.71447| %_mask_idx: 0.41103| ppl: 182.74222| %_neg_is_pos: 0.00139| lr: 0.0| temp: 1.95796 | loss: 1.13914| constrast_loss: 4.48483| div_loss: 0.71719| %_mask_idx: 0.47149| ppl: 180.99548| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95795 | loss: 1.14775| constrast_loss: 4.51915| div_loss: 0.71872| %_mask_idx: 0.44361| ppl: 180.01808| %_neg_is_pos: 0.00172| lr: 0.0| temp: 1.95795 | loss: 1.1318| constrast_loss: 4.4545| div_loss: 0.72684| %_mask_idx: 0.36638| ppl: 174.82359| %_neg_is_pos: 0.00404| lr: 0.0| temp: 1.95794 | loss: 1.12574| constrast_loss: 4.43037| div_loss: 0.72585| %_mask_idx: 0.35182| ppl: 175.45349| %_neg_is_pos: 0.00454| lr: 0.0| temp: 1.95794 | loss: 1.12825| constrast_loss: 4.44053| div_loss: 0.72487| %_mask_idx: 0.35667| ppl: 176.08099| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.95793 | loss: 1.14357| constrast_loss: 4.5014| div_loss: 0.72872| %_mask_idx: 0.41306| ppl: 173.61703| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95793 
| loss: 1.11854| constrast_loss: 4.4011| div_loss: 0.73064| %_mask_idx: 0.36122| ppl: 172.38922| %_neg_is_pos: 0.00339| lr: 0.0| temp: 1.95791 | loss: 1.14755| constrast_loss: 4.51766| div_loss: 0.72519| %_mask_idx: 0.38127| ppl: 175.88005| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.95791 | loss: 1.13864| constrast_loss: 4.48265| div_loss: 0.71915| %_mask_idx: 0.39756| ppl: 179.74194| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.9579 | loss: 1.12995| constrast_loss: 4.4472| div_loss: 0.72611| %_mask_idx: 0.39724| ppl: 175.29202| %_neg_is_pos: 0.00253| lr: 0.0| temp: 1.9579 | loss: 1.13893| constrast_loss: 4.48337| div_loss: 0.72353| %_mask_idx: 0.4115| ppl: 176.93936| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.95789 | loss: 1.14145| constrast_loss: 4.49331| div_loss: 0.72483| %_mask_idx: 0.34164| ppl: 176.11201| %_neg_is_pos: 0.0019| lr: 0.0| temp: 1.95789 | loss: 1.12612| constrast_loss: 4.43108| div_loss: 0.7339| %_mask_idx: 0.34774| ppl: 170.30472| %_neg_is_pos: 0.00316| lr: 0.0| temp: 1.95788 | loss: 1.14289| constrast_loss: 4.49995| div_loss: 0.71625| %_mask_idx: 0.38565| ppl: 181.60251| %_neg_is_pos: 0.00229| lr: 0.0| temp: 1.95788 | loss: 1.13372| constrast_loss: 4.46286| div_loss: 0.72042| %_mask_idx: 0.40617| ppl: 178.93413| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.95786 | loss: 1.12747| constrast_loss: 4.43655| div_loss: 0.73321| %_mask_idx: 0.36419| ppl: 170.74852| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.95786 | loss: 1.1331| constrast_loss: 4.45981| div_loss: 0.72603| %_mask_idx: 0.40226| ppl: 175.33784| %_neg_is_pos: 0.00284| lr: 0.0| temp: 1.95785 | loss: 1.14782| constrast_loss: 4.51925| div_loss: 0.72043| %_mask_idx: 0.41808| ppl: 178.9223| %_neg_is_pos: 0.0014| lr: 0.0| temp: 1.95785 | loss: 1.13813| constrast_loss: 4.47975| div_loss: 0.72752| %_mask_idx: 0.36184| ppl: 174.38599| %_neg_is_pos: 0.00426| lr: 0.0| temp: 1.95784 | loss: 1.13977| constrast_loss: 4.48766| div_loss: 0.71416| %_mask_idx: 0.40038| ppl: 182.93642| %_neg_is_pos: 0.0022| lr: 0.0| temp: 
1.95784 | loss: 1.13779| constrast_loss: 4.4789| div_loss: 0.72242| %_mask_idx: 0.37845| ppl: 177.65227| %_neg_is_pos: 0.00206| lr: 0.0| temp: 1.95783 | loss: 1.13523| constrast_loss: 4.46815| div_loss: 0.72766| %_mask_idx: 0.38706| ppl: 174.29854| %_neg_is_pos: 0.00315| lr: 0.0| temp: 1.95783 | loss: 1.1357| constrast_loss: 4.47057| div_loss: 0.72206| %_mask_idx: 0.38048| ppl: 177.87897| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.95781 | loss: 1.12644| constrast_loss: 4.43208| div_loss: 0.73689| %_mask_idx: 0.3963| ppl: 168.38794| %_neg_is_pos: 0.00348| lr: 0.0| temp: 1.95781 | loss: 1.13379| constrast_loss: 4.4626| div_loss: 0.72557| %_mask_idx: 0.36247| ppl: 175.63695| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.9578 | loss: 1.13726| constrast_loss: 4.47717| div_loss: 0.71869| %_mask_idx: 0.36701| ppl: 180.03937| %_neg_is_pos: 0.00285| lr: 0.0| temp: 1.9578 | loss: 1.1434| constrast_loss: 4.50158| div_loss: 0.72024| %_mask_idx: 0.38643| ppl: 179.04915| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.95778 | loss: 1.14368| constrast_loss: 4.50323| div_loss: 0.71496| %_mask_idx: 0.43437| ppl: 182.42822| %_neg_is_pos: 0.00161| lr: 0.0| temp: 1.95778 | loss: 1.12431| constrast_loss: 4.42506| div_loss: 0.72166| %_mask_idx: 0.35401| ppl: 178.14076| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.95777 | loss: 1.1378| constrast_loss: 4.47857| div_loss: 0.72619| %_mask_idx: 0.38534| ppl: 175.2411| %_neg_is_pos: 0.00323| lr: 0.0| temp: 1.95777 | loss: 1.1385| constrast_loss: 4.48279| div_loss: 0.71227| %_mask_idx: 0.40288| ppl: 184.14577| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.95776 | loss: 1.13299| constrast_loss: 4.46015| div_loss: 0.71799| %_mask_idx: 0.41056| ppl: 180.48956| %_neg_is_pos: 0.00318| lr: 0.0| temp: 1.95776 | loss: 1.14328| constrast_loss: 4.50127| div_loss: 0.71859| %_mask_idx: 0.40414| ppl: 180.10419| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.95775 | loss: 1.14089| constrast_loss: 4.49089| div_loss: 0.72684| %_mask_idx: 0.3692| ppl: 174.82251| %_neg_is_pos: 0.00152| lr: 0.0| 
temp: 1.95775 | loss: 1.13962| constrast_loss: 4.48615| div_loss: 0.72344| %_mask_idx: 0.39489| ppl: 176.99617| %_neg_is_pos: 0.00356| lr: 0.0| temp: 1.95773 | loss: 1.139| constrast_loss: 4.48373| div_loss: 0.72271| %_mask_idx: 0.39113| ppl: 177.46725| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.95773 | loss: 1.13565| constrast_loss: 4.47031| div_loss: 0.72306| %_mask_idx: 0.42888| ppl: 177.23975| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.95772 | loss: 1.12768| constrast_loss: 4.43813| div_loss: 0.72595| %_mask_idx: 0.36952| ppl: 175.39021| %_neg_is_pos: 0.00391| lr: 0.0| temp: 1.95772 | loss: 1.14904| constrast_loss: 4.5242| div_loss: 0.71946| %_mask_idx: 0.41432| ppl: 179.54817| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.95771 | loss: 1.13858| constrast_loss: 4.48222| div_loss: 0.72102| %_mask_idx: 0.38252| ppl: 178.54517| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.95771 | loss: 1.13507| constrast_loss: 4.46846| div_loss: 0.71801| %_mask_idx: 0.37281| ppl: 180.4747| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.9577 | loss: 1.13584| constrast_loss: 4.47089| div_loss: 0.72482| %_mask_idx: 0.38409| ppl: 176.11475| %_neg_is_pos: 0.00194| lr: 0.0| temp: 1.9577 | loss: 1.13601| constrast_loss: 4.47112| div_loss: 0.72924| %_mask_idx: 0.41103| ppl: 173.28882| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.95768 | loss: 1.13756| constrast_loss: 4.47791| div_loss: 0.72336| %_mask_idx: 0.38017| ppl: 177.0488| %_neg_is_pos: 0.00296| lr: 0.0| temp: 1.95768 | loss: 1.148| constrast_loss: 4.5199| div_loss: 0.72082| %_mask_idx: 0.37046| ppl: 178.67824| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.95767 | loss: 1.13857| constrast_loss: 4.4828| div_loss: 0.71478| %_mask_idx: 0.40648| ppl: 182.54225| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.95767 | loss: 1.1328| constrast_loss: 4.459| div_loss: 0.72181| %_mask_idx: 0.33615| ppl: 178.04131| %_neg_is_pos: 0.00441| lr: 0.0| temp: 1.95766 | loss: 1.13322| constrast_loss: 4.46101| div_loss: 0.71894| %_mask_idx: 0.36576| ppl: 179.88055| %_neg_is_pos: 0.0019| lr: 0.0| 
temp: 1.95766 | loss: 1.13123| constrast_loss: 4.4531| div_loss: 0.71835| %_mask_idx: 0.39505| ppl: 180.25569| %_neg_is_pos: 0.00384| lr: 0.0| temp: 1.95765 | loss: 1.14183| constrast_loss: 4.4954| div_loss: 0.71925| %_mask_idx: 0.42951| ppl: 179.68015| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95765 | loss: 1.13077| constrast_loss: 4.45102| div_loss: 0.72037| %_mask_idx: 0.38001| ppl: 178.96301| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95764 | loss: 1.13919| constrast_loss: 4.48384| div_loss: 0.7291| %_mask_idx: 0.42638| ppl: 173.37483| %_neg_is_pos: 0.00281| lr: 0.0| temp: 1.95764 | loss: 1.13233| constrast_loss: 4.45703| div_loss: 0.72298| %_mask_idx: 0.36466| ppl: 177.29135| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.95763 | loss: 1.13643| constrast_loss: 4.47403| div_loss: 0.71698| %_mask_idx: 0.32832| ppl: 181.13283| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.95763 | loss: 1.121| constrast_loss: 4.41125| div_loss: 0.72748| %_mask_idx: 0.34759| ppl: 174.41235| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.95761 | loss: 1.13169| constrast_loss: 4.45459| div_loss: 0.72157| %_mask_idx: 0.35636| ppl: 178.19406| %_neg_is_pos: 0.0025| lr: 0.0| temp: 1.95761 | loss: 1.13574| constrast_loss: 4.47028| div_loss: 0.72672| %_mask_idx: 0.42121| ppl: 174.90129| %_neg_is_pos: 0.00286| lr: 0.0| temp: 1.9576 | loss: 1.13201| constrast_loss: 4.45615| div_loss: 0.71889| %_mask_idx: 0.42309| ppl: 179.91301| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.9576 | loss: 1.12883| constrast_loss: 4.44284| div_loss: 0.7247| %_mask_idx: 0.40398| ppl: 176.19218| %_neg_is_pos: 0.00254| lr: 0.0| temp: 1.95759 | loss: 1.13511| constrast_loss: 4.46702| div_loss: 0.73427| %_mask_idx: 0.38283| ppl: 170.07014| %_neg_is_pos: 0.0026| lr: 0.0| temp: 1.95759 | loss: 1.13106| constrast_loss: 4.45203| div_loss: 0.72224| %_mask_idx: 0.40836| ppl: 177.76517| %_neg_is_pos: 0.00304| lr: 0.0| temp: 1.95758 | loss: 1.14186| constrast_loss: 4.49592| div_loss: 0.71526| %_mask_idx: 0.38596| ppl: 182.23148| %_neg_is_pos: 0.00246| lr: 
0.0| temp: 1.95758 | loss: 1.13925| constrast_loss: 4.48497| div_loss: 0.72029| %_mask_idx: 0.35667| ppl: 179.01701| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.95756 | loss: 1.13941| constrast_loss: 4.48594| div_loss: 0.71697| %_mask_idx: 0.40555| ppl: 181.13791| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.95756 | loss: 1.13202| constrast_loss: 4.45593| div_loss: 0.7214| %_mask_idx: 0.40852| ppl: 178.30327| %_neg_is_pos: 0.00291| lr: 0.0| temp: 1.95755 | loss: 1.14435| constrast_loss: 4.50485| div_loss: 0.72532| %_mask_idx: 0.47259| ppl: 175.7932| %_neg_is_pos: 0.00248| lr: 0.0| temp: 1.95755 | loss: 1.13583| constrast_loss: 4.47116| div_loss: 0.7215| %_mask_idx: 0.3714| ppl: 178.24171| %_neg_is_pos: 0.00366| lr: 0.0| temp: 1.95754 | loss: 1.14143| constrast_loss: 4.49465| div_loss: 0.71085| %_mask_idx: 0.36341| ppl: 185.05756| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.95754 | loss: 1.12749| constrast_loss: 4.43692| div_loss: 0.73038| %_mask_idx: 0.39348| ppl: 172.55446| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.95753 | loss: 1.14108| constrast_loss: 4.49231| div_loss: 0.72015| %_mask_idx: 0.40147| ppl: 179.10172| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.95753 | loss: 1.14324| constrast_loss: 4.5008| div_loss: 0.72161| %_mask_idx: 0.38784| ppl: 178.16745| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.95751 | loss: 1.13504| constrast_loss: 4.46766| div_loss: 0.72503| %_mask_idx: 0.42779| ppl: 175.97978| %_neg_is_pos: 0.00359| lr: 0.0| temp: 1.95751 | loss: 1.13497| constrast_loss: 4.46699| div_loss: 0.72875| %_mask_idx: 0.39959| ppl: 173.6013| %_neg_is_pos: 0.00452| lr: 0.0| temp: 1.9575 | loss: 1.14008| constrast_loss: 4.48873| div_loss: 0.71585| %_mask_idx: 0.40899| ppl: 181.85522| %_neg_is_pos: 0.00132| lr: 0.0| temp: 1.9575 | loss: 1.14856| constrast_loss: 4.52115| div_loss: 0.73084| %_mask_idx: 0.37469| ppl: 172.26358| %_neg_is_pos: 0.00313| lr: 0.0| temp: 1.95749 | loss: 1.14084| constrast_loss: 4.49164| div_loss: 0.71705| %_mask_idx: 0.38346| ppl: 181.08736| %_neg_is_pos: 
0.00253| lr: 0.0| temp: 1.95749 | loss: 1.13133| constrast_loss: 4.45239| div_loss: 0.72941| %_mask_idx: 0.35887| ppl: 173.17883| %_neg_is_pos: 0.0039| lr: 0.0| temp: 1.95748 | loss: 1.13925| constrast_loss: 4.48555| div_loss: 0.71436| %_mask_idx: 0.34085| ppl: 182.81093| %_neg_is_pos: 0.00343| lr: 0.0| temp: 1.95748 | loss: 1.13867| constrast_loss: 4.482| div_loss: 0.72688| %_mask_idx: 0.38518| ppl: 174.79732| %_neg_is_pos: 0.00311| lr: 0.0| temp: 1.95746 | loss: 1.12417| constrast_loss: 4.42433| div_loss: 0.72352| %_mask_idx: 0.35793| ppl: 176.94954| %_neg_is_pos: 0.00268| lr: 0.0| temp: 1.95746 | loss: 1.13142| constrast_loss: 4.45316| div_loss: 0.7254| %_mask_idx: 0.3985| ppl: 175.74713| %_neg_is_pos: 0.00211| lr: 0.0| temp: 1.95745 | loss: 1.1305| constrast_loss: 4.44991| div_loss: 0.7207| %_mask_idx: 0.38299| ppl: 178.74968| %_neg_is_pos: 0.0038| lr: 0.0| temp: 1.95745 [2021-09-02 06:29:38,301] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:29:38,301] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 1.12719| constrast_loss: 4.4357| div_loss: 0.7307| %_mask_idx: 0.40742| ppl: 172.3526| %_neg_is_pos: 0.00279| lr: 0.0| temp: 1.95743 | loss: 1.13244| constrast_loss: 4.45686| div_loss: 0.72907| %_mask_idx: 0.36889| ppl: 173.39253| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.95743 | loss: 1.12628| constrast_loss: 4.43233| div_loss: 0.72775| %_mask_idx: 0.37437| ppl: 174.24013| %_neg_is_pos: 0.00382| lr: 0.0| temp: 1.95742 | loss: 1.13198| constrast_loss: 4.45558| div_loss: 0.72359| %_mask_idx: 0.32253| ppl: 176.90105| %_neg_is_pos: 0.00445| lr: 0.0| temp: 1.95742 | loss: 1.13086| constrast_loss: 4.45025| div_loss: 0.73194| %_mask_idx: 0.33271| ppl: 171.56116| %_neg_is_pos: 0.00353| lr: 0.0| temp: 1.95741 | loss: 1.12834| constrast_loss: 4.44024| div_loss: 0.7312| %_mask_idx: 0.37657| ppl: 172.03201| %_neg_is_pos: 0.00262| lr: 0.0| temp: 1.95741 | loss: 1.13696| constrast_loss: 4.47516| div_loss: 0.72665| %_mask_idx: 0.37876| ppl: 174.94711| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.9574 | loss: 1.12941| constrast_loss: 4.44503| div_loss: 0.72595| %_mask_idx: 0.36137| ppl: 175.38994| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.9574 | loss: 1.13643| constrast_loss: 4.47352| div_loss: 0.72217| %_mask_idx: 0.41902| ppl: 177.81094| %_neg_is_pos: 0.00204| lr: 0.0| temp: 1.95738 | loss: 1.1386| constrast_loss: 4.48135| div_loss: 0.73054| %_mask_idx: 0.38236| ppl: 172.4549| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.95738 | loss: 1.13207| constrast_loss: 4.45559| div_loss: 0.72676| %_mask_idx: 0.40539| ppl: 174.87523| %_neg_is_pos: 0.00269| lr: 0.0| temp: 1.95737 | loss: 1.13831| constrast_loss: 4.48074| div_loss: 0.72491| %_mask_idx: 0.38737| ppl: 176.05521| %_neg_is_pos: 0.0021| lr: 0.0| temp: 1.95737 | loss: 1.144| constrast_loss: 4.50313| div_loss: 0.72881| %_mask_idx: 0.38925| ppl: 173.5625| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.95736 | loss: 1.12557| constrast_loss: 4.42899| div_loss: 0.73295| %_mask_idx: 0.33788| ppl: 170.914| 
%_neg_is_pos: 0.00417| lr: 0.0| temp: 1.95736 | loss: 1.13112| constrast_loss: 4.4514| div_loss: 0.73061| %_mask_idx: 0.37061| ppl: 172.41147| %_neg_is_pos: 0.00282| lr: 0.0| temp: 1.95735 | loss: 1.13511| constrast_loss: 4.46677| div_loss: 0.7369| %_mask_idx: 0.38236| ppl: 168.38626| %_neg_is_pos: 0.00303| lr: 0.0| temp: 1.95735 | loss: 1.1369| constrast_loss: 4.47476| div_loss: 0.7285| %_mask_idx: 0.37766| ppl: 173.7569| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95733 | loss: 1.13035| constrast_loss: 4.4485| div_loss: 0.72881| %_mask_idx: 0.43296| ppl: 173.56137| %_neg_is_pos: 0.0022| lr: 0.0| temp: 1.95733 | loss: 1.1384| constrast_loss: 4.47996| div_loss: 0.73657| %_mask_idx: 0.41745| ppl: 168.59296| %_neg_is_pos: 0.00164| lr: 0.0| temp: 1.95732 | loss: 1.13897| constrast_loss: 4.4822| div_loss: 0.73688| %_mask_idx: 0.36482| ppl: 168.3974| %_neg_is_pos: 0.00249| lr: 0.0| temp: 1.95732 | loss: 1.14312| constrast_loss: 4.49859| div_loss: 0.73872| %_mask_idx: 0.38048| ppl: 167.21637| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.95731 | loss: 1.12203| constrast_loss: 4.41326| div_loss: 0.74841| %_mask_idx: 0.3938| ppl: 161.0162| %_neg_is_pos: 0.0035| lr: 0.0| temp: 1.95731 | loss: 1.13062| constrast_loss: 4.44844| div_loss: 0.7403| %_mask_idx: 0.37657| ppl: 166.21008| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.9573 | loss: 1.14524| constrast_loss: 4.50624| div_loss: 0.74731| %_mask_idx: 0.4256| ppl: 161.72458| %_neg_is_pos: 0.0018| lr: 0.0| temp: 1.9573 | loss: 1.1359| constrast_loss: 4.46898| div_loss: 0.74618| %_mask_idx: 0.40226| ppl: 162.44501| %_neg_is_pos: 0.00272| lr: 0.0| temp: 1.95728 | loss: 1.13647| constrast_loss: 4.47178| div_loss: 0.74093| %_mask_idx: 0.36278| ppl: 165.80644| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.95728 | loss: 1.13227| constrast_loss: 4.45406| div_loss: 0.75004| %_mask_idx: 0.3407| ppl: 159.97705| %_neg_is_pos: 0.0029| lr: 0.0| temp: 1.95727 | loss: 1.13336| constrast_loss: 4.45821| div_loss: 0.75229| %_mask_idx: 0.38957| ppl: 158.53737| 
%_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95727 | loss: 1.13614| constrast_loss: 4.46937| div_loss: 0.7518| %_mask_idx: 0.38847| ppl: 158.85104| %_neg_is_pos: 0.00207| lr: 0.0| temp: 1.95725 | loss: 1.13399| constrast_loss: 4.46084| div_loss: 0.75142| %_mask_idx: 0.39724| ppl: 159.09204| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95725 | loss: 1.14443| constrast_loss: 4.50229| div_loss: 0.75445| %_mask_idx: 0.40069| ppl: 157.15088| %_neg_is_pos: 0.00225| lr: 0.0| temp: 1.95724 | loss: 1.14566| constrast_loss: 4.50836| div_loss: 0.74301| %_mask_idx: 0.37469| ppl: 164.47623| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95724 | loss: 1.13848| constrast_loss: 4.47876| div_loss: 0.75158| %_mask_idx: 0.37516| ppl: 158.98784| %_neg_is_pos: 0.00192| lr: 0.0| temp: 1.95723 | loss: 1.13784| constrast_loss: 4.47585| div_loss: 0.75522| %_mask_idx: 0.34633| ppl: 156.65891| %_neg_is_pos: 0.00257| lr: 0.0| temp: 1.95723 | loss: 1.12954| constrast_loss: 4.44251| div_loss: 0.75671| %_mask_idx: 0.40836| ppl: 155.70599| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.95722 | loss: 1.14398| constrast_loss: 4.50056| div_loss: 0.75351| %_mask_idx: 0.38001| ppl: 157.75218| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.95722 | loss: 1.14088| constrast_loss: 4.48763| div_loss: 0.75871| %_mask_idx: 0.45724| ppl: 154.42378| %_neg_is_pos: 0.00252| lr: 0.0| temp: 1.9572 | loss: 1.1492| constrast_loss: 4.52126| div_loss: 0.75524| %_mask_idx: 0.46084| ppl: 156.64407| %_neg_is_pos: 0.00183| lr: 0.0| temp: 1.9572 | loss: 1.13622| constrast_loss: 4.46908| div_loss: 0.75786| %_mask_idx: 0.3938| ppl: 154.97089| %_neg_is_pos: 0.00222| lr: 0.0| temp: 1.95719 | loss: 1.13437| constrast_loss: 4.46229| div_loss: 0.75183| %_mask_idx: 0.37547| ppl: 158.82823| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.95719 | loss: 1.13227| constrast_loss: 4.45394| div_loss: 0.75128| %_mask_idx: 0.32206| ppl: 159.1795| %_neg_is_pos: 0.00224| lr: 0.0| temp: 1.95718 | loss: 1.13877| constrast_loss: 4.48049| div_loss: 0.74573| %_mask_idx: 0.39677| ppl: 
162.73404| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.95718 | loss: 1.14303| constrast_loss: 4.49733| div_loss: 0.74798| %_mask_idx: 0.45238| ppl: 161.29233| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.95718 | loss: 1.14143| constrast_loss: 4.4904| div_loss: 0.75315| %_mask_idx: 0.41823| ppl: 157.98657| %_neg_is_pos: 0.00215| lr: 0.0| temp: 1.95718 | loss: 1.14297| constrast_loss: 4.49602| div_loss: 0.75876| %_mask_idx: 0.41635| ppl: 154.39642| %_neg_is_pos: 0.00195| lr: 0.0| temp: 1.95716 | loss: 1.12927| constrast_loss: 4.44149| div_loss: 0.756| %_mask_idx: 0.38424| ppl: 156.15955| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.95716 | loss: 1.1408| constrast_loss: 4.48811| div_loss: 0.75098| %_mask_idx: 0.43061| ppl: 159.37552| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.95715 | loss: 1.14569| constrast_loss: 4.50736| div_loss: 0.75402| %_mask_idx: 0.45395| ppl: 157.42523| %_neg_is_pos: 0.0015| lr: 0.0| temp: 1.95715 | loss: 1.14227| constrast_loss: 4.49387| div_loss: 0.75198| %_mask_idx: 0.39991| ppl: 158.73279| %_neg_is_pos: 0.00177| lr: 0.0| temp: 1.95714 | loss: 1.1393| constrast_loss: 4.48212| div_loss: 0.75082| %_mask_idx: 0.41291| ppl: 159.47702| %_neg_is_pos: 0.00234| lr: 0.0| temp: 1.95714 | loss: 1.14477| constrast_loss: 4.5044| div_loss: 0.74688| %_mask_idx: 0.388| ppl: 161.99403| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95713 | loss: 1.13964| constrast_loss: 4.4837| div_loss: 0.74868| %_mask_idx: 0.39395| ppl: 160.84377| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.95713 | loss: 1.13559| constrast_loss: 4.46727| div_loss: 0.75083| %_mask_idx: 0.37939| ppl: 159.46909| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.95711 | loss: 1.14989| constrast_loss: 4.52366| div_loss: 0.75891| %_mask_idx: 0.38628| ppl: 154.29727| %_neg_is_pos: 0.00147| lr: 0.0| temp: 1.95711 | loss: 1.13554| constrast_loss: 4.46647| div_loss: 0.75693| %_mask_idx: 0.43703| ppl: 155.56439| %_neg_is_pos: 0.00173| lr: 0.0| temp: 1.9571 | loss: 1.1343| constrast_loss: 4.46138| div_loss: 0.75807| %_mask_idx: 0.36748| 
ppl: 154.83833| %_neg_is_pos: 0.00264| lr: 0.0| temp: 1.9571 | loss: 1.12916| constrast_loss: 4.44112| div_loss: 0.75529| %_mask_idx: 0.33866| ppl: 156.61269| %_neg_is_pos: 0.00287| lr: 0.0| temp: 1.95708 | loss: 1.13516| constrast_loss: 4.46567| div_loss: 0.74975| %_mask_idx: 0.38424| ppl: 160.16304| %_neg_is_pos: 0.00212| lr: 0.0| temp: 1.95708 | loss: 1.13631| constrast_loss: 4.46942| div_loss: 0.75832| %_mask_idx: 0.41009| ppl: 154.67203| %_neg_is_pos: 0.00216| lr: 0.0| temp: 1.95707 | loss: 1.14148| constrast_loss: 4.49023| div_loss: 0.75692| %_mask_idx: 0.33866| ppl: 155.57272| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.95707 | loss: 1.13791| constrast_loss: 4.47631| div_loss: 0.75341| %_mask_idx: 0.42466| ppl: 157.81633| %_neg_is_pos: 0.00202| lr: 0.0| temp: 1.95706 | loss: 1.13941| constrast_loss: 4.48135| div_loss: 0.76311| %_mask_idx: 0.40288| ppl: 151.60959| %_neg_is_pos: 0.00233| lr: 0.0| temp: 1.95706 | loss: 1.13651| constrast_loss: 4.47061| div_loss: 0.75417| %_mask_idx: 0.34273| ppl: 157.3333| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95705 | loss: 1.14259| constrast_loss: 4.49435| div_loss: 0.76002| %_mask_idx: 0.37187| ppl: 153.58795| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.95705 | loss: 1.13577| constrast_loss: 4.46821| div_loss: 0.74848| %_mask_idx: 0.41855| ppl: 160.97504| %_neg_is_pos: 0.00245| lr: 0.0| temp: 1.95703 | loss: 1.12122| constrast_loss: 4.40892| div_loss: 0.75983| %_mask_idx: 0.35871| ppl: 153.7114| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.95703 | loss: 1.14154| constrast_loss: 4.49086| div_loss: 0.75294| %_mask_idx: 0.38518| ppl: 158.12131| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.95702 | loss: 1.13983| constrast_loss: 4.4837| div_loss: 0.75612| %_mask_idx: 0.41557| ppl: 156.08047| %_neg_is_pos: 0.00168| lr: 0.0| temp: 1.95702 | loss: 1.14202| constrast_loss: 4.49249| div_loss: 0.75607| %_mask_idx: 0.37296| ppl: 156.11777| %_neg_is_pos: 0.00199| lr: 0.0| temp: 1.95701 | loss: 1.1386| constrast_loss: 4.47875| div_loss: 0.75637| 
%_mask_idx: 0.37625| ppl: 155.92226| %_neg_is_pos: 0.00283| lr: 0.0| temp: 1.95701 | loss: 1.13645| constrast_loss: 4.47035| div_loss: 0.75446| %_mask_idx: 0.39912| ppl: 157.14282| %_neg_is_pos: 0.00226| lr: 0.0| temp: 1.957 | loss: 1.12793| constrast_loss: 4.4366| div_loss: 0.75111| %_mask_idx: 0.38127| ppl: 159.29208| %_neg_is_pos: 0.00208| lr: 0.0| temp: 1.957 | loss: 1.14178| constrast_loss: 4.49161| div_loss: 0.75527| %_mask_idx: 0.37249| ppl: 156.62708| %_neg_is_pos: 0.00167| lr: 0.0| temp: 1.95698 | loss: 1.13145| constrast_loss: 4.45064| div_loss: 0.75162| %_mask_idx: 0.42638| ppl: 158.96521| %_neg_is_pos: 0.00201| lr: 0.0| temp: 1.95698 | loss: 1.14818| constrast_loss: 4.51752| div_loss: 0.75212| %_mask_idx: 0.43922| ppl: 158.64212| %_neg_is_pos: 0.00161| lr: 0.0| temp: 1.95697 | loss: 1.13391| constrast_loss: 4.46055| div_loss: 0.75103| %_mask_idx: 0.33631| ppl: 159.3421| %_neg_is_pos: 0.00239| lr: 0.0| temp: 1.95697 | loss: 1.14381| constrast_loss: 4.49916| div_loss: 0.76095| %_mask_idx: 0.39944| ppl: 152.99092| %_neg_is_pos: 0.00187| lr: 0.0| temp: 1.95696 | loss: 1.14727| constrast_loss: 4.51365| div_loss: 0.75409| %_mask_idx: 0.44095| ppl: 157.37924| %_neg_is_pos: 0.00138| lr: 0.0| temp: 1.95696 | loss: 1.13489| constrast_loss: 4.46379| div_loss: 0.75789| %_mask_idx: 0.3692| ppl: 154.94975| %_neg_is_pos: 0.00325| lr: 0.0| temp: 1.95695 | loss: 1.12966| constrast_loss: 4.44367| div_loss: 0.74969| %_mask_idx: 0.42951| ppl: 160.19583| %_neg_is_pos: 0.00193| lr: 0.0| temp: 1.95695 | loss: 1.1333| constrast_loss: 4.45736| div_loss: 0.75846| %_mask_idx: 0.44721| ppl: 154.58838| %_neg_is_pos: 0.00217| lr: 0.0| temp: 1.95693 | loss: 1.13563| constrast_loss: 4.46697| div_loss: 0.75557| %_mask_idx: 0.33145| ppl: 156.43539| %_neg_is_pos: 0.00256| lr: 0.0| temp: 1.95693 | loss: 1.13742| constrast_loss: 4.47484| div_loss: 0.74836| %_mask_idx: 0.39395| ppl: 161.04782| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.95692 | loss: 1.13286| constrast_loss: 4.45576| div_loss: 
0.75683| %_mask_idx: 0.3927| ppl: 155.62837| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.95692 | loss: 1.13573| constrast_loss: 4.46626| div_loss: 0.76653| %_mask_idx: 0.39066| ppl: 149.42242| %_neg_is_pos: 0.00242| lr: 0.0| temp: 1.9569 | loss: 1.14231| constrast_loss: 4.49398| div_loss: 0.75264| %_mask_idx: 0.42262| ppl: 158.3131| %_neg_is_pos: 0.00186| lr: 0.0| temp: 1.9569 | loss: 1.14038| constrast_loss: 4.48542| div_loss: 0.76104| %_mask_idx: 0.40147| ppl: 152.93483| %_neg_is_pos: 0.002| lr: 0.0| temp: 1.95689 | loss: 1.13993| constrast_loss: 4.48434| div_loss: 0.75359| %_mask_idx: 0.36936| ppl: 157.70142| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.95689 | loss: 1.14253| constrast_loss: 4.49438| div_loss: 0.75755| %_mask_idx: 0.39082| ppl: 155.16545| %_neg_is_pos: 0.00243| lr: 0.0| temp: 1.95688 | loss: 1.13546| constrast_loss: 4.467| div_loss: 0.74837| %_mask_idx: 0.38001| ppl: 161.04056| %_neg_is_pos: 0.00237| lr: 0.0| temp: 1.95688 | loss: 1.14102| constrast_loss: 4.48872| div_loss: 0.75373| %_mask_idx: 0.40414| ppl: 157.61163| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.95687 | loss: 1.1362| constrast_loss: 4.46888| div_loss: 0.75938| %_mask_idx: 0.36231| ppl: 153.99658| %_neg_is_pos: 0.00184| lr: 0.0| temp: 1.95687 | loss: 1.12874| constrast_loss: 4.43986| div_loss: 0.75101| %_mask_idx: 0.37155| ppl: 159.35622| %_neg_is_pos: 0.00175| lr: 0.0| temp: 1.95685 | loss: 1.13009| constrast_loss: 4.44513| div_loss: 0.75237| %_mask_idx: 0.40351| ppl: 158.48511| %_neg_is_pos: 0.00174| lr: 0.0| temp: 1.95685 | loss: 1.13652| constrast_loss: 4.47018| div_loss: 0.75893| %_mask_idx: 0.35103| ppl: 154.28333| %_neg_is_pos: 0.00255| lr: 0.0| temp: 1.95684 | loss: 1.12876| constrast_loss: 4.43859| div_loss: 0.7646| %_mask_idx: 0.38503| ppl: 150.65529| %_neg_is_pos: 0.00309| lr: 0.0| temp: 1.95684 | loss: 1.14952| constrast_loss: 4.52269| div_loss: 0.75365| %_mask_idx: 0.43045| ppl: 157.66118| %_neg_is_pos: 0.00178| lr: 0.0| temp: 1.95683 | loss: 1.14189| constrast_loss: 4.49279| 
div_loss: 0.74759| %_mask_idx: 0.39082| ppl: 161.54156| %_neg_is_pos: 0.00152| lr: 0.0| temp: 1.95683 | loss: 1.13376| constrast_loss: 4.45964| div_loss: 0.75412| %_mask_idx: 0.38189| ppl: 157.36267| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95682 | loss: 1.14182| constrast_loss: 4.49187| div_loss: 0.75422| %_mask_idx: 0.39145| ppl: 157.30186| %_neg_is_pos: 0.00164| lr: 0.0| temp: 1.95682 | loss: 1.13741| constrast_loss: 4.47392| div_loss: 0.75735| %_mask_idx: 0.30639| ppl: 155.29791| %_neg_is_pos: 0.00203| lr: 0.0| temp: 1.9568 | loss: 1.13776| constrast_loss: 4.47571| div_loss: 0.75336| %_mask_idx: 0.39928| ppl: 157.84644| %_neg_is_pos: 0.00154| lr: 0.0| temp: 1.9568 | loss: 1.14572| constrast_loss: 4.50748| div_loss: 0.75407| %_mask_idx: 0.40351| ppl: 157.39444| %_neg_is_pos: 0.00179| lr: 0.0| temp: 1.95679 | loss: 1.1409| constrast_loss: 4.48878| div_loss: 0.7481| %_mask_idx: 0.39646| ppl: 161.21596| %_neg_is_pos: 0.00169| lr: 0.0| temp: 1.95679 | loss: 1.14232| constrast_loss: 4.49371| div_loss: 0.7559| %_mask_idx: 0.3974| ppl: 156.22414| %_neg_is_pos: 0.00165| lr: 0.0| temp: 1.95678 | loss: 1.13459| constrast_loss: 4.46183| div_loss: 0.76536| %_mask_idx: 0.36842| ppl: 150.16826| %_neg_is_pos: 0.00227| lr: 0.0| temp: 1.95678 | loss: 1.14076| constrast_loss: 4.48735| div_loss: 0.75707| %_mask_idx: 0.38863| ppl: 155.47517| %_neg_is_pos: 0.00197| lr: 0.0| temp: 1.95677 | loss: 1.13986| constrast_loss: 4.48401| div_loss: 0.7543| %_mask_idx: 0.34712| ppl: 157.24739| %_neg_is_pos: 0.00218| lr: 0.0| temp: 1.95677 | loss: 1.13912| constrast_loss: 4.48123| div_loss: 0.75261| %_mask_idx: 0.35182| ppl: 158.32729| %_neg_is_pos: 0.00176| lr: 0.0| temp: 1.95675 | loss: 1.12745| constrast_loss: 4.43426| div_loss: 0.7556| %_mask_idx: 0.36028| ppl: 156.41487| %_neg_is_pos: 0.00277| lr: 0.0| temp: 1.95675 | loss: 1.13535| constrast_loss: 4.46555| div_loss: 0.75839| %_mask_idx: 0.40946| ppl: 154.63193| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.95674 | loss: 1.13322| constrast_loss: 
4.4575| div_loss: 0.75373| %_mask_idx: 0.38064| ppl: 157.6158| %_neg_is_pos: 0.00219| lr: 0.0| temp: 1.95674 [2021-09-02 06:38:51,683] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:38:51,684] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.14217| constrast_loss: 4.4936| div_loss: 0.75095| %_mask_idx: 0.38299| ppl: 159.3909| %_neg_is_pos: 0.00261| lr: 0.0| temp: 1.95672 | loss: 1.13442| constrast_loss: 4.46231| div_loss: 0.75381| %_mask_idx: 0.38518| ppl: 157.55893| %_neg_is_pos: 0.00246| lr: 0.0| temp: 1.95672 | loss: 1.14401| constrast_loss: 4.50014| div_loss: 0.75913| %_mask_idx: 0.39223| ppl: 154.15964| %_neg_is_pos: 0.00182| lr: 0.0| temp: 1.95672 | loss: 1.13652| constrast_loss: 4.47079| div_loss: 0.75283| %_mask_idx: 0.39912| ppl: 158.19003| %_neg_is_pos: 0.002| lr: 0.0| temp: 1.95672 | loss: 1.14302| constrast_loss: 4.49628| div_loss: 0.75788| %_mask_idx: 0.37108| ppl: 154.95529| %_neg_is_pos: 0.00278| lr: 0.0| temp: 1.95671 | loss: 1.14348| constrast_loss: 4.49791| div_loss: 0.76001| %_mask_idx: 0.41024| ppl: 153.59172| %_neg_is_pos: 0.0023| lr: 0.0| temp: 1.95671 | loss: 1.13969| constrast_loss: 4.48188| div_loss: 0.76891| %_mask_idx: 0.41917| ppl: 147.89912| %_neg_is_pos: 0.00275| lr: 0.0| temp: 1.9567 | loss: 1.14451| constrast_loss: 4.50151| div_loss: 0.76523| %_mask_idx: 0.36905| ppl: 150.24968| %_neg_is_pos: 0.00293| lr: 0.0| temp: 1.9567 | loss: 1.13448| constrast_loss: 4.46061| div_loss: 0.77302| %_mask_idx: 0.39975| ppl: 145.26627| %_neg_is_pos: 0.00273| lr: 0.0| temp: 1.95668 | loss: 1.14452| constrast_loss: 4.50127| div_loss: 0.7681| %_mask_idx: 0.38894| ppl: 148.41734| %_neg_is_pos: 0.00405| lr: 0.0| temp: 1.95668 | loss: 1.14635| constrast_loss: 4.50766| div_loss: 0.77733| %_mask_idx: 0.38769| ppl: 142.50934| %_neg_is_pos: 0.00281| lr: 0.0| 
temp: 1.95667 | loss: 1.13322| constrast_loss: 4.45487| div_loss: 0.78014| %_mask_idx: 0.37093| ppl: 140.70784| %_neg_is_pos: 0.0046| lr: 0.0| temp: 1.95667 | loss: 1.14007| constrast_loss: 4.48263| div_loss: 0.77652| %_mask_idx: 0.36106| ppl: 143.02979| %_neg_is_pos: 0.00385| lr: 0.0| temp: 1.95666 | loss: 1.13781| constrast_loss: 4.47303| div_loss: 0.7822| %_mask_idx: 0.41917| ppl: 139.39024| %_neg_is_pos: 0.0044| lr: 0.0| temp: 1.95666 | loss: 1.13313| constrast_loss: 4.45359| div_loss: 0.78926| %_mask_idx: 0.4433| ppl: 134.87408| %_neg_is_pos: 0.00406| lr: 0.0| temp: 1.95665 | loss: 1.14267| constrast_loss: 4.4927| div_loss: 0.77972| %_mask_idx: 0.41275| ppl: 140.98065| %_neg_is_pos: 0.00388| lr: 0.0| temp: 1.95665 | loss: 1.13406| constrast_loss: 4.45788| div_loss: 0.78357| %_mask_idx: 0.40508| ppl: 138.51343| %_neg_is_pos: 0.0049| lr: 0.0| temp: 1.95663| loss: 1.13413| constrast_loss: 4.45871| div_loss: 0.77815| %_mask_idx: 0.35636| ppl: 141.98552| %_neg_is_pos: 0.00435| lr: 0.0| temp: 1.95663 | loss: 1.13337| constrast_loss: 4.45439| div_loss: 0.79103| %_mask_idx: 0.36028| ppl: 133.73891| %_neg_is_pos: 0.0046| lr: 0.0| temp: 1.95662 | loss: 1.13766| constrast_loss: 4.47165| div_loss: 0.78992| %_mask_idx: 0.34226| ppl: 134.4519| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.95662 | loss: 1.14547| constrast_loss: 4.50293| div_loss: 0.78949| %_mask_idx: 0.39568| ppl: 134.72455| %_neg_is_pos: 0.00319| lr: 0.0| temp: 1.95661 | loss: 1.14265| constrast_loss: 4.49144| div_loss: 0.79146| %_mask_idx: 0.37124| ppl: 133.46317| %_neg_is_pos: 0.00378| lr: 0.0| temp: 1.95661 | loss: 1.13692| constrast_loss: 4.46834| div_loss: 0.7935| %_mask_idx: 0.41823| ppl: 132.15724| %_neg_is_pos: 0.00436| lr: 0.0| temp: 1.9566 | loss: 1.14133| constrast_loss: 4.48651| div_loss: 0.78828| %_mask_idx: 0.45269| ppl: 135.50081| %_neg_is_pos: 0.00364| lr: 0.0| temp: 1.9566 | loss: 1.13042| constrast_loss: 4.44221| div_loss: 0.79454| %_mask_idx: 0.35307| ppl: 131.49248| %_neg_is_pos: 0.00458| lr: 
0.0| temp: 1.95658| loss: 1.14205| constrast_loss: 4.48845| div_loss: 0.79743| %_mask_idx: 0.37563| ppl: 129.6458| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.95658 | loss: 1.14003| constrast_loss: 4.48045| div_loss: 0.79695| %_mask_idx: 0.40555| ppl: 129.95461| %_neg_is_pos: 0.00499| lr: 0.0| temp: 1.95657 | loss: 1.14353| constrast_loss: 4.494| div_loss: 0.80108| %_mask_idx: 0.40539| ppl: 127.30901| %_neg_is_pos: 0.00394| lr: 0.0| temp: 1.95657 | loss: 1.13529| constrast_loss: 4.46047| div_loss: 0.80689| %_mask_idx: 0.42575| ppl: 123.59093| %_neg_is_pos: 0.00615| lr: 0.0| temp: 1.95655 | loss: 1.13968| constrast_loss: 4.47839| div_loss: 0.80315| %_mask_idx: 0.38283| ppl: 125.98105| %_neg_is_pos: 0.00537| lr: 0.0| temp: 1.95655 | loss: 1.12847| constrast_loss: 4.43343| div_loss: 0.80439| %_mask_idx: 0.40946| ppl: 125.18818| %_neg_is_pos: 0.00556| lr: 0.0| temp: 1.95654 | loss: 1.14562| constrast_loss: 4.50267| div_loss: 0.79802| %_mask_idx: 0.3833| ppl: 129.26523| %_neg_is_pos: 0.00308| lr: 0.0| temp: 1.95654 | loss: 1.13551| constrast_loss: 4.46327| div_loss: 0.78781| %_mask_idx: 0.42309| ppl: 135.80313| %_neg_is_pos: 0.00444| lr: 0.0| temp: 1.95653 | loss: 1.14071| constrast_loss: 4.48326| div_loss: 0.79588| %_mask_idx: 0.3808| ppl: 130.63858| %_neg_is_pos: 0.00342| lr: 0.0| temp: 1.95653 | loss: 1.13805| constrast_loss: 4.47182| div_loss: 0.80382| %_mask_idx: 0.34915| ppl: 125.55305| %_neg_is_pos: 0.00492| lr: 0.0| temp: 1.95652 | loss: 1.13146| constrast_loss: 4.44614| div_loss: 0.7969| %_mask_idx: 0.37281| ppl: 129.98294| %_neg_is_pos: 0.00464| lr: 0.0| temp: 1.95652 | loss: 1.13807| constrast_loss: 4.47137| div_loss: 0.80916| %_mask_idx: 0.37202| ppl: 122.13861| %_neg_is_pos: 0.00708| lr: 0.0| temp: 1.9565 | loss: 1.13265| constrast_loss: 4.45128| div_loss: 0.79316| %_mask_idx: 0.35354| ppl: 132.37781| %_neg_is_pos: 0.00494| lr: 0.0| temp: 1.9565 | loss: 1.14245| constrast_loss: 4.49018| div_loss: 0.79616| %_mask_idx: 0.38581| ppl: 130.45619| %_neg_is_pos: 
0.00401| lr: 0.0| temp: 1.95649 | loss: 1.14066| constrast_loss: 4.48281| div_loss: 0.79826| %_mask_idx: 0.38283| ppl: 129.11201| %_neg_is_pos: 0.00498| lr: 0.0| temp: 1.95649 | loss: 1.13076| constrast_loss: 4.44264| div_loss: 0.80416| %_mask_idx: 0.35636| ppl: 125.3348| %_neg_is_pos: 0.00532| lr: 0.0| temp: 1.95648 | loss: 1.13592| constrast_loss: 4.46356| div_loss: 0.80105| %_mask_idx: 0.44706| ppl: 127.33067| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.95648 | loss: 1.13495| constrast_loss: 4.45986| div_loss: 0.79947| %_mask_idx: 0.43108| ppl: 128.33882| %_neg_is_pos: 0.00509| lr: 0.0| temp: 1.95647 | loss: 1.13124| constrast_loss: 4.44556| div_loss: 0.79404| %_mask_idx: 0.36278| ppl: 131.81407| %_neg_is_pos: 0.0052| lr: 0.0| temp: 1.95647 | loss: 1.13167| constrast_loss: 4.44671| div_loss: 0.79956| %_mask_idx: 0.3407| ppl: 128.28239| %_neg_is_pos: 0.00586| lr: 0.0| temp: 1.95645 | loss: 1.13455| constrast_loss: 4.45809| div_loss: 0.8012| %_mask_idx: 0.40398| ppl: 127.22921| %_neg_is_pos: 0.00529| lr: 0.0| temp: 1.95645 | loss: 1.13802| constrast_loss: 4.4723| div_loss: 0.79775| %_mask_idx: 0.39536| ppl: 129.44229| %_neg_is_pos: 0.0046| lr: 0.0| temp: 1.95644 | loss: 1.11937| constrast_loss: 4.39707| div_loss: 0.8043| %_mask_idx: 0.36372| ppl: 125.25054| %_neg_is_pos: 0.00627| lr: 0.0| temp: 1.95644 | loss: 1.13865| constrast_loss: 4.47509| div_loss: 0.79497| %_mask_idx: 0.36043| ppl: 131.21703| %_neg_is_pos: 0.0043| lr: 0.0| temp: 1.95643 | loss: 1.13533| constrast_loss: 4.46091| div_loss: 0.80429| %_mask_idx: 0.38925| ppl: 125.25338| %_neg_is_pos: 0.00729| lr: 0.0| temp: 1.95643 | loss: 1.13593| constrast_loss: 4.46332| div_loss: 0.80413| %_mask_idx: 0.4198| ppl: 125.35654| %_neg_is_pos: 0.00654| lr: 0.0| temp: 1.95642 | loss: 1.13679| constrast_loss: 4.46704| div_loss: 0.80122| %_mask_idx: 0.42888| ppl: 127.22018| %_neg_is_pos: 0.0045| lr: 0.0| temp: 1.95642 | loss: 1.13699| constrast_loss: 4.4684| div_loss: 0.79551| %_mask_idx: 0.40367| ppl: 130.87619| 
%_neg_is_pos: 0.00345| lr: 0.0| temp: 1.9564 | loss: 1.13624| constrast_loss: 4.46483| div_loss: 0.80132| %_mask_idx: 0.38283| ppl: 127.1543| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.9564 | loss: 1.1236| constrast_loss: 4.41435| div_loss: 0.80028| %_mask_idx: 0.3407| ppl: 127.8176| %_neg_is_pos: 0.00625| lr: 0.0| temp: 1.95639 | loss: 1.14252| constrast_loss: 4.48969| div_loss: 0.80395| %_mask_idx: 0.39019| ppl: 125.47275| %_neg_is_pos: 0.00697| lr: 0.0| temp: 1.95639 | loss: 1.13537| constrast_loss: 4.46184| div_loss: 0.79647| %_mask_idx: 0.39129| ppl: 130.26102| %_neg_is_pos: 0.00513| lr: 0.0| temp: 1.95637 | loss: 1.1308| constrast_loss: 4.44275| div_loss: 0.80436| %_mask_idx: 0.34539| ppl: 125.20936| %_neg_is_pos: 0.00548| lr: 0.0| temp: 1.95637 | loss: 1.14235| constrast_loss: 4.48923| div_loss: 0.80153| %_mask_idx: 0.39207| ppl: 127.01836| %_neg_is_pos: 0.00354| lr: 0.0| temp: 1.95636 | loss: 1.1326| constrast_loss: 4.45012| div_loss: 0.80267| %_mask_idx: 0.37516| ppl: 126.29218| %_neg_is_pos: 0.00513| lr: 0.0| temp: 1.95636 | loss: 1.1377| constrast_loss: 4.47132| div_loss: 0.79465| %_mask_idx: 0.3974| ppl: 131.42523| %_neg_is_pos: 0.00466| lr: 0.0| temp: 1.95635 | loss: 1.1232| constrast_loss: 4.41326| div_loss: 0.79551| %_mask_idx: 0.35902| ppl: 130.87502| %_neg_is_pos: 0.00681| lr: 0.0| temp: 1.95635 | loss: 1.1469| constrast_loss: 4.50716| div_loss: 0.80424| %_mask_idx: 0.37234| ppl: 125.28709| %_neg_is_pos: 0.00541| lr: 0.0| temp: 1.95634 | loss: 1.13354| constrast_loss: 4.45474| div_loss: 0.79421| %_mask_idx: 0.38769| ppl: 131.70718| %_neg_is_pos: 0.00447| lr: 0.0| temp: 1.95634 | loss: 1.13185| constrast_loss: 4.44765| div_loss: 0.79746| %_mask_idx: 0.39004| ppl: 129.62265| %_neg_is_pos: 0.00615| lr: 0.0| temp: 1.95632 | loss: 1.13599| constrast_loss: 4.46444| div_loss: 0.79523| %_mask_idx: 0.40899| ppl: 131.05353| %_neg_is_pos: 0.00539| lr: 0.0| temp: 1.95632 | loss: 1.12671| constrast_loss: 4.42685| div_loss: 0.79985| %_mask_idx: 0.3692| ppl: 
128.09898| %_neg_is_pos: 0.00626| lr: 0.0| temp: 1.95631 | loss: 1.14188| constrast_loss: 4.48823| div_loss: 0.79293| %_mask_idx: 0.43092| ppl: 132.52435| %_neg_is_pos: 0.00383| lr: 0.0| temp: 1.95631 | loss: 1.13411| constrast_loss: 4.45646| div_loss: 0.79971| %_mask_idx: 0.38001| ppl: 128.18851| %_neg_is_pos: 0.00507| lr: 0.0| temp: 1.9563 | loss: 1.13818| constrast_loss: 4.47314| div_loss: 0.79562| %_mask_idx: 0.43797| ppl: 130.80345| %_neg_is_pos: 0.00441| lr: 0.0| temp: 1.9563 | loss: 1.12812| constrast_loss: 4.43308| div_loss: 0.79419| %_mask_idx: 0.347| ppl: 131.71716| %_neg_is_pos: 0.00493| lr: 0.0| temp: 1.95629 | loss: 1.13719| constrast_loss: 4.46932| div_loss: 0.79431| %_mask_idx: 0.38612| ppl: 131.64342| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.95629 | loss: 1.13288| constrast_loss: 4.45162| div_loss: 0.79914| %_mask_idx: 0.40648| ppl: 128.5511| %_neg_is_pos: 0.00572| lr: 0.0| temp: 1.95627 | loss: 1.13042| constrast_loss: 4.44163| div_loss: 0.80048| %_mask_idx: 0.36889| ppl: 127.69107| %_neg_is_pos: 0.00662| lr: 0.0| temp: 1.95627 | loss: 1.1211| constrast_loss: 4.40385| div_loss: 0.80562| %_mask_idx: 0.33506| ppl: 124.40541| %_neg_is_pos: 0.0064| lr: 0.0| temp: 1.95627 | loss: 1.12513| constrast_loss: 4.42111| div_loss: 0.79425| %_mask_idx: 0.43954| ppl: 131.68134| %_neg_is_pos: 0.00524| lr: 0.0| temp: 1.95627 | loss: 1.1286| constrast_loss: 4.43492| div_loss: 0.79459| %_mask_idx: 0.34023| ppl: 131.46448| %_neg_is_pos: 0.00532| lr: 0.0| temp: 1.95626 | loss: 1.14174| constrast_loss: 4.48687| div_loss: 0.80084| %_mask_idx: 0.39521| ppl: 127.46351| %_neg_is_pos: 0.00584| lr: 0.0| temp: 1.95626 | loss: 1.1272| constrast_loss: 4.42878| div_loss: 0.80001| %_mask_idx: 0.37625| ppl: 127.99255| %_neg_is_pos: 0.0058| lr: 0.0| temp: 1.95625 | loss: 1.14211| constrast_loss: 4.48904| div_loss: 0.7941| %_mask_idx: 0.37907| ppl: 131.77913| %_neg_is_pos: 0.00349| lr: 0.0| temp: 1.95625 | loss: 1.13667| constrast_loss: 4.46539| div_loss: 0.81273| %_mask_idx: 0.42528| 
ppl: 119.85249| %_neg_is_pos: 0.00598| lr: 0.0| temp: 1.95623 | loss: 1.13358| constrast_loss: 4.45416| div_loss: 0.80164| %_mask_idx: 0.38017| ppl: 126.94743| %_neg_is_pos: 0.00475| lr: 0.0| temp: 1.95623 | loss: 1.1338| constrast_loss: 4.45563| div_loss: 0.79579| %_mask_idx: 0.35041| ppl: 130.69344| %_neg_is_pos: 0.0076| lr: 0.0| temp: 1.95622 | loss: 1.14378| constrast_loss: 4.49585| div_loss: 0.79257| %_mask_idx: 0.39662| ppl: 132.75728| %_neg_is_pos: 0.00396| lr: 0.0| temp: 1.95622 | loss: 1.14017| constrast_loss: 4.48174| div_loss: 0.78942| %_mask_idx: 0.36905| ppl: 134.77016| %_neg_is_pos: 0.00327| lr: 0.0| temp: 1.9562 | loss: 1.1411| constrast_loss: 4.48403| div_loss: 0.80379| %_mask_idx: 0.38158| ppl: 125.57285| %_neg_is_pos: 0.00656| lr: 0.0| temp: 1.9562 | loss: 1.12832| constrast_loss: 4.43263| div_loss: 0.80631| %_mask_idx: 0.40414| ppl: 123.96278| %_neg_is_pos: 0.00739| lr: 0.0| temp: 1.95619 | loss: 1.13533| constrast_loss: 4.46105| div_loss: 0.80255| %_mask_idx: 0.35495| ppl: 126.3712| %_neg_is_pos: 0.00401| lr: 0.0| temp: 1.95619 | loss: 1.12915| constrast_loss: 4.43625| div_loss: 0.8036| %_mask_idx: 0.42325| ppl: 125.69923| %_neg_is_pos: 0.00444| lr: 0.0| temp: 1.95618 | loss: 1.13887| constrast_loss: 4.47526| div_loss: 0.80225| %_mask_idx: 0.4234| ppl: 126.55864| %_neg_is_pos: 0.00372| lr: 0.0| temp: 1.95618 | loss: 1.13079| constrast_loss: 4.44377| div_loss: 0.79371| %_mask_idx: 0.39505| ppl: 132.02545| %_neg_is_pos: 0.0057| lr: 0.0| temp: 1.95617 | loss: 1.14115| constrast_loss: 4.48546| div_loss: 0.79127| %_mask_idx: 0.39568| ppl: 133.58603| %_neg_is_pos: 0.00545| lr: 0.0| temp: 1.95617 | loss: 1.1441| constrast_loss: 4.49569| div_loss: 0.80721| %_mask_idx: 0.34524| ppl: 123.38806| %_neg_is_pos: 0.00501| lr: 0.0| temp: 1.95615 | loss: 1.13688| constrast_loss: 4.4684| div_loss: 0.79118| %_mask_idx: 0.36028| ppl: 133.64638| %_neg_is_pos: 0.00374| lr: 0.0| temp: 1.95615 | loss: 1.14066| constrast_loss: 4.48298| div_loss: 0.7967| %_mask_idx: 
0.35182| ppl: 130.112| %_neg_is_pos: 0.00535| lr: 0.0| temp: 1.95614 | loss: 1.13819| constrast_loss: 4.4724| div_loss: 0.80376| %_mask_idx: 0.42293| ppl: 125.59177| %_neg_is_pos: 0.00605| lr: 0.0| temp: 1.95614 | loss: 1.13925| constrast_loss: 4.47626| div_loss: 0.80717| %_mask_idx: 0.3183| ppl: 123.40819| %_neg_is_pos: 0.0037| lr: 0.0| temp: 1.95613 | loss: 1.13598| constrast_loss: 4.46327| div_loss: 0.80654| %_mask_idx: 0.40398| ppl: 123.81293| %_neg_is_pos: 0.00468| lr: 0.0| temp: 1.95613 | loss: 1.13904| constrast_loss: 4.47642| div_loss: 0.79755| %_mask_idx: 0.40821| ppl: 129.56541| %_neg_is_pos: 0.00504| lr: 0.0| temp: 1.95612 | loss: 1.13647| constrast_loss: 4.46541| div_loss: 0.80461| %_mask_idx: 0.41087| ppl: 125.05154| %_neg_is_pos: 0.0051| lr: 0.0| temp: 1.95612 | loss: 1.13092| constrast_loss: 4.44373| div_loss: 0.79967| %_mask_idx: 0.401| ppl: 128.21133| %_neg_is_pos: 0.00505| lr: 0.0| temp: 1.9561 | loss: 1.14| constrast_loss: 4.47989| div_loss: 0.80108| %_mask_idx: 0.41557| ppl: 127.30946| %_neg_is_pos: 0.00451| lr: 0.0| temp: 1.9561 | loss: 1.13652| constrast_loss: 4.46562| div_loss: 0.8044| %_mask_idx: 0.40508| ppl: 125.1822| %_neg_is_pos: 0.00417| lr: 0.0| temp: 1.95609 | loss: 1.14367| constrast_loss: 4.49498| div_loss: 0.79703| %_mask_idx: 0.42935| ppl: 129.90221| %_neg_is_pos: 0.00424| lr: 0.0| temp: 1.95609 | loss: 1.13756| constrast_loss: 4.46944| div_loss: 0.80796| %_mask_idx: 0.39223| ppl: 122.90602| %_neg_is_pos: 0.00519| lr: 0.0| temp: 1.95608 | loss: 1.13244| constrast_loss: 4.44938| div_loss: 0.80369| %_mask_idx: 0.42873| ppl: 125.63708| %_neg_is_pos: 0.0047| lr: 0.0| temp: 1.95608 | loss: 1.12856| constrast_loss: 4.43432| div_loss: 0.7993| %_mask_idx: 0.29151| ppl: 128.4503| %_neg_is_pos: 0.00711| lr: 0.0| temp: 1.95607 | loss: 1.14273| constrast_loss: 4.49013| div_loss: 0.80803| %_mask_idx: 0.40648| ppl: 122.86153| %_neg_is_pos: 0.00585| lr: 0.0| temp: 1.95607 | loss: 1.13912| constrast_loss: 4.47647| div_loss: 0.80012| %_mask_idx: 
0.39912| ppl: 127.92498| %_neg_is_pos: 0.00556| lr: 0.0| temp: 1.95605 | loss: 1.14339| constrast_loss: 4.49306| div_loss: 0.80507| %_mask_idx: 0.4057| ppl: 124.75408| %_neg_is_pos: 0.00517| lr: 0.0| temp: 1.95605 | loss: 1.13804| constrast_loss: 4.47312| div_loss: 0.79047| %_mask_idx: 0.37986| ppl: 134.09656| %_neg_is_pos: 0.00345| lr: 0.0| temp: 1.95604 | loss: 1.13174| constrast_loss: 4.44634| div_loss: 0.80631| %_mask_idx: 0.39427| ppl: 123.96229| %_neg_is_pos: 0.00422| lr: 0.0| temp: 1.95604 [2021-09-02 06:48:06,106] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 06:48:06,106] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.13757| constrast_loss: 4.47067| div_loss: 0.79611| %_mask_idx: 0.38283| ppl: 130.49139| %_neg_is_pos: 0.00527| lr: 0.0| temp: 1.95602 | loss: 1.12761| constrast_loss: 4.43043| div_loss: 0.79995| %_mask_idx: 0.40695| ppl: 128.02939| %_neg_is_pos: 0.00481| lr: 0.0| temp: 1.95602 | loss: 1.13828| constrast_loss: 4.47253| div_loss: 0.80599| %_mask_idx: 0.39709| ppl: 124.16346| %_neg_is_pos: 0.00747| lr: 0.0| temp: 1.95601 | loss: 1.14137| constrast_loss: 4.48398| div_loss: 0.81481| %_mask_idx: 0.41698| ppl: 118.52056| %_neg_is_pos: 0.00574| lr: 0.0| temp: 1.95601 | loss: 1.13866| constrast_loss: 4.4748| div_loss: 0.79836| %_mask_idx: 0.40523| ppl: 129.04971| %_neg_is_pos: 0.0053| lr: 0.0| temp: 1.956 | loss: 1.13084| constrast_loss: 4.44242| div_loss: 0.80941| %_mask_idx: 0.40476| ppl: 121.98074| %_neg_is_pos: 0.00635| lr: 0.0| temp: 1.956 | loss: 1.14088| constrast_loss: 4.48306| div_loss: 0.80479| %_mask_idx: 0.36811| ppl: 124.93304| %_neg_is_pos: 0.0072| lr: 0.0| temp: 1.95599 | loss: 1.13385| constrast_loss: 4.45512| div_loss: 0.80301| %_mask_idx: 0.42497| ppl: 126.0747| %_neg_is_pos: 0.00577| lr: 0.0| temp: 1.95599 | loss: 1.13904| 
constrast_loss: 4.47453| div_loss: 0.81626| %_mask_idx: 0.31673| ppl: 117.59676| %_neg_is_pos: 0.00868| lr: 0.0| temp: 1.95597 | loss: 1.13221| constrast_loss: 4.44796| div_loss: 0.80873| %_mask_idx: 0.38925| ppl: 122.41414| %_neg_is_pos: 0.00793| lr: 0.0| temp: 1.95597 | loss: 1.12693| constrast_loss: 4.42555| div_loss: 0.82182| %_mask_idx: 0.375| ppl: 114.03461| %_neg_is_pos: 0.00972| lr: 0.0| temp: 1.95596 | loss: 1.13321| constrast_loss: 4.45111| div_loss: 0.81729| %_mask_idx: 0.40977| ppl: 116.93343| %_neg_is_pos: 0.00622| lr: 0.0| temp: 1.95596 | loss: 1.13158| constrast_loss: 4.44462| div_loss: 0.81718| %_mask_idx: 0.30388| ppl: 117.00483| %_neg_is_pos: 0.00583| lr: 0.0| temp: 1.95595 | loss: 1.13402| constrast_loss: 4.4548| div_loss: 0.81267| %_mask_idx: 0.36184| ppl: 119.89256| %_neg_is_pos: 0.00658| lr: 0.0| temp: 1.95595 | loss: 1.13774| constrast_loss: 4.46956| div_loss: 0.81403| %_mask_idx: 0.36325| ppl: 119.01868| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.95594 | loss: 1.13575| constrast_loss: 4.46114| div_loss: 0.81878| %_mask_idx: 0.35871| ppl: 115.97868| %_neg_is_pos: 0.00729| lr: 0.0| temp: 1.95594 | loss: 1.14171| constrast_loss: 4.48508| div_loss: 0.81758| %_mask_idx: 0.43249| ppl: 116.74583| %_neg_is_pos: 0.00564| lr: 0.0| temp: 1.95592 | loss: 1.13658| constrast_loss: 4.46509| div_loss: 0.81226| %_mask_idx: 0.3844| ppl: 120.15365| %_neg_is_pos: 0.00599| lr: 0.0| temp: 1.95592 | loss: 1.14066| constrast_loss: 4.48086| div_loss: 0.81792| %_mask_idx: 0.38033| ppl: 116.52917| %_neg_is_pos: 0.00692| lr: 0.0| temp: 1.95591 | loss: 1.12742| constrast_loss: 4.42797| div_loss: 0.81695| %_mask_idx: 0.41949| ppl: 117.14941| %_neg_is_pos: 0.00676| lr: 0.0| temp: 1.95591 | loss: 1.1459| constrast_loss: 4.5012| div_loss: 0.82414| %_mask_idx: 0.42951| ppl: 112.54865| %_neg_is_pos: 0.00654| lr: 0.0| temp: 1.9559 | loss: 1.14485| constrast_loss: 4.496| div_loss: 0.83398| %_mask_idx: 0.39787| ppl: 106.2543| %_neg_is_pos: 0.00709| lr: 0.0| temp: 1.9559 | loss: 
1.13444| constrast_loss: 4.45479| div_loss: 0.82972| %_mask_idx: 0.34258| ppl: 108.9786| %_neg_is_pos: 0.00617| lr: 0.0| temp: 1.95589 | loss: 1.12873| constrast_loss: 4.43237| div_loss: 0.82538| %_mask_idx: 0.3714| ppl: 111.75648| %_neg_is_pos: 0.00712| lr: 0.0| temp: 1.95589 | loss: 1.13557| constrast_loss: 4.45898| div_loss: 0.83305| %_mask_idx: 0.40883| ppl: 106.8479| %_neg_is_pos: 0.00703| lr: 0.0| temp: 1.95587 | loss: 1.13303| constrast_loss: 4.44911| div_loss: 0.83015| %_mask_idx: 0.42857| ppl: 108.70238| %_neg_is_pos: 0.00728| lr: 0.0| temp: 1.95587 | loss: 1.14384| constrast_loss: 4.49159| div_loss: 0.8377| %_mask_idx: 0.40805| ppl: 103.87404| %_neg_is_pos: 0.00773| lr: 0.0| temp: 1.95586 | loss: 1.14147| constrast_loss: 4.48248| div_loss: 0.83417| %_mask_idx: 0.37829| ppl: 106.13428| %_neg_is_pos: 0.00822| lr: 0.0| temp: 1.95586 | loss: 1.13147| constrast_loss: 4.44217| div_loss: 0.83698| %_mask_idx: 0.39646| ppl: 104.33351| %_neg_is_pos: 0.0067| lr: 0.0| temp: 1.95584 | loss: 1.1389| constrast_loss: 4.4719| div_loss: 0.83687| %_mask_idx: 0.37719| ppl: 104.40604| %_neg_is_pos: 0.00847| lr: 0.0| temp: 1.95584 | loss: 1.13823| constrast_loss: 4.46957| div_loss: 0.83331| %_mask_idx: 0.36607| ppl: 106.67891| %_neg_is_pos: 0.00627| lr: 0.0| temp: 1.95583 | loss: 1.12756| constrast_loss: 4.42714| div_loss: 0.83116| %_mask_idx: 0.37939| ppl: 108.05611| %_neg_is_pos: 0.00777| lr: 0.0| temp: 1.95583 | loss: 1.14058| constrast_loss: 4.47866| div_loss: 0.83648| %_mask_idx: 0.39975| ppl: 104.65426| %_neg_is_pos: 0.00845| lr: 0.0| temp: 1.95583 | loss: 1.12574| constrast_loss: 4.41885| div_loss: 0.84094| %_mask_idx: 0.41886| ppl: 101.79794| %_neg_is_pos: 0.00765| lr: 0.0| temp: 1.95583 | loss: 1.14647| constrast_loss: 4.50149| div_loss: 0.84392| %_mask_idx: 0.41432| ppl: 99.8899| %_neg_is_pos: 0.00794| lr: 0.0| temp: 1.95582 | loss: 1.14624| constrast_loss: 4.50147| div_loss: 0.83501| %_mask_idx: 0.41494| ppl: 105.59428| %_neg_is_pos: 0.00755| lr: 0.0| temp: 1.95582 
| loss: 1.14157| constrast_loss: 4.48342| div_loss: 0.82875| %_mask_idx: 0.38706| ppl: 109.59769| %_neg_is_pos: 0.00753| lr: 0.0| temp: 1.9558 | loss: 1.14027| constrast_loss: 4.47748| div_loss: 0.83612| %_mask_idx: 0.3963| ppl: 104.88136| %_neg_is_pos: 0.00706| lr: 0.0| temp: 1.9558 | loss: 1.13541| constrast_loss: 4.45805| div_loss: 0.83575| %_mask_idx: 0.42184| ppl: 105.11801| %_neg_is_pos: 0.00691| lr: 0.0| temp: 1.95579 | loss: 1.13589| constrast_loss: 4.45938| div_loss: 0.84175| %_mask_idx: 0.40022| ppl: 101.28185| %_neg_is_pos: 0.00781| lr: 0.0| temp: 1.95579 | loss: 1.14251| constrast_loss: 4.48573| div_loss: 0.84324| %_mask_idx: 0.35949| ppl: 100.32394| %_neg_is_pos: 0.00733| lr: 0.0| temp: 1.95578 | loss: 1.14354| constrast_loss: 4.49074| div_loss: 0.83413| %_mask_idx: 0.37437| ppl: 106.15745| %_neg_is_pos: 0.00649| lr: 0.0| temp: 1.95578 | loss: 1.14322| constrast_loss: 4.48946| div_loss: 0.83426| %_mask_idx: 0.36451| ppl: 106.07667| %_neg_is_pos: 0.00743| lr: 0.0| temp: 1.95577 | loss: 1.13286| constrast_loss: 4.44726| div_loss: 0.84195| %_mask_idx: 0.40304| ppl: 101.15276| %_neg_is_pos: 0.00723| lr: 0.0| temp: 1.95577 | loss: 1.14131| constrast_loss: 4.48105| div_loss: 0.84186| %_mask_idx: 0.37093| ppl: 101.2112| %_neg_is_pos: 0.0077| lr: 0.0| temp: 1.95575 | loss: 1.14341| constrast_loss: 4.48938| div_loss: 0.84259| %_mask_idx: 0.37265| ppl: 100.74301| %_neg_is_pos: 0.00755| lr: 0.0| temp: 1.95575 | loss: 1.13437| constrast_loss: 4.45374| div_loss: 0.83747| %_mask_idx: 0.40398| ppl: 104.02081| %_neg_is_pos: 0.00734| lr: 0.0| temp: 1.95574 | loss: 1.14256| constrast_loss: 4.48683| div_loss: 0.83421| %_mask_idx: 0.38471| ppl: 106.10264| %_neg_is_pos: 0.008| lr: 0.0| temp: 1.95574 | loss: 1.13675| constrast_loss: 4.4635| div_loss: 0.83492| %_mask_idx: 0.38001| ppl: 105.64826| %_neg_is_pos: 0.00655| lr: 0.0| temp: 1.95573 | loss: 1.12637| constrast_loss: 4.42203| div_loss: 0.83467| %_mask_idx: 0.37766| ppl: 105.80831| %_neg_is_pos: 0.00802| lr: 0.0| temp: 
1.95573 | loss: 1.1314| constrast_loss: 4.44206| div_loss: 0.83547| %_mask_idx: 0.34962| ppl: 105.29729| %_neg_is_pos: 0.00806| lr: 0.0| temp: 1.95572 | loss: 1.12496| constrast_loss: 4.41662| div_loss: 0.83228| %_mask_idx: 0.40727| ppl: 107.33957| %_neg_is_pos: 0.00696| lr: 0.0| temp: 1.95572 | loss: 1.12519| constrast_loss: 4.41742| div_loss: 0.83336| %_mask_idx: 0.36153| ppl: 106.64944| %_neg_is_pos: 0.00751| lr: 0.0| temp: 1.9557 | loss: 1.13791| constrast_loss: 4.46811| div_loss: 0.83523| %_mask_idx: 0.41729| ppl: 105.45056| %_neg_is_pos: 0.00704| lr: 0.0| temp: 1.9557 | loss: 1.1372| constrast_loss: 4.46519| div_loss: 0.83607| %_mask_idx: 0.33788| ppl: 104.91312| %_neg_is_pos: 0.00825| lr: 0.0| temp: 1.95569 | loss: 1.13402| constrast_loss: 4.45188| div_loss: 0.84199| %_mask_idx: 0.38503| ppl: 101.12428| %_neg_is_pos: 0.00828| lr: 0.0| temp: 1.95569 | loss: 1.14509| constrast_loss: 4.49708| div_loss: 0.83286| %_mask_idx: 0.45207| ppl: 106.97142| %_neg_is_pos: 0.00844| lr: 0.0| temp: 1.95567 | loss: 1.13888| constrast_loss: 4.47167| div_loss: 0.83832| %_mask_idx: 0.38393| ppl: 103.47787| %_neg_is_pos: 0.00801| lr: 0.0| temp: 1.95567 | loss: 1.14298| constrast_loss: 4.48871| div_loss: 0.83202| %_mask_idx: 0.40586| ppl: 107.50654| %_neg_is_pos: 0.00705| lr: 0.0| temp: 1.95566 | loss: 1.13973| constrast_loss: 4.47554| div_loss: 0.83374| %_mask_idx: 0.39192| ppl: 106.40833| %_neg_is_pos: 0.00753| lr: 0.0| temp: 1.95566 | loss: 1.13251| constrast_loss: 4.44613| div_loss: 0.83934| %_mask_idx: 0.40241| ppl: 102.82462| %_neg_is_pos: 0.00714| lr: 0.0| temp: 1.95565 | loss: 1.13828| constrast_loss: 4.47002| div_loss: 0.83084| %_mask_idx: 0.35338| ppl: 108.26411| %_neg_is_pos: 0.00535| lr: 0.0| temp: 1.95565 | loss: 1.14756| constrast_loss: 4.50682| div_loss: 0.83418| %_mask_idx: 0.37093| ppl: 106.12536| %_neg_is_pos: 0.00683| lr: 0.0| temp: 1.95564 | loss: 1.13794| constrast_loss: 4.46775| div_loss: 0.84008| %_mask_idx: 0.37594| ppl: 102.35094| %_neg_is_pos: 0.00849| 
lr: 0.0| temp: 1.95564 | loss: 1.13848| constrast_loss: 4.47081| div_loss: 0.83096| %_mask_idx: 0.36404| ppl: 108.18586| %_neg_is_pos: 0.00586| lr: 0.0| temp: 1.95562 | loss: 1.13952| constrast_loss: 4.4745| div_loss: 0.83569| %_mask_idx: 0.39928| ppl: 105.15929| %_neg_is_pos: 0.0072| lr: 0.0| temp: 1.95562 | loss: 1.13481| constrast_loss: 4.45535| div_loss: 0.83893| %_mask_idx: 0.35182| ppl: 103.08274| %_neg_is_pos: 0.0075| lr: 0.0| temp: 1.95561 | loss: 1.13064| constrast_loss: 4.43931| div_loss: 0.83254| %_mask_idx: 0.36544| ppl: 107.1755| %_neg_is_pos: 0.00837| lr: 0.0| temp: 1.95561 | loss: 1.14794| constrast_loss: 4.50828| div_loss: 0.83469| %_mask_idx: 0.40617| ppl: 105.80007| %_neg_is_pos: 0.00679| lr: 0.0| temp: 1.9556 | loss: 1.13394| constrast_loss: 4.45208| div_loss: 0.83679| %_mask_idx: 0.36012| ppl: 104.45373| %_neg_is_pos: 0.00841| lr: 0.0| temp: 1.9556 | loss: 1.13483| constrast_loss: 4.45561| div_loss: 0.83702| %_mask_idx: 0.37171| ppl: 104.3087| %_neg_is_pos: 0.00653| lr: 0.0| temp: 1.95559 | loss: 1.13455| constrast_loss: 4.45427| div_loss: 0.83911| %_mask_idx: 0.40821| ppl: 102.97172| %_neg_is_pos: 0.00718| lr: 0.0| temp: 1.95559 | loss: 1.13344| constrast_loss: 4.44999| div_loss: 0.83775| %_mask_idx: 0.3844| ppl: 103.84075| %_neg_is_pos: 0.00721| lr: 0.0| temp: 1.95557 | loss: 1.14072| constrast_loss: 4.47943| div_loss: 0.83468| %_mask_idx: 0.4541| ppl: 105.80734| %_neg_is_pos: 0.00669| lr: 0.0| temp: 1.95557 | loss: 1.13962| constrast_loss: 4.47512| div_loss: 0.83353| %_mask_idx: 0.3396| ppl: 106.5388| %_neg_is_pos: 0.00626| lr: 0.0| temp: 1.95556 | loss: 1.13828| constrast_loss: 4.47039| div_loss: 0.82732| %_mask_idx: 0.38534| ppl: 110.51576| %_neg_is_pos: 0.0062| lr: 0.0| temp: 1.95556 | loss: 1.13788| constrast_loss: 4.46805| div_loss: 0.8346| %_mask_idx: 0.33286| ppl: 105.85358| %_neg_is_pos: 0.00645| lr: 0.0| temp: 1.95555 | loss: 1.13675| constrast_loss: 4.46352| div_loss: 0.83496| %_mask_idx: 0.38346| ppl: 105.62437| %_neg_is_pos: 
0.00713| lr: 0.0| temp: 1.95555 | loss: 1.1438| constrast_loss: 4.49162| div_loss: 0.83572| %_mask_idx: 0.42622| ppl: 105.14124| %_neg_is_pos: 0.00727| lr: 0.0| temp: 1.95554 | loss: 1.13164| constrast_loss: 4.4429| div_loss: 0.8365| %_mask_idx: 0.35385| ppl: 104.64263| %_neg_is_pos: 0.00724| lr: 0.0| temp: 1.95554 | loss: 1.13074| constrast_loss: 4.43974| div_loss: 0.83243| %_mask_idx: 0.36999| ppl: 107.2465| %_neg_is_pos: 0.00709| lr: 0.0| temp: 1.95552 | loss: 1.13788| constrast_loss: 4.46781| div_loss: 0.83723| %_mask_idx: 0.36122| ppl: 104.1724| %_neg_is_pos: 0.00723| lr: 0.0| temp: 1.95552 | loss: 1.14018| constrast_loss: 4.47721| div_loss: 0.83494| %_mask_idx: 0.4057| ppl: 105.63661| %_neg_is_pos: 0.00667| lr: 0.0| temp: 1.95551 | loss: 1.14264| constrast_loss: 4.4872| div_loss: 0.83371| %_mask_idx: 0.36294| ppl: 106.42747| %_neg_is_pos: 0.00596| lr: 0.0| temp: 1.95551 | loss: 1.14388| constrast_loss: 4.49213| div_loss: 0.83378| %_mask_idx: 0.47353| ppl: 106.37868| %_neg_is_pos: 0.00797| lr: 0.0| temp: 1.95549 | loss: 1.14237| constrast_loss: 4.48591| div_loss: 0.83593| %_mask_idx: 0.37406| ppl: 105.00471| %_neg_is_pos: 0.00801| lr: 0.0| temp: 1.95549 | loss: 1.13043| constrast_loss: 4.43779| div_loss: 0.83951| %_mask_idx: 0.4032| ppl: 102.71263| %_neg_is_pos: 0.00755| lr: 0.0| temp: 1.95548 | loss: 1.13336| constrast_loss: 4.45001| div_loss: 0.83416| %_mask_idx: 0.38189| ppl: 106.13615| %_neg_is_pos: 0.00778| lr: 0.0| temp: 1.95548 | loss: 1.14114| constrast_loss: 4.48162| div_loss: 0.82952| %_mask_idx: 0.37093| ppl: 109.10876| %_neg_is_pos: 0.00626| lr: 0.0| temp: 1.95547 | loss: 1.14193| constrast_loss: 4.48351| div_loss: 0.84193| %_mask_idx: 0.41024| ppl: 101.16447| %_neg_is_pos: 0.00771| lr: 0.0| temp: 1.95547 | loss: 1.14791| constrast_loss: 4.50778| div_loss: 0.83841| %_mask_idx: 0.3703| ppl: 103.41876| %_neg_is_pos: 0.00618| lr: 0.0| temp: 1.95546 | loss: 1.13641| constrast_loss: 4.46231| div_loss: 0.83341| %_mask_idx: 0.35824| ppl: 106.62012| 
%_neg_is_pos: 0.00624| lr: 0.0| temp: 1.95546 | loss: 1.12718| constrast_loss: 4.425| div_loss: 0.83705| %_mask_idx: 0.34117| ppl: 104.29057| %_neg_is_pos: 0.0083| lr: 0.0| temp: 1.95544 | loss: 1.13925| constrast_loss: 4.47292| div_loss: 0.84089| %_mask_idx: 0.42434| ppl: 101.82994| %_neg_is_pos: 0.00802| lr: 0.0| temp: 1.95544 | loss: 1.13951| constrast_loss: 4.47466| div_loss: 0.83377| %_mask_idx: 0.38471| ppl: 106.38412| %_neg_is_pos: 0.00804| lr: 0.0| temp: 1.95543 | loss: 1.13285| constrast_loss: 4.44803| div_loss: 0.83374| %_mask_idx: 0.39787| ppl: 106.40699| %_neg_is_pos: 0.00869| lr: 0.0| temp: 1.95543 | loss: 1.13925| constrast_loss: 4.47384| div_loss: 0.83156| %_mask_idx: 0.3916| ppl: 107.79903| %_neg_is_pos: 0.00693| lr: 0.0| temp: 1.95542 | loss: 1.14652| constrast_loss: 4.5027| div_loss: 0.8338| %_mask_idx: 0.4364| ppl: 106.36982| %_neg_is_pos: 0.00722| lr: 0.0| temp: 1.95542 | loss: 1.13602| constrast_loss: 4.46167| div_loss: 0.8242| %_mask_idx: 0.39521| ppl: 112.51012| %_neg_is_pos: 0.00627| lr: 0.0| temp: 1.95541 | loss: 1.14684| constrast_loss: 4.50358| div_loss: 0.83769| %_mask_idx: 0.43828| ppl: 103.8772| %_neg_is_pos: 0.00749| lr: 0.0| temp: 1.95541 | loss: 1.13852| constrast_loss: 4.47023| div_loss: 0.83845| %_mask_idx: 0.32816| ppl: 103.39197| %_neg_is_pos: 0.00637| lr: 0.0| temp: 1.95539 | loss: 1.14031| constrast_loss: 4.47749| div_loss: 0.83762| %_mask_idx: 0.38596| ppl: 103.92169| %_neg_is_pos: 0.00733| lr: 0.0| temp: 1.95539 | loss: 1.13214| constrast_loss: 4.44439| div_loss: 0.84176| %_mask_idx: 0.34602| ppl: 101.27584| %_neg_is_pos: 0.00749| lr: 0.0| temp: 1.95539 | loss: 1.13163| constrast_loss: 4.44288| div_loss: 0.83637| %_mask_idx: 0.39771| ppl: 104.72578| %_neg_is_pos: 0.0075| lr: 0.0| temp: 1.95539 | loss: 1.13622| constrast_loss: 4.46099| div_loss: 0.8388| %_mask_idx: 0.41353| ppl: 103.16603| %_neg_is_pos: 0.00791| lr: 0.0| temp: 1.95538 | loss: 1.1382| constrast_loss: 4.4691| div_loss: 0.8371| %_mask_idx: 0.36623| ppl: 
104.25653| %_neg_is_pos: 0.00806| lr: 0.0| temp: 1.95538 | loss: 1.12548| constrast_loss: 4.41865| div_loss: 0.8327| %_mask_idx: 0.40226| ppl: 107.07323| %_neg_is_pos: 0.00759| lr: 0.0| temp: 1.95537 | loss: 1.1342| constrast_loss: 4.45344| div_loss: 0.83368| %_mask_idx: 0.40977| ppl: 106.4445| %_neg_is_pos: 0.00713| lr: 0.0| temp: 1.95537 | loss: 1.13544| constrast_loss: 4.45782| div_loss: 0.83928| %_mask_idx: 0.38346| ppl: 102.85886| %_neg_is_pos: 0.00871| lr: 0.0| temp: 1.95535 | loss: 1.13456| constrast_loss: 4.45403| div_loss: 0.84219| %_mask_idx: 0.39301| ppl: 100.99671| %_neg_is_pos: 0.00792| lr: 0.0| temp: 1.95535 | loss: 1.12799| constrast_loss: 4.42769| div_loss: 0.84259| %_mask_idx: 0.40586| ppl: 100.74007| %_neg_is_pos: 0.00802| lr: 0.0| temp: 1.95534 | loss: 1.13155| constrast_loss: 4.44208| div_loss: 0.84109| %_mask_idx: 0.36795| ppl: 101.7037| %_neg_is_pos: 0.00769| lr: 0.0| temp: 1.95534
[2021-09-02 06:57:18,651] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-02 06:57:18,651] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 1.13609| constrast_loss: 4.4604| div_loss: 0.83966| %_mask_idx: 0.41792| ppl: 102.61668| %_neg_is_pos: 0.00664| lr: 0.0| temp: 1.95532| loss: 1.12474| constrast_loss: 4.4151| div_loss: 0.83846| %_mask_idx: 0.38831| ppl: 103.38822| %_neg_is_pos: 0.00917| lr: 0.0| temp: 1.95532 | loss: 1.1298| constrast_loss: 4.43509| div_loss: 0.84122| %_mask_idx: 0.39474| ppl: 101.62045| %_neg_is_pos: 0.00833| lr: 0.0| temp: 1.95531 | loss: 1.13929| constrast_loss: 4.47357| div_loss: 0.83587| %_mask_idx: 0.45081| ppl: 105.04315| %_neg_is_pos: 0.00778| lr: 0.0| temp: 1.95531 | loss: 1.13509| constrast_loss: 4.45608| div_loss: 0.84294| %_mask_idx: 0.39536| ppl: 100.51922| %_neg_is_pos: 0.00747| lr: 0.0| temp: 1.9553 | loss: 1.13241| constrast_loss: 4.44455| div_loss: 0.85087| %_mask_idx: 0.35871| ppl: 95.44292| %_neg_is_pos: 0.00957| lr: 0.0| temp: 1.9553 | loss: 1.13805| constrast_loss: 4.4677| div_loss: 0.84506| %_mask_idx: 0.46194| ppl: 99.16118| %_neg_is_pos: 0.00886| lr: 0.0| temp: 1.95529 | loss: 1.14229| constrast_loss: 4.4836| div_loss: 0.85578| %_mask_idx: 0.44063| ppl: 92.29939| %_neg_is_pos: 0.01127| lr: 0.0| temp: 1.95529 | loss: 1.1337| constrast_loss: 4.44926| div_loss: 0.85517| %_mask_idx: 0.39442| ppl: 92.69128| %_neg_is_pos: 0.0117| lr: 0.0| temp: 1.95527| loss: 1.13688| constrast_loss: 4.46194| div_loss: 0.85602| %_mask_idx: 0.39912| ppl: 92.1448| %_neg_is_pos: 0.01096| lr: 0.0| temp: 1.95527 | loss: 1.13531| constrast_loss: 4.45588| div_loss: 0.85352| %_mask_idx: 0.40508| ppl: 93.74594| %_neg_is_pos: 0.01207| lr: 0.0| temp: 1.95526 | loss: 1.13612| constrast_loss: 4.45861| div_loss: 0.85859| %_mask_idx: 0.39897| ppl: 90.50243| %_neg_is_pos: 0.012| lr: 0.0| temp: 1.95526 | loss: 1.13838| constrast_loss: 4.46775| div_loss: 0.85777| %_mask_idx: 0.40539| ppl: 91.02908| %_neg_is_pos: 0.01432| lr: 0.0| temp: 1.95525 | loss: 1.13022| constrast_loss: 4.43363| div_loss: 0.87263| %_mask_idx: 0.36873| ppl: 81.51601| %_neg_is_pos: 
0.01391| lr: 0.0| temp: 1.95525 | loss: 1.13119| constrast_loss: 4.4383| div_loss: 0.86454| %_mask_idx: 0.42121| ppl: 86.69177| %_neg_is_pos: 0.01385| lr: 0.0| temp: 1.95524 | loss: 1.14309| constrast_loss: 4.48561| div_loss: 0.86751| %_mask_idx: 0.43734| ppl: 84.79474| %_neg_is_pos: 0.01405| lr: 0.0| temp: 1.95524 | loss: 1.12877| constrast_loss: 4.42884| div_loss: 0.86246| %_mask_idx: 0.35902| ppl: 88.02701| %_neg_is_pos: 0.01395| lr: 0.0| temp: 1.95522 | loss: 1.1366| constrast_loss: 4.45908| div_loss: 0.87304| %_mask_idx: 0.36967| ppl: 81.25383| %_neg_is_pos: 0.01408| lr: 0.0| temp: 1.95522 | loss: 1.13763| constrast_loss: 4.46305| div_loss: 0.87487| %_mask_idx: 0.4317| ppl: 80.08476| %_neg_is_pos: 0.01656| lr: 0.0| temp: 1.95521 | loss: 1.1316| constrast_loss: 4.4399| div_loss: 0.86517| %_mask_idx: 0.37171| ppl: 86.29082| %_neg_is_pos: 0.01367| lr: 0.0| temp: 1.95521 | loss: 1.13505| constrast_loss: 4.45359| div_loss: 0.86609| %_mask_idx: 0.3891| ppl: 85.70309| %_neg_is_pos: 0.01243| lr: 0.0| temp: 1.9552 | loss: 1.12744| constrast_loss: 4.42333| div_loss: 0.86411| %_mask_idx: 0.40633| ppl: 86.96913| %_neg_is_pos: 0.01334| lr: 0.0| temp: 1.9552 | loss: 1.1365| constrast_loss: 4.45901| div_loss: 0.86999| %_mask_idx: 0.42152| ppl: 83.20502| %_neg_is_pos: 0.01401| lr: 0.0| temp: 1.95519 | loss: 1.12415| constrast_loss: 4.40961| div_loss: 0.86997| %_mask_idx: 0.30905| ppl: 83.22011| %_neg_is_pos: 0.0166| lr: 0.0| temp: 1.95519 | loss: 1.12582| constrast_loss: 4.41698| div_loss: 0.8632| %_mask_idx: 0.40116| ppl: 87.5523| %_neg_is_pos: 0.01428| lr: 0.0| temp: 1.95517 | loss: 1.12209| constrast_loss: 4.40081| div_loss: 0.87539| %_mask_idx: 0.33239| ppl: 79.7473| %_neg_is_pos: 0.01556| lr: 0.0| temp: 1.95517 | loss: 1.13233| constrast_loss: 4.44199| div_loss: 0.87322| %_mask_idx: 0.37061| ppl: 81.13734| %_neg_is_pos: 0.01558| lr: 0.0| temp: 1.95516 | loss: 1.12561| constrast_loss: 4.41575| div_loss: 0.86689| %_mask_idx: 0.39051| ppl: 85.18929| %_neg_is_pos: 0.01493| 
lr: 0.0| temp: 1.95516 | loss: 1.13318| constrast_loss: 4.44626| div_loss: 0.8645| %_mask_idx: 0.42137| ppl: 86.71811| %_neg_is_pos: 0.01588| lr: 0.0| temp: 1.95514 | loss: 1.13505| constrast_loss: 4.45306| div_loss: 0.87137| %_mask_idx: 0.41385| ppl: 82.32443| %_neg_is_pos: 0.01807| lr: 0.0| temp: 1.95514 | loss: 1.13337| constrast_loss: 4.44659| div_loss: 0.86905| %_mask_idx: 0.40476| ppl: 83.80614| %_neg_is_pos: 0.01606| lr: 0.0| temp: 1.95513 | loss: 1.13043| constrast_loss: 4.43541| div_loss: 0.8629| %_mask_idx: 0.41071| ppl: 87.74142| %_neg_is_pos: 0.01555| lr: 0.0| temp: 1.95513 | loss: 1.13607| constrast_loss: 4.45765| div_loss: 0.86641| %_mask_idx: 0.39709| ppl: 85.49689| %_neg_is_pos: 0.01374| lr: 0.0| temp: 1.95512 | loss: 1.13331| constrast_loss: 4.44624| div_loss: 0.86989| %_mask_idx: 0.41338| ppl: 83.26827| %_neg_is_pos: 0.01691| lr: 0.0| temp: 1.95512 | loss: 1.13576| constrast_loss: 4.45647| div_loss: 0.8658| %_mask_idx: 0.39724| ppl: 85.88963| %_neg_is_pos: 0.01421| lr: 0.0| temp: 1.95511 | loss: 1.12315| constrast_loss: 4.40564| div_loss: 0.86943| %_mask_idx: 0.38957| ppl: 83.56673| %_neg_is_pos: 0.01579| lr: 0.0| temp: 1.95511 | loss: 1.12301| constrast_loss: 4.40561| div_loss: 0.86438| %_mask_idx: 0.38503| ppl: 86.79662| %_neg_is_pos: 0.01523| lr: 0.0| temp: 1.95509 | loss: 1.14275| constrast_loss: 4.48502| div_loss: 0.85999| %_mask_idx: 0.40508| ppl: 89.60758| %_neg_is_pos: 0.0137| lr: 0.0| temp: 1.95509 | loss: 1.13213| constrast_loss: 4.44132| div_loss: 0.87189| %_mask_idx: 0.44753| ppl: 81.98981| %_neg_is_pos: 0.01756| lr: 0.0| temp: 1.95508 | loss: 1.1347| constrast_loss: 4.45148| div_loss: 0.87306| %_mask_idx: 0.39615| ppl: 81.24314| %_neg_is_pos: 0.01641| lr: 0.0| temp: 1.95508 | loss: 1.13078| constrast_loss: 4.43667| div_loss: 0.86451| %_mask_idx: 0.34947| ppl: 86.71357| %_neg_is_pos: 0.01338| lr: 0.0| temp: 1.95507 | loss: 1.12624| constrast_loss: 4.41922| div_loss: 0.85727| %_mask_idx: 0.4068| ppl: 91.34698| %_neg_is_pos: 0.0122| lr: 
0.0| temp: 1.95507 | loss: 1.12272| constrast_loss: 4.40384| div_loss: 0.87044| %_mask_idx: 0.38158| ppl: 82.91776| %_neg_is_pos: 0.01484| lr: 0.0| temp: 1.95506 | loss: 1.1425| constrast_loss: 4.48314| div_loss: 0.86876| %_mask_idx: 0.41291| ppl: 83.99331| %_neg_is_pos: 0.01402| lr: 0.0| temp: 1.95506 | loss: 1.11471| constrast_loss: 4.37331| div_loss: 0.85535| %_mask_idx: 0.3786| ppl: 92.57771| %_neg_is_pos: 0.01261| lr: 0.0| temp: 1.95504 | loss: 1.12597| constrast_loss: 4.41721| div_loss: 0.86663| %_mask_idx: 0.42747| ppl: 85.35646| %_neg_is_pos: 0.01637| lr: 0.0| temp: 1.95504 | loss: 1.13549| constrast_loss: 4.45419| div_loss: 0.8775| %_mask_idx: 0.38628| ppl: 78.39717| %_neg_is_pos: 0.01841| lr: 0.0| temp: 1.95503 | loss: 1.14012| constrast_loss: 4.47412| div_loss: 0.86362| %_mask_idx: 0.40132| ppl: 87.28526| %_neg_is_pos: 0.0164| lr: 0.0| temp: 1.95503 | loss: 1.12293| constrast_loss: 4.40482| div_loss: 0.86901| %_mask_idx: 0.40226| ppl: 83.83495| %_neg_is_pos: 0.01573| lr: 0.0| temp: 1.95502 | loss: 1.13336| constrast_loss: 4.44652| div_loss: 0.8694| %_mask_idx: 0.37046| ppl: 83.58126| %_neg_is_pos: 0.01565| lr: 0.0| temp: 1.95502 | loss: 1.13571| constrast_loss: 4.45569| div_loss: 0.8716| %_mask_idx: 0.39646| ppl: 82.17694| %_neg_is_pos: 0.01404| lr: 0.0| temp: 1.95501 | loss: 1.13699| constrast_loss: 4.46212| div_loss: 0.85829| %_mask_idx: 0.4364| ppl: 90.69176| %_neg_is_pos: 0.01458| lr: 0.0| temp: 1.95501 | loss: 1.12801| constrast_loss: 4.42437| div_loss: 0.87685| %_mask_idx: 0.38377| ppl: 78.81902| %_neg_is_pos: 0.01631| lr: 0.0| temp: 1.95499 | loss: 1.13944| constrast_loss: 4.47045| div_loss: 0.87307| %_mask_idx: 0.41228| ppl: 81.23507| %_neg_is_pos: 0.01537| lr: 0.0| temp: 1.95499 | loss: 1.13525| constrast_loss: 4.45465| div_loss: 0.86352| %_mask_idx: 0.37516| ppl: 87.34687| %_neg_is_pos: 0.01199| lr: 0.0| temp: 1.95498 | loss: 1.12931| constrast_loss: 4.43079| div_loss: 0.86451| %_mask_idx: 0.41776| ppl: 86.71089| %_neg_is_pos: 0.01409| lr: 0.0| 
temp: 1.95498 | loss: 1.1379| constrast_loss: 4.46449| div_loss: 0.87112| %_mask_idx: 0.37719| ppl: 82.48164| %_neg_is_pos: 0.01515| lr: 0.0| temp: 1.95496 | loss: 1.13239| constrast_loss: 4.4435| div_loss: 0.86064| %_mask_idx: 0.41338| ppl: 89.19012| %_neg_is_pos: 0.01343| lr: 0.0| temp: 1.95496 | loss: 1.13119| constrast_loss: 4.43737| div_loss: 0.87383| %_mask_idx: 0.38252| ppl: 80.7468| %_neg_is_pos: 0.01568| lr: 0.0| temp: 1.95495 | loss: 1.13323| constrast_loss: 4.44599| div_loss: 0.86942| %_mask_idx: 0.3562| ppl: 83.57248| %_neg_is_pos: 0.01345| lr: 0.0| temp: 1.95495 | loss: 1.13475| constrast_loss: 4.45179| div_loss: 0.87224| %_mask_idx: 0.37343| ppl: 81.76911| %_neg_is_pos: 0.01423| lr: 0.0| temp: 1.95495 | loss: 1.13569| constrast_loss: 4.45616| div_loss: 0.86587| %_mask_idx: 0.36936| ppl: 85.84309| %_neg_is_pos: 0.01694| lr: 0.0| temp: 1.95495 | loss: 1.14005| constrast_loss: 4.47272| div_loss: 0.8747| %_mask_idx: 0.47713| ppl: 80.19012| %_neg_is_pos: 0.01856| lr: 0.0| temp: 1.95494 | loss: 1.12878| constrast_loss: 4.42913| div_loss: 0.85986| %_mask_idx: 0.36122| ppl: 89.69107| %_neg_is_pos: 0.01228| lr: 0.0| temp: 1.95494 | loss: 1.12141| constrast_loss: 4.39899| div_loss: 0.86653| %_mask_idx: 0.38033| ppl: 85.42114| %_neg_is_pos: 0.01606| lr: 0.0| temp: 1.95492 | loss: 1.13746| constrast_loss: 4.46202| div_loss: 0.87823| %_mask_idx: 0.38017| ppl: 77.93374| %_neg_is_pos: 0.01762| lr: 0.0| temp: 1.95492 | loss: 1.13715| constrast_loss: 4.46185| div_loss: 0.86735| %_mask_idx: 0.34571| ppl: 84.896| %_neg_is_pos: 0.01501| lr: 0.0| temp: 1.95491 | loss: 1.13767| constrast_loss: 4.46379| div_loss: 0.86877| %_mask_idx: 0.37845| ppl: 83.98921| %_neg_is_pos: 0.01445| lr: 0.0| temp: 1.95491 | loss: 1.13636| constrast_loss: 4.45842| div_loss: 0.87031| %_mask_idx: 0.38017| ppl: 83.00117| %_neg_is_pos: 0.01573| lr: 0.0| temp: 1.9549 | loss: 1.13548| constrast_loss: 4.45538| div_loss: 0.86537| %_mask_idx: 0.38534| ppl: 86.16248| %_neg_is_pos: 0.01407| lr: 0.0| temp: 
1.9549 | loss: 1.12202| constrast_loss: 4.40269| div_loss: 0.85378| %_mask_idx: 0.40868| ppl: 93.58067| %_neg_is_pos: 0.01222| lr: 0.0| temp: 1.95489 | loss: 1.13303| constrast_loss: 4.44511| div_loss: 0.87022| %_mask_idx: 0.37578| ppl: 83.05613| %_neg_is_pos: 0.01684| lr: 0.0| temp: 1.95489 | loss: 1.13961| constrast_loss: 4.47282| div_loss: 0.85612| %_mask_idx: 0.41228| ppl: 92.08238| %_neg_is_pos: 0.01101| lr: 0.0| temp: 1.95487 | loss: 1.13858| constrast_loss: 4.46724| div_loss: 0.87094| %_mask_idx: 0.39991| ppl: 82.59787| %_neg_is_pos: 0.01597| lr: 0.0| temp: 1.95487 | loss: 1.13666| constrast_loss: 4.45871| div_loss: 0.87919| %_mask_idx: 0.42607| ppl: 77.3192| %_neg_is_pos: 0.01934| lr: 0.0| temp: 1.95486 | loss: 1.12859| constrast_loss: 4.42776| div_loss: 0.86618| %_mask_idx: 0.38628| ppl: 85.6468| %_neg_is_pos: 0.01115| lr: 0.0| temp: 1.95486 | loss: 1.13461| constrast_loss: 4.45228| div_loss: 0.86162| %_mask_idx: 0.34915| ppl: 88.56607| %_neg_is_pos: 0.01532| lr: 0.0| temp: 1.95485 | loss: 1.11733| constrast_loss: 4.38331| div_loss: 0.86017| %_mask_idx: 0.39113| ppl: 89.49162| %_neg_is_pos: 0.01354| lr: 0.0| temp: 1.95485 | loss: 1.12708| constrast_loss: 4.4223| div_loss: 0.86015| %_mask_idx: 0.36858| ppl: 89.50106| %_neg_is_pos: 0.01475| lr: 0.0| temp: 1.95484 | loss: 1.11971| constrast_loss: 4.39196| div_loss: 0.8687| %_mask_idx: 0.38205| ppl: 84.0313| %_neg_is_pos: 0.01541| lr: 0.0| temp: 1.95484 | loss: 1.13227| constrast_loss: 4.44198| div_loss: 0.87112| %_mask_idx: 0.41259| ppl: 82.48093| %_neg_is_pos: 0.01612| lr: 0.0| temp: 1.95482 | loss: 1.13819| constrast_loss: 4.4667| div_loss: 0.86068| %_mask_idx: 0.37437| ppl: 89.16692| %_neg_is_pos: 0.01246| lr: 0.0| temp: 1.95482 | loss: 1.1237| constrast_loss: 4.40776| div_loss: 0.87031| %_mask_idx: 0.3927| ppl: 83.00027| %_neg_is_pos: 0.01631| lr: 0.0| temp: 1.95481 | loss: 1.13608| constrast_loss: 4.45777| div_loss: 0.86547| %_mask_idx: 0.38283| ppl: 86.09686| %_neg_is_pos: 0.01538| lr: 0.0| temp: 
1.95481 | loss: 1.13586| constrast_loss: 4.45669| div_loss: 0.86767| %_mask_idx: 0.34555| ppl: 84.69249| %_neg_is_pos: 0.01103| lr: 0.0| temp: 1.95479 | loss: 1.12388| constrast_loss: 4.40851| div_loss: 0.87021| %_mask_idx: 0.38534| ppl: 83.06255| %_neg_is_pos: 0.01576| lr: 0.0| temp: 1.95479 | loss: 1.134| constrast_loss: 4.4489| div_loss: 0.87098| %_mask_idx: 0.44784| ppl: 82.57559| %_neg_is_pos: 0.01579| lr: 0.0| temp: 1.95478 | loss: 1.12279| constrast_loss: 4.40502| div_loss: 0.86156| %_mask_idx: 0.38534| ppl: 88.59969| %_neg_is_pos: 0.01305| lr: 0.0| temp: 1.95478 | loss: 1.13548| constrast_loss: 4.45533| div_loss: 0.86591| %_mask_idx: 0.39317| ppl: 85.81987| %_neg_is_pos: 0.01516| lr: 0.0| temp: 1.95477 | loss: 1.12412| constrast_loss: 4.40997| div_loss: 0.86508| %_mask_idx: 0.37406| ppl: 86.34743| %_neg_is_pos: 0.01312| lr: 0.0| temp: 1.95477 | loss: 1.13256| constrast_loss: 4.44316| div_loss: 0.87079| %_mask_idx: 0.39035| ppl: 82.69734| %_neg_is_pos: 0.01666| lr: 0.0| temp: 1.95476 | loss: 1.12534| constrast_loss: 4.41491| div_loss: 0.86456| %_mask_idx: 0.35683| ppl: 86.68225| %_neg_is_pos: 0.01421| lr: 0.0| temp: 1.95476 | loss: 1.13679| constrast_loss: 4.46083| div_loss: 0.86325| %_mask_idx: 0.42513| ppl: 87.52283| %_neg_is_pos: 0.01421| lr: 0.0| temp: 1.95474 | loss: 1.13579| constrast_loss: 4.45591| div_loss: 0.87257| %_mask_idx: 0.41181| ppl: 81.55242| %_neg_is_pos: 0.01686| lr: 0.0| temp: 1.95474 | loss: 1.14212| constrast_loss: 4.48118| div_loss: 0.87311| %_mask_idx: 0.37672| ppl: 81.20915| %_neg_is_pos: 0.01558| lr: 0.0| temp: 1.95473 | loss: 1.13021| constrast_loss: 4.43516| div_loss: 0.85673| %_mask_idx: 0.3584| ppl: 91.6952| %_neg_is_pos: 0.01342| lr: 0.0| temp: 1.95473 | loss: 1.12801| constrast_loss: 4.42584| div_loss: 0.86202| %_mask_idx: 0.36231| ppl: 88.30572| %_neg_is_pos: 0.01312| lr: 0.0| temp: 1.95472 | loss: 1.13205| constrast_loss: 4.44174| div_loss: 0.86472| %_mask_idx: 0.37813| ppl: 86.58085| %_neg_is_pos: 0.01425| lr: 0.0| temp: 
1.95472 | loss: 1.13353| constrast_loss: 4.44717| div_loss: 0.86941| %_mask_idx: 0.41808| ppl: 83.57876| %_neg_is_pos: 0.01631| lr: 0.0| temp: 1.95471 | loss: 1.12898| constrast_loss: 4.42923| div_loss: 0.86709| %_mask_idx: 0.35338| ppl: 85.06543| %_neg_is_pos: 0.0112| lr: 0.0| temp: 1.95471 | loss: 1.12223| constrast_loss: 4.40216| div_loss: 0.86741| %_mask_idx: 0.44048| ppl: 84.85986| %_neg_is_pos: 0.01493| lr: 0.0| temp: 1.95469 | loss: 1.13209| constrast_loss: 4.44081| div_loss: 0.87531| %_mask_idx: 0.35761| ppl: 79.80441| %_neg_is_pos: 0.01687| lr: 0.0| temp: 1.95469 | loss: 1.13091| constrast_loss: 4.43761| div_loss: 0.86014| %_mask_idx: 0.34884| ppl: 89.50938| %_neg_is_pos: 0.01296| lr: 0.0| temp: 1.95468 | loss: 1.11771| constrast_loss: 4.38324| div_loss: 0.8758| %_mask_idx: 0.40351| ppl: 79.48772| %_neg_is_pos: 0.01683| lr: 0.0| temp: 1.95468 | loss: 1.13288| constrast_loss: 4.44426| div_loss: 0.87277| %_mask_idx: 0.41291| ppl: 81.42853| %_neg_is_pos: 0.01659| lr: 0.0| temp: 1.95467 | loss: 1.12122| constrast_loss: 4.39816| div_loss: 0.86733| %_mask_idx: 0.34414| ppl: 84.90581| %_neg_is_pos: 0.01409| lr: 0.0| temp: 1.95467 | loss: 1.13773| constrast_loss: 4.46446| div_loss: 0.86467| %_mask_idx: 0.36263| ppl: 86.61095| %_neg_is_pos: 0.01694| lr: 0.0| temp: 1.95466 | loss: 1.13547| constrast_loss: 4.45472| div_loss: 0.87176| %_mask_idx: 0.39113| ppl: 82.07075| %_neg_is_pos: 0.01648| lr: 0.0| temp: 1.95466 | loss: 1.13287| constrast_loss: 4.44457| div_loss: 0.86929| %_mask_idx: 0.40006| ppl: 83.65459| %_neg_is_pos: 0.01349| lr: 0.0| temp: 1.95464 | loss: 1.12667| constrast_loss: 4.41939| div_loss: 0.87302| %_mask_idx: 0.39881| ppl: 81.26884| %_neg_is_pos: 0.01665| lr: 0.0| temp: 1.95464 | loss: 1.12909| constrast_loss: 4.42986| div_loss: 0.86516| %_mask_idx: 0.33459| ppl: 86.29955| %_neg_is_pos: 0.0144| lr: 0.0| temp: 1.95463 | loss: 1.13822| constrast_loss: 4.46637| div_loss: 0.86528| %_mask_idx: 0.4187| ppl: 86.22188| %_neg_is_pos: 0.01412| lr: 0.0| temp: 
1.95463
[2021-09-02 07:06:31,886] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-02 07:06:31,886] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 1.13604| constrast_loss: 4.4561| div_loss: 0.88074| %_mask_idx: 0.43625| ppl: 76.32558| %_neg_is_pos: 0.02015| lr: 0.0| temp: 1.95461 | loss: 1.12923| constrast_loss: 4.43038| div_loss: 0.86543| %_mask_idx: 0.3869| ppl: 86.1251| %_neg_is_pos: 0.01302| lr: 0.0| temp: 1.95461 | loss: 1.13255| constrast_loss: 4.4435| div_loss: 0.86697| %_mask_idx: 0.35464| ppl: 85.13933| %_neg_is_pos: 0.01383| lr: 0.0| temp: 1.9546 | loss: 1.12501| constrast_loss: 4.41255| div_loss: 0.87492| %_mask_idx: 0.37766| ppl: 80.04975| %_neg_is_pos: 0.01441| lr: 0.0| temp: 1.9546 | loss: 1.14104| constrast_loss: 4.47762| div_loss: 0.86557| %_mask_idx: 0.43108| ppl: 86.0369| %_neg_is_pos: 0.01685| lr: 0.0| temp: 1.95459 | loss: 1.14363| constrast_loss: 4.48725| div_loss: 0.87267| %_mask_idx: 0.4209| ppl: 81.49146| %_neg_is_pos: 0.01858| lr: 0.0| temp: 1.95459 | loss: 1.13951| constrast_loss: 4.47049| div_loss: 0.87541| %_mask_idx: 0.43358| ppl: 79.73707| %_neg_is_pos: 0.0186| lr: 0.0| temp: 1.95458 | loss: 1.13464| constrast_loss: 4.45122| div_loss: 0.87325| %_mask_idx: 0.39724| ppl: 81.11924| %_neg_is_pos: 0.01825| lr: 0.0| temp: 1.95458 | loss: 1.13182| constrast_loss: 4.43942| div_loss: 0.87852| %_mask_idx: 0.40852| ppl: 77.74774| %_neg_is_pos: 0.02051| lr: 0.0| temp: 1.95456 | loss: 1.12557| constrast_loss: 4.41395| div_loss: 0.8833| %_mask_idx: 0.39254| ppl: 74.69015| %_neg_is_pos: 0.02385| lr: 0.0| temp: 1.95456 | loss: 1.11991| constrast_loss: 4.39055| div_loss: 0.89098| %_mask_idx: 0.3432| ppl: 69.77223| %_neg_is_pos: 0.02308| lr: 0.0| temp: 1.95455 | loss: 1.12692| constrast_loss: 4.41798| div_loss: 0.89704| %_mask_idx: 0.414| ppl: 65.89613| 
%_neg_is_pos: 0.03069| lr: 0.0| temp: 1.95455 | loss: 1.11891| constrast_loss: 4.38649| div_loss: 0.8914| %_mask_idx: 0.37187| ppl: 69.50207| %_neg_is_pos: 0.02591| lr: 0.0| temp: 1.95454 | loss: 1.11977| constrast_loss: 4.38977| div_loss: 0.89297| %_mask_idx: 0.40555| ppl: 68.49651| %_neg_is_pos: 0.03119| lr: 0.0| temp: 1.95454 | loss: 1.13364| constrast_loss: 4.44487| div_loss: 0.89689| %_mask_idx: 0.40915| ppl: 65.98727| %_neg_is_pos: 0.03179| lr: 0.0| temp: 1.95453 | loss: 1.12617| constrast_loss: 4.41499| div_loss: 0.89702| %_mask_idx: 0.38769| ppl: 65.90855| %_neg_is_pos: 0.03104| lr: 0.0| temp: 1.95453 | loss: 1.11876| constrast_loss: 4.38527| div_loss: 0.89754| %_mask_idx: 0.35589| ppl: 65.57663| %_neg_is_pos: 0.02824| lr: 0.0| temp: 1.95452 | loss: 1.13928| constrast_loss: 4.46712| div_loss: 0.90015| %_mask_idx: 0.37171| ppl: 63.90643| %_neg_is_pos: 0.02669| lr: 0.0| temp: 1.95452 | loss: 1.12554| constrast_loss: 4.41079| div_loss: 0.91376| %_mask_idx: 0.37719| ppl: 55.19604| %_neg_is_pos: 0.03819| lr: 0.0| temp: 1.95451 | loss: 1.11406| constrast_loss: 4.36565| div_loss: 0.90586| %_mask_idx: 0.36717| ppl: 60.24724| %_neg_is_pos: 0.04009| lr: 0.0| temp: 1.95451 | loss: 1.12781| constrast_loss: 4.42091| div_loss: 0.90324| %_mask_idx: 0.40492| ppl: 61.92548| %_neg_is_pos: 0.03798| lr: 0.0| temp: 1.9545 | loss: 1.1258| constrast_loss: 4.41378| div_loss: 0.8943| %_mask_idx: 0.41338| ppl: 67.64913| %_neg_is_pos: 0.03409| lr: 0.0| temp: 1.9545 | loss: 1.12045| constrast_loss: 4.38962| div_loss: 0.92173| %_mask_idx: 0.35699| ppl: 50.0919| %_neg_is_pos: 0.05376| lr: 0.0| temp: 1.95449 | loss: 1.13085| constrast_loss: 4.43242| div_loss: 0.90969| %_mask_idx: 0.39458| ppl: 57.79828| %_neg_is_pos: 0.04142| lr: 0.0| temp: 1.95449 | loss: 1.11236| constrast_loss: 4.35713| div_loss: 0.923| %_mask_idx: 0.33741| ppl: 49.278| %_neg_is_pos: 0.04308| lr: 0.0| temp: 1.95447| loss: 1.11729| constrast_loss: 4.37732| div_loss: 0.91838| %_mask_idx: 0.4234| ppl: 52.23389| 
%_neg_is_pos: 0.05165| lr: 0.0| temp: 1.95447 | loss: 1.09714| constrast_loss: 4.29642| div_loss: 0.9215| %_mask_idx: 0.39176| ppl: 50.24117| %_neg_is_pos: 0.06106| lr: 0.0| temp: 1.95446 | loss: 1.10612| constrast_loss: 4.33209| div_loss: 0.92415| %_mask_idx: 0.37516| ppl: 48.54705| %_neg_is_pos: 0.06296| lr: 0.0| temp: 1.95446 | loss: 1.09112| constrast_loss: 4.2706| div_loss: 0.93858| %_mask_idx: 0.37845| ppl: 39.30757| %_neg_is_pos: 0.08571| lr: 0.0| temp: 1.95444 | loss: 1.11106| constrast_loss: 4.35095| div_loss: 0.93281| %_mask_idx: 0.38064| ppl: 43.00145| %_neg_is_pos: 0.08025| lr: 0.0| temp: 1.95444 | loss: 1.09307| constrast_loss: 4.27858| div_loss: 0.9371| %_mask_idx: 0.40429| ppl: 40.25607| %_neg_is_pos: 0.07881| lr: 0.0| temp: 1.95443 | loss: 1.09202| constrast_loss: 4.2754| div_loss: 0.92684| %_mask_idx: 0.37422| ppl: 46.81995| %_neg_is_pos: 0.07049| lr: 0.0| temp: 1.95443 | loss: 1.09554| constrast_loss: 4.28759| div_loss: 0.94569| %_mask_idx: 0.37735| ppl: 34.75642| %_neg_is_pos: 0.09641| lr: 0.0| temp: 1.95442 | loss: 1.11168| constrast_loss: 4.35389| div_loss: 0.92844| %_mask_idx: 0.42262| ppl: 45.80064| %_neg_is_pos: 0.07512| lr: 0.0| temp: 1.95442 | loss: 1.1013| constrast_loss: 4.31149| div_loss: 0.93711| %_mask_idx: 0.42904| ppl: 40.24795| %_neg_is_pos: 0.09431| lr: 0.0| temp: 1.95441 | loss: 1.09169| constrast_loss: 4.27327| div_loss: 0.93472| %_mask_idx: 0.32127| ppl: 41.78204| %_neg_is_pos: 0.06933| lr: 0.0| temp: 1.95441 | loss: 1.09523| constrast_loss: 4.28744| div_loss: 0.93481| %_mask_idx: 0.38095| ppl: 41.7193| %_neg_is_pos: 0.07895| lr: 0.0| temp: 1.95439 | loss: 1.09643| constrast_loss: 4.29253| div_loss: 0.93175| %_mask_idx: 0.37907| ppl: 43.68254| %_neg_is_pos: 0.06798| lr: 0.0| temp: 1.95439 | loss: 1.10602| constrast_loss: 4.3312| div_loss: 0.92889| %_mask_idx: 0.34774| ppl: 45.51298| %_neg_is_pos: 0.05837| lr: 0.0| temp: 1.95438 | loss: 1.10429| constrast_loss: 4.32341| div_loss: 0.93755| %_mask_idx: 0.4256| ppl: 39.96695| 
%_neg_is_pos: 0.09351| lr: 0.0| temp: 1.95438 | loss: 1.09388| constrast_loss: 4.28131| div_loss: 0.94199| %_mask_idx: 0.39975| ppl: 37.12863| %_neg_is_pos: 0.07725| lr: 0.0| temp: 1.95437 | loss: 1.09333| constrast_loss: 4.27916| div_loss: 0.94172| %_mask_idx: 0.37719| ppl: 37.29701| %_neg_is_pos: 0.10194| lr: 0.0| temp: 1.95437 | loss: 1.10006| constrast_loss: 4.30645| div_loss: 0.93802| %_mask_idx: 0.38252| ppl: 39.6642| %_neg_is_pos: 0.07468| lr: 0.0| temp: 1.95436 | loss: 1.11371| constrast_loss: 4.36146| div_loss: 0.93377| %_mask_idx: 0.42481| ppl: 42.38752| %_neg_is_pos: 0.07942| lr: 0.0| temp: 1.95436 | loss: 1.10158| constrast_loss: 4.31317| div_loss: 0.93166| %_mask_idx: 0.35009| ppl: 43.73653| %_neg_is_pos: 0.06196| lr: 0.0| temp: 1.95434 | loss: 1.10093| constrast_loss: 4.31054| div_loss: 0.93164| %_mask_idx: 0.40429| ppl: 43.74894| %_neg_is_pos: 0.0837| lr: 0.0| temp: 1.95434 | loss: 1.10287| constrast_loss: 4.31786| div_loss: 0.93632| %_mask_idx: 0.38534| ppl: 40.75588| %_neg_is_pos: 0.08131| lr: 0.0| temp: 1.95433 | loss: 1.11507| constrast_loss: 4.36747| div_loss: 0.9281| %_mask_idx: 0.38831| ppl: 46.01319| %_neg_is_pos: 0.07456| lr: 0.0| temp: 1.95433 | loss: 1.09457| constrast_loss: 4.28469| div_loss: 0.9357| %_mask_idx: 0.44173| ppl: 41.15307| %_neg_is_pos: 0.08617| lr: 0.0| temp: 1.95432 | loss: 1.09966| constrast_loss: 4.30455| div_loss: 0.94101| %_mask_idx: 0.39897| ppl: 37.7519| %_neg_is_pos: 0.08801| lr: 0.0| temp: 1.95432 | loss: 1.10348| constrast_loss: 4.32041| div_loss: 0.93519| %_mask_idx: 0.39615| ppl: 41.47761| %_neg_is_pos: 0.07765| lr: 0.0| temp: 1.95431 | loss: 1.10267| constrast_loss: 4.31722| div_loss: 0.93474| %_mask_idx: 0.41212| ppl: 41.76783| %_neg_is_pos: 0.08224| lr: 0.0| temp: 1.95431 | loss: 1.11266| constrast_loss: 4.35728| div_loss: 0.93373| %_mask_idx: 0.42043| ppl: 42.41019| %_neg_is_pos: 0.08318| lr: 0.0| temp: 1.95429 | loss: 1.10446| constrast_loss: 4.32347| div_loss: 0.94373| %_mask_idx: 0.39646| ppl: 36.01234| 
%_neg_is_pos: 0.08901| lr: 0.0| temp: 1.95429 | loss: 1.10754| constrast_loss: 4.3374| div_loss: 0.92776| %_mask_idx: 0.38722| ppl: 46.23359| %_neg_is_pos: 0.07395| lr: 0.0| temp: 1.95428 | loss: 1.10047| constrast_loss: 4.30911| div_loss: 0.92781| %_mask_idx: 0.3963| ppl: 46.19976| %_neg_is_pos: 0.0724| lr: 0.0| temp: 1.95428 | loss: 1.10985| constrast_loss: 4.347| div_loss: 0.92414| %_mask_idx: 0.37202| ppl: 48.55279| %_neg_is_pos: 0.0636| lr: 0.0| temp: 1.95426 | loss: 1.0954| constrast_loss: 4.28822| div_loss: 0.93384| %_mask_idx: 0.40335| ppl: 42.34082| %_neg_is_pos: 0.07617| lr: 0.0| temp: 1.95426 | loss: 1.09215| constrast_loss: 4.27472| div_loss: 0.93876| %_mask_idx: 0.38174| ppl: 39.19548| %_neg_is_pos: 0.07785| lr: 0.0| temp: 1.95425 | loss: 1.08116| constrast_loss: 4.23125| div_loss: 0.93408| %_mask_idx: 0.32112| ppl: 42.19| %_neg_is_pos: 0.06223| lr: 0.0| temp: 1.95425 | loss: 1.10653| constrast_loss: 4.33259| div_loss: 0.93507| %_mask_idx: 0.39176| ppl: 41.55722| %_neg_is_pos: 0.08473| lr: 0.0| temp: 1.95424 | loss: 1.10489| constrast_loss: 4.3262| div_loss: 0.93356| %_mask_idx: 0.43045| ppl: 42.52363| %_neg_is_pos: 0.0856| lr: 0.0| temp: 1.95424 | loss: 1.09615| constrast_loss: 4.29105| div_loss: 0.9355| %_mask_idx: 0.41353| ppl: 41.27702| %_neg_is_pos: 0.0874| lr: 0.0| temp: 1.95423 | loss: 1.09976| constrast_loss: 4.30501| div_loss: 0.94047| %_mask_idx: 0.43578| ppl: 38.09895| %_neg_is_pos: 0.0887| lr: 0.0| temp: 1.95423 | loss: 1.09466| constrast_loss: 4.28476| div_loss: 0.93871| %_mask_idx: 0.36435| ppl: 39.22826| %_neg_is_pos: 0.08005| lr: 0.0| temp: 1.95421 | loss: 1.10168| constrast_loss: 4.31336| div_loss: 0.93361| %_mask_idx: 0.39959| ppl: 42.49091| %_neg_is_pos: 0.0776| lr: 0.0| temp: 1.95421 | loss: 1.10936| constrast_loss: 4.34376| div_loss: 0.9369| %_mask_idx: 0.39991| ppl: 40.38346| %_neg_is_pos: 0.07322| lr: 0.0| temp: 1.9542 | loss: 1.10306| constrast_loss: 4.31913| div_loss: 0.93102| %_mask_idx: 0.37845| ppl: 44.14499| %_neg_is_pos: 
0.07346| lr: 0.0| temp: 1.9542 | loss: 1.10413| constrast_loss: 4.32373| div_loss: 0.92807| %_mask_idx: 0.40758| ppl: 46.03645| %_neg_is_pos: 0.07251| lr: 0.0| temp: 1.95419 | loss: 1.11224| constrast_loss: 4.35548| div_loss: 0.9347| %_mask_idx: 0.35996| ppl: 41.79028| %_neg_is_pos: 0.07483| lr: 0.0| temp: 1.95419 | loss: 1.11093| constrast_loss: 4.35112| div_loss: 0.92577| %_mask_idx: 0.42372| ppl: 47.50863| %_neg_is_pos: 0.06727| lr: 0.0| temp: 1.95418 | loss: 1.1043| constrast_loss: 4.3242| div_loss: 0.93011| %_mask_idx: 0.37766| ppl: 44.72771| %_neg_is_pos: 0.0712| lr: 0.0| temp: 1.95418 | loss: 1.10396| constrast_loss: 4.32221| div_loss: 0.93629| %_mask_idx: 0.42857| ppl: 40.7721| %_neg_is_pos: 0.08801| lr: 0.0| temp: 1.95416 | loss: 1.09381| constrast_loss: 4.28211| div_loss: 0.93136| %_mask_idx: 0.37625| ppl: 43.92757| %_neg_is_pos: 0.06219| lr: 0.0| temp: 1.95416 | loss: 1.10203| constrast_loss: 4.31451| div_loss: 0.93624| %_mask_idx: 0.41714| ppl: 40.80502| %_neg_is_pos: 0.08152| lr: 0.0| temp: 1.95415 | loss: 1.11196| constrast_loss: 4.35413| div_loss: 0.93699| %_mask_idx: 0.40648| ppl: 40.32883| %_neg_is_pos: 0.07774| lr: 0.0| temp: 1.95415 | loss: 1.0926| constrast_loss: 4.27718| div_loss: 0.93225| %_mask_idx: 0.37422| ppl: 43.36263| %_neg_is_pos: 0.07149| lr: 0.0| temp: 1.95414 | loss: 1.11272| constrast_loss: 4.35806| div_loss: 0.92829| %_mask_idx: 0.32581| ppl: 45.89177| %_neg_is_pos: 0.06265| lr: 0.0| temp: 1.95414 | loss: 1.11461| constrast_loss: 4.36562| div_loss: 0.92802| %_mask_idx: 0.41009| ppl: 46.06416| %_neg_is_pos: 0.07478| lr: 0.0| temp: 1.95413 | loss: 1.09833| constrast_loss: 4.29946| div_loss: 0.93845| %_mask_idx: 0.37437| ppl: 39.39069| %_neg_is_pos: 0.07583| lr: 0.0| temp: 1.95413 | loss: 1.10813| constrast_loss: 4.34017| div_loss: 0.92334| %_mask_idx: 0.41902| ppl: 49.06053| %_neg_is_pos: 0.06362| lr: 0.0| temp: 1.95411 | loss: 1.09465| constrast_loss: 4.28542| div_loss: 0.93172| %_mask_idx: 0.38033| ppl: 43.70098| %_neg_is_pos: 
0.07832| lr: 0.0| temp: 1.95411 | loss: 1.09896| constrast_loss: 4.3023| div_loss: 0.93551| %_mask_idx: 0.37907| ppl: 41.27209| %_neg_is_pos: 0.06956| lr: 0.0| temp: 1.9541 | loss: 1.09351| constrast_loss: 4.28016| div_loss: 0.9387| %_mask_idx: 0.33568| ppl: 39.23369| %_neg_is_pos: 0.07001| lr: 0.0| temp: 1.9541 | loss: 1.09705| constrast_loss: 4.29466| div_loss: 0.93555| %_mask_idx: 0.36811| ppl: 41.2474| %_neg_is_pos: 0.09078| lr: 0.0| temp: 1.95409 | loss: 1.10042| constrast_loss: 4.30917| div_loss: 0.92506| %_mask_idx: 0.32503| ppl: 47.95864| %_neg_is_pos: 0.05317| lr: 0.0| temp: 1.95409 | loss: 1.09594| constrast_loss: 4.28999| div_loss: 0.93773| %_mask_idx: 0.39552| ppl: 39.85143| %_neg_is_pos: 0.08477| lr: 0.0| temp: 1.95408 | loss: 1.10987| constrast_loss: 4.34566| div_loss: 0.9382| %_mask_idx: 0.4057| ppl: 39.55181| %_neg_is_pos: 0.08304| lr: 0.0| temp: 1.95408 | loss: 1.10842| constrast_loss: 4.34056| div_loss: 0.93108| %_mask_idx: 0.40711| ppl: 44.11031| %_neg_is_pos: 0.07863| lr: 0.0| temp: 1.95407 | loss: 1.10752| constrast_loss: 4.33639| div_loss: 0.93681| %_mask_idx: 0.40915| ppl: 40.43965| %_neg_is_pos: 0.08865| lr: 0.0| temp: 1.95407 | loss: 1.1173| constrast_loss: 4.37694| div_loss: 0.92259| %_mask_idx: 0.4115| ppl: 49.5435| %_neg_is_pos: 0.06858| lr: 0.0| temp: 1.95406 | loss: 1.10784| constrast_loss: 4.33862| div_loss: 0.9275| %_mask_idx: 0.39865| ppl: 46.40042| %_neg_is_pos: 0.07024| lr: 0.0| temp: 1.95406 | loss: 1.10551| constrast_loss: 4.32777| div_loss: 0.94273| %_mask_idx: 0.38111| ppl: 36.65286| %_neg_is_pos: 0.08881| lr: 0.0| temp: 1.95404 | loss: 1.09891| constrast_loss: 4.30175| div_loss: 0.93894| %_mask_idx: 0.4032| ppl: 39.08154| %_neg_is_pos: 0.09006| lr: 0.0| temp: 1.95404 | loss: 1.0913| constrast_loss: 4.27146| div_loss: 0.93719| %_mask_idx: 0.36748| ppl: 40.1973| %_neg_is_pos: 0.07353| lr: 0.0| temp: 1.95403 | loss: 1.08964| constrast_loss: 4.26457| div_loss: 0.93978| %_mask_idx: 0.38315| ppl: 38.54043| %_neg_is_pos: 0.0765| lr: 
0.0| temp: 1.95403 | loss: 1.0993| constrast_loss: 4.30447| div_loss: 0.92736| %_mask_idx: 0.32174| ppl: 46.49245| %_neg_is_pos: 0.06219| lr: 0.0| temp: 1.95402 | loss: 1.10366| constrast_loss: 4.32198| div_loss: 0.92681| %_mask_idx: 0.38174| ppl: 46.84066| %_neg_is_pos: 0.06697| lr: 0.0| temp: 1.95402 | loss: 1.10563| constrast_loss: 4.32942| div_loss: 0.93117| %_mask_idx: 0.39301| ppl: 44.04939| %_neg_is_pos: 0.07328| lr: 0.0| temp: 1.95401 | loss: 1.10495| constrast_loss: 4.32638| div_loss: 0.93444| %_mask_idx: 0.37484| ppl: 41.96074| %_neg_is_pos: 0.0757| lr: 0.0| temp: 1.95401 | loss: 1.10474| constrast_loss: 4.32657| div_loss: 0.92375| %_mask_idx: 0.33427| ppl: 48.79709| %_neg_is_pos: 0.05672| lr: 0.0| temp: 1.95399 | loss: 1.1044| constrast_loss: 4.3238| div_loss: 0.93813| %_mask_idx: 0.42794| ppl: 39.59515| %_neg_is_pos: 0.08932| lr: 0.0| temp: 1.95399 | loss: 1.09666| constrast_loss: 4.29303| div_loss: 0.93617| %_mask_idx: 0.33615| ppl: 40.85241| %_neg_is_pos: 0.07628| lr: 0.0| temp: 1.95398 | loss: 1.09563| constrast_loss: 4.28877| div_loss: 0.93766| %_mask_idx: 0.3631| ppl: 39.8961| %_neg_is_pos: 0.08174| lr: 0.0| temp: 1.95398 | loss: 1.11675| constrast_loss: 4.37408| div_loss: 0.92926| %_mask_idx: 0.36247| ppl: 45.27394| %_neg_is_pos: 0.0626| lr: 0.0| temp: 1.95397 | loss: 1.09977| constrast_loss: 4.30602| div_loss: 0.93057| %_mask_idx: 0.38268| ppl: 44.43432| %_neg_is_pos: 0.06875| lr: 0.0| temp: 1.95397 | loss: 1.09763| constrast_loss: 4.29717| div_loss: 0.93351| %_mask_idx: 0.41541| ppl: 42.55071| %_neg_is_pos: 0.08481| lr: 0.0| temp: 1.95396 | loss: 1.10449| constrast_loss: 4.32491| div_loss: 0.93067| %_mask_idx: 0.37923| ppl: 44.37359| %_neg_is_pos: 0.07307| lr: 0.0| temp: 1.95396 | loss: 1.08889| constrast_loss: 4.26174| div_loss: 0.93802| %_mask_idx: 0.4162| ppl: 39.66898| %_neg_is_pos: 0.08051| lr: 0.0| temp: 1.95394 | loss: 1.09378| constrast_loss: 4.28228| div_loss: 0.92855| %_mask_idx: 0.34743| ppl: 45.72654| %_neg_is_pos: 0.0624| lr: 0.0| 
temp: 1.95394 | loss: 1.093| constrast_loss: 4.27857| div_loss: 0.93441| %_mask_idx: 0.39395| ppl: 41.97867| %_neg_is_pos: 0.08125| lr: 0.0| temp: 1.95393 | loss: 1.10925| constrast_loss: 4.34449| div_loss: 0.9251| %_mask_idx: 0.40805| ppl: 47.93452| %_neg_is_pos: 0.0619| lr: 0.0| temp: 1.95393 [2021-09-02 07:15:45,346] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 07:15:45,346] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.09165| constrast_loss: 4.27282| div_loss: 0.93801| %_mask_idx: 0.42528| ppl: 39.67356| %_neg_is_pos: 0.09024| lr: 0.0| temp: 1.95391 | loss: 1.09292| constrast_loss: 4.27884| div_loss: 0.92849| %_mask_idx: 0.35934| ppl: 45.76538| %_neg_is_pos: 0.06515| lr: 0.0| temp: 1.95391 | loss: 1.09709| constrast_loss: 4.29461| div_loss: 0.93768| %_mask_idx: 0.362| ppl: 39.88701| %_neg_is_pos: 0.0803| lr: 0.0| temp: 1.9539 | loss: 1.0985| constrast_loss: 4.29971| div_loss: 0.94285| %_mask_idx: 0.37061| ppl: 36.57763| %_neg_is_pos: 0.08785| lr: 0.0| temp: 1.9539 | loss: 1.0877| constrast_loss: 4.25633| div_loss: 0.94483| %_mask_idx: 0.42607| ppl: 35.3115| %_neg_is_pos: 0.10347| lr: 0.0| temp: 1.95389 | loss: 1.09499| constrast_loss: 4.28594| div_loss: 0.94023| %_mask_idx: 0.38675| ppl: 38.25187| %_neg_is_pos: 0.0837| lr: 0.0| temp: 1.95389 | loss: 1.07955| constrast_loss: 4.22321| div_loss: 0.94994| %_mask_idx: 0.41776| ppl: 32.04158| %_neg_is_pos: 0.11512| lr: 0.0| temp: 1.95388 | loss: 1.08637| constrast_loss: 4.25102| div_loss: 0.94457| %_mask_idx: 0.40774| ppl: 35.47643| %_neg_is_pos: 0.10602| lr: 0.0| temp: 1.95388 | loss: 1.06804| constrast_loss: 4.17691| div_loss: 0.95228| %_mask_idx: 0.40586| ppl: 30.54276| %_neg_is_pos: 0.12185| lr: 0.0| temp: 1.95386| loss: 1.07557| constrast_loss: 4.20822| div_loss: 0.9406| %_mask_idx: 0.36184| ppl: 38.01569| 
%_neg_is_pos: 0.09509| lr: 0.0| temp: 1.95386 | loss: 1.03278| constrast_loss: 4.03508| div_loss: 0.96053| %_mask_idx: 0.3963| ppl: 25.26277| %_neg_is_pos: 0.1599| lr: 0.0| temp: 1.95385 | loss: 1.03085| constrast_loss: 4.0272| div_loss: 0.96199| %_mask_idx: 0.40695| ppl: 24.32485| %_neg_is_pos: 0.16716| lr: 0.0| temp: 1.95385 | loss: 1.01318| constrast_loss: 3.95611| div_loss: 0.96615| %_mask_idx: 0.4104| ppl: 21.66166| %_neg_is_pos: 0.18892| lr: 0.0| temp: 1.95384 | loss: 1.00789| constrast_loss: 3.93481| div_loss: 0.96746| %_mask_idx: 0.43828| ppl: 20.826| %_neg_is_pos: 0.19768| lr: 0.0| temp: 1.95384 | loss: 1.03338| constrast_loss: 4.03719| div_loss: 0.96348| %_mask_idx: 0.40163| ppl: 23.37304| %_neg_is_pos: 0.17337| lr: 0.0| temp: 1.95383 | loss: 1.02226| constrast_loss: 3.99212| div_loss: 0.96916| %_mask_idx: 0.38487| ppl: 19.73784| %_neg_is_pos: 0.19832| lr: 0.0| temp: 1.95383 | loss: 0.98857| constrast_loss: 3.85722| div_loss: 0.97044| %_mask_idx: 0.34352| ppl: 18.91646| %_neg_is_pos: 0.17461| lr: 0.0| temp: 1.95381| loss: 0.95934| constrast_loss: 3.74006| div_loss: 0.97313| %_mask_idx: 0.38393| ppl: 17.19909| %_neg_is_pos: 0.21061| lr: 0.0| temp: 1.95381 | loss: 0.91245| constrast_loss: 3.55179| div_loss: 0.98015| %_mask_idx: 0.40194| ppl: 12.70669| %_neg_is_pos: 0.26542| lr: 0.0| temp: 1.9538 | loss: 0.95559| constrast_loss: 3.7247| div_loss: 0.97652| %_mask_idx: 0.40053| ppl: 15.02437| %_neg_is_pos: 0.25685| lr: 0.0| temp: 1.9538 | loss: 0.94043| constrast_loss: 3.66387| div_loss: 0.97863| %_mask_idx: 0.34586| ppl: 13.67545| %_neg_is_pos: 0.24253| lr: 0.0| temp: 1.95379 | loss: 0.95637| constrast_loss: 3.72796| div_loss: 0.97503| %_mask_idx: 0.39615| ppl: 15.9821| %_neg_is_pos: 0.22395| lr: 0.0| temp: 1.95379 | loss: 1.00075| constrast_loss: 3.90567| div_loss: 0.97333| %_mask_idx: 0.39505| ppl: 17.06682| %_neg_is_pos: 0.20117| lr: 0.0| temp: 1.95378 | loss: 0.93112| constrast_loss: 3.62644| div_loss: 0.98052| %_mask_idx: 0.40179| ppl: 12.46462| 
%_neg_is_pos: 0.27636| lr: 0.0| temp: 1.95378 | loss: 0.94995| constrast_loss: 3.70192| div_loss: 0.97889| %_mask_idx: 0.3974| ppl: 13.51259| %_neg_is_pos: 0.2328| lr: 0.0| temp: 1.95376 | loss: 0.97009| constrast_loss: 3.78276| div_loss: 0.97596| %_mask_idx: 0.36043| ppl: 15.38746| %_neg_is_pos: 0.21865| lr: 0.0| temp: 1.95376 | loss: 1.01591| constrast_loss: 3.96633| div_loss: 0.97309| %_mask_idx: 0.33506| ppl: 17.22236| %_neg_is_pos: 0.17176| lr: 0.0| temp: 1.95375 | loss: 1.00844| constrast_loss: 3.93633| div_loss: 0.97431| %_mask_idx: 0.37672| ppl: 16.44328| %_neg_is_pos: 0.18322| lr: 0.0| temp: 1.95375 | loss: 1.02726| constrast_loss: 4.01149| div_loss: 0.97556| %_mask_idx: 0.3396| ppl: 15.6436| %_neg_is_pos: 0.16197| lr: 0.0| temp: 1.95373 | loss: 1.03805| constrast_loss: 4.05491| div_loss: 0.97301| %_mask_idx: 0.38925| ppl: 17.27237| %_neg_is_pos: 0.15167| lr: 0.0| temp: 1.95373 | loss: 1.04076| constrast_loss: 4.0657| div_loss: 0.9733| %_mask_idx: 0.35229| ppl: 17.09016| %_neg_is_pos: 0.15981| lr: 0.0| temp: 1.95372 | loss: 1.04572| constrast_loss: 4.08555| div_loss: 0.97338| %_mask_idx: 0.42841| ppl: 17.036| %_neg_is_pos: 0.17661| lr: 0.0| temp: 1.95372 | loss: 1.04335| constrast_loss: 4.07627| div_loss: 0.97111| %_mask_idx: 0.36889| ppl: 18.48828| %_neg_is_pos: 0.16049| lr: 0.0| temp: 1.95371 | loss: 1.03082| constrast_loss: 4.02577| div_loss: 0.97511| %_mask_idx: 0.3761| ppl: 15.92701| %_neg_is_pos: 0.17056| lr: 0.0| temp: 1.95371 | loss: 1.07225| constrast_loss: 4.19222| div_loss: 0.96787| %_mask_idx: 0.41917| ppl: 20.56393| %_neg_is_pos: 0.15496| lr: 0.0| temp: 1.9537 | loss: 1.06046| constrast_loss: 4.14493| div_loss: 0.96924| %_mask_idx: 0.39004| ppl: 19.68392| %_neg_is_pos: 0.14899| lr: 0.0| temp: 1.9537 | loss: 1.06048| constrast_loss: 4.14491| div_loss: 0.97018| %_mask_idx: 0.39458| ppl: 19.0826| %_neg_is_pos: 0.16103| lr: 0.0| temp: 1.95368 | loss: 1.02817| constrast_loss: 4.01527| div_loss: 0.97396| %_mask_idx: 0.34947| ppl: 16.6652| 
%_neg_is_pos: 0.15472| lr: 0.0| temp: 1.95368 | loss: 1.03773| constrast_loss: 4.05356| div_loss: 0.97378| %_mask_idx: 0.40695| ppl: 16.77881| %_neg_is_pos: 0.17657| lr: 0.0| temp: 1.95367 | loss: 1.01602| constrast_loss: 3.96665| div_loss: 0.97414| %_mask_idx: 0.34117| ppl: 16.54785| %_neg_is_pos: 0.15383| lr: 0.0| temp: 1.95367 | loss: 1.04751| constrast_loss: 4.09246| div_loss: 0.97589| %_mask_idx: 0.40899| ppl: 15.42822| %_neg_is_pos: 0.18817| lr: 0.0| temp: 1.95367 | loss: 1.02231| constrast_loss: 3.99165| div_loss: 0.97573| %_mask_idx: 0.37923| ppl: 15.53353| %_neg_is_pos: 0.16997| lr: 0.0| temp: 1.95367 | loss: 1.06352| constrast_loss: 4.15701| div_loss: 0.97087| %_mask_idx: 0.40523| ppl: 18.64012| %_neg_is_pos: 0.15025| lr: 0.0| temp: 1.95366 | loss: 1.05497| constrast_loss: 4.12271| div_loss: 0.97191| %_mask_idx: 0.42685| ppl: 17.97857| %_neg_is_pos: 0.16976| lr: 0.0| temp: 1.95366 | loss: 1.0401| constrast_loss: 4.06313| div_loss: 0.97279| %_mask_idx: 0.39865| ppl: 17.41704| %_neg_is_pos: 0.15781| lr: 0.0| temp: 1.95364 | loss: 1.04433| constrast_loss: 4.08018| div_loss: 0.97143| %_mask_idx: 0.43452| ppl: 18.28548| %_neg_is_pos: 0.17135| lr: 0.0| temp: 1.95364 | loss: 1.03868| constrast_loss: 4.05731| div_loss: 0.97431| %_mask_idx: 0.37312| ppl: 16.43979| %_neg_is_pos: 0.16839| lr: 0.0| temp: 1.95363 | loss: 1.02076| constrast_loss: 3.98583| div_loss: 0.97223| %_mask_idx: 0.31877| ppl: 17.77492| %_neg_is_pos: 0.15182| lr: 0.0| temp: 1.95363 | loss: 1.05188| constrast_loss: 4.11022| div_loss: 0.97313| %_mask_idx: 0.40836| ppl: 17.19425| %_neg_is_pos: 0.16907| lr: 0.0| temp: 1.95362 | loss: 1.0372| constrast_loss: 4.05129| div_loss: 0.9751| %_mask_idx: 0.39897| ppl: 15.93452| %_neg_is_pos: 0.17095| lr: 0.0| temp: 1.95362 | loss: 1.04774| constrast_loss: 4.0936| div_loss: 0.9736| %_mask_idx: 0.38503| ppl: 16.89411| %_neg_is_pos: 0.18886| lr: 0.0| temp: 1.95361 | loss: 1.03373| constrast_loss: 4.03749| div_loss: 0.97421| %_mask_idx: 0.43797| ppl: 16.50661| 
%_neg_is_pos: 0.18154| lr: 0.0| temp: 1.95361 | loss: 1.04577| constrast_loss: 4.08579| div_loss: 0.97274| %_mask_idx: 0.37751| ppl: 17.44383| %_neg_is_pos: 0.1556| lr: 0.0| temp: 1.95359 | loss: 1.03082| constrast_loss: 4.02582| div_loss: 0.97442| %_mask_idx: 0.38393| ppl: 16.37097| %_neg_is_pos: 0.16512| lr: 0.0| temp: 1.95359 | loss: 1.04311| constrast_loss: 4.075| div_loss: 0.97421| %_mask_idx: 0.34305| ppl: 16.50605| %_neg_is_pos: 0.15393| lr: 0.0| temp: 1.95358 | loss: 1.03724| constrast_loss: 4.0517| div_loss: 0.97253| %_mask_idx: 0.36325| ppl: 17.58209| %_neg_is_pos: 0.15487| lr: 0.0| temp: 1.95358 | loss: 1.04| constrast_loss: 4.06269| div_loss: 0.97307| %_mask_idx: 0.36858| ppl: 17.23277| %_neg_is_pos: 0.15697| lr: 0.0| temp: 1.95356 | loss: 1.01926| constrast_loss: 3.9795| div_loss: 0.97533| %_mask_idx: 0.32237| ppl: 15.78644| %_neg_is_pos: 0.16934| lr: 0.0| temp: 1.95356 | loss: 1.04222| constrast_loss: 4.0715| div_loss: 0.9736| %_mask_idx: 0.3703| ppl: 16.89861| %_neg_is_pos: 0.17391| lr: 0.0| temp: 1.95355 | loss: 1.052| constrast_loss: 4.11086| div_loss: 0.97153| %_mask_idx: 0.4021| ppl: 18.22187| %_neg_is_pos: 0.1624| lr: 0.0| temp: 1.95355 | loss: 1.04532| constrast_loss: 4.08408| div_loss: 0.97187| %_mask_idx: 0.36513| ppl: 18.00555| %_neg_is_pos: 0.15734| lr: 0.0| temp: 1.95354 | loss: 1.03683| constrast_loss: 4.04988| div_loss: 0.97457| %_mask_idx: 0.42982| ppl: 16.27623| %_neg_is_pos: 0.18243| lr: 0.0| temp: 1.95354 | loss: 1.03885| constrast_loss: 4.05796| div_loss: 0.97432| %_mask_idx: 0.45269| ppl: 16.43264| %_neg_is_pos: 0.19375| lr: 0.0| temp: 1.95353 | loss: 1.03513| constrast_loss: 4.04337| div_loss: 0.97146| %_mask_idx: 0.36497| ppl: 18.26741| %_neg_is_pos: 0.14587| lr: 0.0| temp: 1.95353 | loss: 1.06721| constrast_loss: 4.17189| div_loss: 0.96946| %_mask_idx: 0.37484| ppl: 19.54873| %_neg_is_pos: 0.13972| lr: 0.0| temp: 1.95351 | loss: 1.02916| constrast_loss: 4.01899| div_loss: 0.9766| %_mask_idx: 0.38503| ppl: 14.97527| %_neg_is_pos: 
0.17419| lr: 0.0| temp: 1.95351 | loss: 1.03786| constrast_loss: 4.054| div_loss: 0.97454| %_mask_idx: 0.39286| ppl: 16.29335| %_neg_is_pos: 0.18511| lr: 0.0| temp: 1.9535 | loss: 1.01951| constrast_loss: 3.98037| div_loss: 0.97688| %_mask_idx: 0.34602| ppl: 14.79529| %_neg_is_pos: 0.17581| lr: 0.0| temp: 1.9535 | loss: 1.04164| constrast_loss: 4.06905| div_loss: 0.97499| %_mask_idx: 0.39223| ppl: 16.00806| %_neg_is_pos: 0.17238| lr: 0.0| temp: 1.95349 | loss: 1.05771| constrast_loss: 4.13375| div_loss: 0.97097| %_mask_idx: 0.42935| ppl: 18.57806| %_neg_is_pos: 0.14589| lr: 0.0| temp: 1.95349 | loss: 1.0499| constrast_loss: 4.10224| div_loss: 0.97349| %_mask_idx: 0.41573| ppl: 16.96378| %_neg_is_pos: 0.17299| lr: 0.0| temp: 1.95348 | loss: 1.05483| constrast_loss: 4.12215| div_loss: 0.97183| %_mask_idx: 0.38675| ppl: 18.03068| %_neg_is_pos: 0.16067| lr: 0.0| temp: 1.95348 | loss: 1.03619| constrast_loss: 4.04748| div_loss: 0.97277| %_mask_idx: 0.36404| ppl: 17.42923| %_neg_is_pos: 0.14947| lr: 0.0| temp: 1.95346 | loss: 1.02733| constrast_loss: 4.01168| div_loss: 0.97622| %_mask_idx: 0.44283| ppl: 15.21982| %_neg_is_pos: 0.20928| lr: 0.0| temp: 1.95346 | loss: 1.02588| constrast_loss: 4.00608| div_loss: 0.97457| %_mask_idx: 0.38127| ppl: 16.27238| %_neg_is_pos: 0.1648| lr: 0.0| temp: 1.95345 | loss: 1.03205| constrast_loss: 4.03083| div_loss: 0.97383| %_mask_idx: 0.40633| ppl: 16.74878| %_neg_is_pos: 0.16246| lr: 0.0| temp: 1.95345 | loss: 1.00725| constrast_loss: 3.93162| div_loss: 0.97376| %_mask_idx: 0.31548| ppl: 16.79592| %_neg_is_pos: 0.13865| lr: 0.0| temp: 1.95344 | loss: 1.04989| constrast_loss: 4.10215| div_loss: 0.97396| %_mask_idx: 0.43593| ppl: 16.66798| %_neg_is_pos: 0.17964| lr: 0.0| temp: 1.95344 | loss: 1.05868| constrast_loss: 4.13753| div_loss: 0.97179| %_mask_idx: 0.41745| ppl: 18.0566| %_neg_is_pos: 0.15831| lr: 0.0| temp: 1.95343 | loss: 1.04849| constrast_loss: 4.09671| div_loss: 0.97271| %_mask_idx: 0.41902| ppl: 17.46782| %_neg_is_pos: 
0.17471| lr: 0.0| temp: 1.95343 | loss: 1.0432| constrast_loss: 4.07559| div_loss: 0.97221| %_mask_idx: 0.3927| ppl: 17.78423| %_neg_is_pos: 0.1653| lr: 0.0| temp: 1.95341 | loss: 1.07099| constrast_loss: 4.18713| div_loss: 0.96815| %_mask_idx: 0.38878| ppl: 20.38109| %_neg_is_pos: 0.14032| lr: 0.0| temp: 1.95341 | loss: 1.04793| constrast_loss: 4.09486| div_loss: 0.96874| %_mask_idx: 0.31736| ppl: 20.00512| %_neg_is_pos: 0.13193| lr: 0.0| temp: 1.9534 | loss: 1.04042| constrast_loss: 4.06422| div_loss: 0.97451| %_mask_idx: 0.37719| ppl: 16.31053| %_neg_is_pos: 0.16547| lr: 0.0| temp: 1.9534 | loss: 1.04754| constrast_loss: 4.09294| div_loss: 0.9723| %_mask_idx: 0.40116| ppl: 17.72754| %_neg_is_pos: 0.17292| lr: 0.0| temp: 1.95338 | loss: 1.04486| constrast_loss: 4.08226| div_loss: 0.97163| %_mask_idx: 0.3916| ppl: 18.15905| %_neg_is_pos: 0.15441| lr: 0.0| temp: 1.95338 | loss: 1.0303| constrast_loss: 4.02385| div_loss: 0.97335| %_mask_idx: 0.37343| ppl: 17.05499| %_neg_is_pos: 0.16774| lr: 0.0| temp: 1.95337 | loss: 1.02762| constrast_loss: 4.01287| div_loss: 0.97611| %_mask_idx: 0.37876| ppl: 15.28691| %_neg_is_pos: 0.17588| lr: 0.0| temp: 1.95337 | loss: 1.03721| constrast_loss: 4.05158| div_loss: 0.97246| %_mask_idx: 0.37876| ppl: 17.62769| %_neg_is_pos: 0.14791| lr: 0.0| temp: 1.95336 | loss: 1.03287| constrast_loss: 4.03405| div_loss: 0.97416| %_mask_idx: 0.38346| ppl: 16.53473| %_neg_is_pos: 0.16565| lr: 0.0| temp: 1.95336 | loss: 1.053| constrast_loss: 4.11485| div_loss: 0.97152| %_mask_idx: 0.40821| ppl: 18.22415| %_neg_is_pos: 0.16032| lr: 0.0| temp: 1.95335 | loss: 1.03249| constrast_loss: 4.03259| div_loss: 0.97381| %_mask_idx: 0.39113| ppl: 16.76081| %_neg_is_pos: 0.18179| lr: 0.0| temp: 1.95335 | loss: 1.05301| constrast_loss: 4.11477| div_loss: 0.97282| %_mask_idx: 0.37563| ppl: 17.39744| %_neg_is_pos: 0.15102| lr: 0.0| temp: 1.95333 | loss: 1.03415| constrast_loss: 4.03901| div_loss: 0.976| %_mask_idx: 0.39129| ppl: 15.3579| %_neg_is_pos: 0.17327| 
lr: 0.0| temp: 1.95333 | loss: 1.06815| constrast_loss: 4.17563| div_loss: 0.96965| %_mask_idx: 0.39881| ppl: 19.42546| %_neg_is_pos: 0.14409| lr: 0.0| temp: 1.95332 | loss: 1.04891| constrast_loss: 4.09857| div_loss: 0.97073| %_mask_idx: 0.3468| ppl: 18.73016| %_neg_is_pos: 0.14224| lr: 0.0| temp: 1.95332 | loss: 1.04824| constrast_loss: 4.09587| div_loss: 0.97095| %_mask_idx: 0.43969| ppl: 18.59152| %_neg_is_pos: 0.16704| lr: 0.0| temp: 1.95331 | loss: 1.04725| constrast_loss: 4.09196| div_loss: 0.9704| %_mask_idx: 0.40758| ppl: 18.94133| %_neg_is_pos: 0.18742| lr: 0.0| temp: 1.95331 | loss: 1.04416| constrast_loss: 4.07931| div_loss: 0.97332| %_mask_idx: 0.38863| ppl: 17.07355| %_neg_is_pos: 0.17714| lr: 0.0| temp: 1.9533 | loss: 1.03951| constrast_loss: 4.06055| div_loss: 0.97491| %_mask_idx: 0.37124| ppl: 16.0547| %_neg_is_pos: 0.17135| lr: 0.0| temp: 1.9533 | loss: 1.00887| constrast_loss: 3.93789| div_loss: 0.97576| %_mask_idx: 0.41244| ppl: 15.51472| %_neg_is_pos: 0.18251| lr: 0.0| temp: 1.95328 | loss: 1.01703| constrast_loss: 3.97059| div_loss: 0.97548| %_mask_idx: 0.41353| ppl: 15.69567| %_neg_is_pos: 0.18349| lr: 0.0| temp: 1.95328 | loss: 1.0381| constrast_loss: 4.05493| div_loss: 0.97459| %_mask_idx: 0.41573| ppl: 16.26552| %_neg_is_pos: 0.17793| lr: 0.0| temp: 1.95327 | loss: 1.0531| constrast_loss: 4.11522| div_loss: 0.97164| %_mask_idx: 0.39333| ppl: 18.15253| %_neg_is_pos: 0.14979| lr: 0.0| temp: 1.95327 | loss: 1.04584| constrast_loss: 4.08615| div_loss: 0.97213| %_mask_idx: 0.4433| ppl: 17.83753| %_neg_is_pos: 0.16771| lr: 0.0| temp: 1.95326 | loss: 1.02342| constrast_loss: 3.99629| div_loss: 0.97392| %_mask_idx: 0.37563| ppl: 16.69307| %_neg_is_pos: 0.17123| lr: 0.0| temp: 1.95326 | loss: 1.0332| constrast_loss: 4.03519| div_loss: 0.97619| %_mask_idx: 0.40742| ppl: 15.23856| %_neg_is_pos: 0.1765| lr: 0.0| temp: 1.95325 | loss: 1.05297| constrast_loss: 4.11446| div_loss: 0.97427| %_mask_idx: 0.42105| ppl: 16.46936| %_neg_is_pos: 0.17861| lr: 
0.0| temp: 1.95325 | loss: 1.0406| constrast_loss: 4.06482| div_loss: 0.9759| %_mask_idx: 0.4162| ppl: 15.42363| %_neg_is_pos: 0.1901| lr: 0.0| temp: 1.95324 | loss: 1.05282| constrast_loss: 4.11405| div_loss: 0.9723| %_mask_idx: 0.40351| ppl: 17.72645| %_neg_is_pos: 0.15702| lr: 0.0| temp: 1.95324 | loss: 1.02984| constrast_loss: 4.02169| div_loss: 0.97675| %_mask_idx: 0.41275| ppl: 14.88088| %_neg_is_pos: 0.18549| lr: 0.0| temp: 1.95323 | loss: 1.0392| constrast_loss: 4.05925| div_loss: 0.97539| %_mask_idx: 0.35526| ppl: 15.74758| %_neg_is_pos: 0.15246| lr: 0.0| temp: 1.95323 [2021-09-02 07:24:58,429] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 07:24:58,429] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.06559| constrast_loss: 4.16544| div_loss: 0.96916| %_mask_idx: 0.40132| ppl: 19.73554| %_neg_is_pos: 0.15514| lr: 0.0| temp: 1.95321 | loss: 1.00761| constrast_loss: 3.93305| div_loss: 0.97396| %_mask_idx: 0.31579| ppl: 16.66705| %_neg_is_pos: 0.15993| lr: 0.0| temp: 1.95321 | loss: 1.05672| constrast_loss: 4.12961| div_loss: 0.97259| %_mask_idx: 0.40414| ppl: 17.54373| %_neg_is_pos: 0.14639| lr: 0.0| temp: 1.9532 | loss: 1.07564| constrast_loss: 4.20565| div_loss: 0.96922| %_mask_idx: 0.38127| ppl: 19.69642| %_neg_is_pos: 0.13677| lr: 0.0| temp: 1.9532 | loss: 1.08278| constrast_loss: 4.23425| div_loss: 0.9687| %_mask_idx: 0.40883| ppl: 20.03504| %_neg_is_pos: 0.14181| lr: 0.0| temp: 1.95319 | loss: 1.07039| constrast_loss: 4.18442| div_loss: 0.97126| %_mask_idx: 0.41181| ppl: 18.39135| %_neg_is_pos: 0.13967| lr: 0.0| temp: 1.95319 | loss: 1.06835| constrast_loss: 4.1767| div_loss: 0.96695| %_mask_idx: 0.36435| ppl: 21.15093| %_neg_is_pos: 0.12837| lr: 0.0| temp: 1.95318 | loss: 1.07939| constrast_loss: 4.22035| div_loss: 0.97188| %_mask_idx: 0.32644| ppl: 
17.99767| %_neg_is_pos: 0.12836| lr: 0.0| temp: 1.95318 | loss: 1.06949| constrast_loss: 4.18093| div_loss: 0.97038| %_mask_idx: 0.43296| ppl: 18.95412| %_neg_is_pos: 0.14224| lr: 0.0| temp: 1.95316 | loss: 1.07563| constrast_loss: 4.20551| div_loss: 0.96995| %_mask_idx: 0.33631| ppl: 19.23356| %_neg_is_pos: 0.12905| lr: 0.0| temp: 1.95316 | loss: 1.07305| constrast_loss: 4.19537| div_loss: 0.96838| %_mask_idx: 0.42293| ppl: 20.23526| %_neg_is_pos: 0.14724| lr: 0.0| temp: 1.95315 | loss: 1.0606| constrast_loss: 4.1453| div_loss: 0.97086| %_mask_idx: 0.37798| ppl: 18.64917| %_neg_is_pos: 0.14179| lr: 0.0| temp: 1.95315 | loss: 1.06245| constrast_loss: 4.15278| div_loss: 0.97021| %_mask_idx: 0.37657| ppl: 19.06264| %_neg_is_pos: 0.1541| lr: 0.0| temp: 1.95314 | loss: 1.07912| constrast_loss: 4.21948| div_loss: 0.96993| %_mask_idx: 0.37093| ppl: 19.24314| %_neg_is_pos: 0.14065| lr: 0.0| temp: 1.95314 | loss: 1.061| constrast_loss: 4.14696| div_loss: 0.97052| %_mask_idx: 0.39583| ppl: 18.86679| %_neg_is_pos: 0.16488| lr: 0.0| temp: 1.95313 | loss: 1.05325| constrast_loss: 4.11598| div_loss: 0.97016| %_mask_idx: 0.41776| ppl: 19.09637| %_neg_is_pos: 0.1677| lr: 0.0| temp: 1.95313 | loss: 1.06365| constrast_loss: 4.15696| div_loss: 0.97641| %_mask_idx: 0.35244| ppl: 15.09983| %_neg_is_pos: 0.16196| lr: 0.0| temp: 1.95311| loss: 1.03462| constrast_loss: 4.04096| div_loss: 0.97535| %_mask_idx: 0.35119| ppl: 15.7769| %_neg_is_pos: 0.17791| lr: 0.0| temp: 1.95311 | loss: 1.03216| constrast_loss: 4.03107| div_loss: 0.97561| %_mask_idx: 0.40899| ppl: 15.60715| %_neg_is_pos: 0.20576| lr: 0.0| temp: 1.9531 | loss: 1.04073| constrast_loss: 4.06515| div_loss: 0.97756| %_mask_idx: 0.36184| ppl: 14.35904| %_neg_is_pos: 0.19312| lr: 0.0| temp: 1.9531 | loss: 1.03258| constrast_loss: 4.03289| div_loss: 0.97429| %_mask_idx: 0.4057| ppl: 16.4525| %_neg_is_pos: 0.18966| lr: 0.0| temp: 1.95309 | loss: 1.04558| constrast_loss: 4.08483| div_loss: 0.9748| %_mask_idx: 0.41823| ppl: 16.12957| 
%_neg_is_pos: 0.18742| lr: 0.0| temp: 1.95309 | loss: 1.01379| constrast_loss: 3.95737| div_loss: 0.97776| %_mask_idx: 0.43781| ppl: 14.23515| %_neg_is_pos: 0.1995| lr: 0.0| temp: 1.95308 | loss: 1.01713| constrast_loss: 3.97084| div_loss: 0.97667| %_mask_idx: 0.38001| ppl: 14.92807| %_neg_is_pos: 0.18263| lr: 0.0| temp: 1.95308 | loss: 1.01652| constrast_loss: 3.96845| div_loss: 0.97622| %_mask_idx: 0.41165| ppl: 15.22136| %_neg_is_pos: 0.21916| lr: 0.0| temp: 1.95306 | loss: 1.02782| constrast_loss: 4.0136| div_loss: 0.97671| %_mask_idx: 0.35025| ppl: 14.90565| %_neg_is_pos: 0.19299| lr: 0.0| temp: 1.95306 | loss: 0.99595| constrast_loss: 3.88595| div_loss: 0.97858| %_mask_idx: 0.39333| ppl: 13.71039| %_neg_is_pos: 0.23427| lr: 0.0| temp: 1.95305 | loss: 1.02533| constrast_loss: 4.0035| div_loss: 0.97826| %_mask_idx: 0.41087| ppl: 13.911| %_neg_is_pos: 0.22904| lr: 0.0| temp: 1.95305 | loss: 1.00027| constrast_loss: 3.90325| div_loss: 0.97842| %_mask_idx: 0.35918| ppl: 13.81198| %_neg_is_pos: 0.21357| lr: 0.0| temp: 1.95303 | loss: 0.99951| constrast_loss: 3.90034| div_loss: 0.97677| %_mask_idx: 0.44314| ppl: 14.86449| %_neg_is_pos: 0.21896| lr: 0.0| temp: 1.95303 | loss: 1.01558| constrast_loss: 3.96466| div_loss: 0.97662| %_mask_idx: 0.37657| ppl: 14.96145| %_neg_is_pos: 0.22781| lr: 0.0| temp: 1.95302 | loss: 0.99256| constrast_loss: 3.87241| div_loss: 0.9785| %_mask_idx: 0.3916| ppl: 13.7573| %_neg_is_pos: 0.2211| lr: 0.0| temp: 1.95302 | loss: 0.98177| constrast_loss: 3.82919| div_loss: 0.97907| %_mask_idx: 0.41353| ppl: 13.39567| %_neg_is_pos: 0.23125| lr: 0.0| temp: 1.95301 | loss: 0.99836| constrast_loss: 3.89575| div_loss: 0.97689| %_mask_idx: 0.37657| ppl: 14.79122| %_neg_is_pos: 0.2105| lr: 0.0| temp: 1.95301 | loss: 1.00289| constrast_loss: 3.91391| div_loss: 0.97672| %_mask_idx: 0.41886| ppl: 14.89981| %_neg_is_pos: 0.22118| lr: 0.0| temp: 1.953 | loss: 1.01879| constrast_loss: 3.97752| div_loss: 0.97652| %_mask_idx: 0.40555| ppl: 15.02435| 
%_neg_is_pos: 0.21126| lr: 0.0| temp: 1.953 | loss: 0.983| constrast_loss: 3.83402| div_loss: 0.97982| %_mask_idx: 0.39458| ppl: 12.913| %_neg_is_pos: 0.23277| lr: 0.0| temp: 1.95298 | loss: 1.01271| constrast_loss: 3.95345| div_loss: 0.97392| %_mask_idx: 0.34602| ppl: 16.69167| %_neg_is_pos: 0.19676| lr: 0.0| temp: 1.95298 | loss: 1.01434| constrast_loss: 3.95973| div_loss: 0.97648| %_mask_idx: 0.38268| ppl: 15.05521| %_neg_is_pos: 0.19839| lr: 0.0| temp: 1.95297 | loss: 0.98941| constrast_loss: 3.85982| div_loss: 0.97822| %_mask_idx: 0.40006| ppl: 13.94024| %_neg_is_pos: 0.23718| lr: 0.0| temp: 1.95297 | loss: 1.02408| constrast_loss: 3.99874| div_loss: 0.97595| %_mask_idx: 0.42951| ppl: 15.39398| %_neg_is_pos: 0.22486| lr: 0.0| temp: 1.95296 | loss: 0.99882| constrast_loss: 3.89766| div_loss: 0.97609| %_mask_idx: 0.38534| ppl: 15.30317| %_neg_is_pos: 0.20723| lr: 0.0| temp: 1.95296 | loss: 1.00488| constrast_loss: 3.92196| div_loss: 0.97575| %_mask_idx: 0.43311| ppl: 15.51848| %_neg_is_pos: 0.21681| lr: 0.0| temp: 1.95295 | loss: 0.97015| constrast_loss: 3.78249| div_loss: 0.981| %_mask_idx: 0.37845| ppl: 12.15743| %_neg_is_pos: 0.243| lr: 0.0| temp: 1.95295 | loss: 0.98582| constrast_loss: 3.84546| div_loss: 0.97809| %_mask_idx: 0.35464| ppl: 14.02239| %_neg_is_pos: 0.22602| lr: 0.0| temp: 1.95293 | loss: 1.00765| constrast_loss: 3.93272| div_loss: 0.97882| %_mask_idx: 0.44471| ppl: 13.55248| %_neg_is_pos: 0.24519| lr: 0.0| temp: 1.95293 | loss: 0.98401| constrast_loss: 3.83813| div_loss: 0.97889| %_mask_idx: 0.41087| ppl: 13.51338| %_neg_is_pos: 0.2286| lr: 0.0| temp: 1.95292 | loss: 0.98604| constrast_loss: 3.84641| div_loss: 0.97751| %_mask_idx: 0.3443| ppl: 14.3934| %_neg_is_pos: 0.1906| lr: 0.0| temp: 1.95292 | loss: 0.99466| constrast_loss: 3.88083| div_loss: 0.97821| %_mask_idx: 0.3786| ppl: 13.94276| %_neg_is_pos: 0.22468| lr: 0.0| temp: 1.95291 | loss: 0.96019| constrast_loss: 3.74262| div_loss: 0.98148| %_mask_idx: 0.35871| ppl: 11.85066| 
%_neg_is_pos: 0.23057| lr: 0.0| temp: 1.95291 | loss: 0.97215| constrast_loss: 3.79066| div_loss: 0.97918| %_mask_idx: 0.35542| ppl: 13.3251| %_neg_is_pos: 0.22596| lr: 0.0| temp: 1.9529 | loss: 0.99703| constrast_loss: 3.89027| div_loss: 0.97841| %_mask_idx: 0.38221| ppl: 13.8157| %_neg_is_pos: 0.22886| lr: 0.0| temp: 1.9529 | loss: 0.98445| constrast_loss: 3.83976| div_loss: 0.98019| %_mask_idx: 0.40445| ppl: 12.68096| %_neg_is_pos: 0.23482| lr: 0.0| temp: 1.95288 | loss: 1.00004| constrast_loss: 3.90237| div_loss: 0.9781| %_mask_idx: 0.40868| ppl: 14.01723| %_neg_is_pos: 0.22423| lr: 0.0| temp: 1.95288 | loss: 0.98284| constrast_loss: 3.83351| div_loss: 0.97864| %_mask_idx: 0.37563| ppl: 13.66967| %_neg_is_pos: 0.23013| lr: 0.0| temp: 1.95287 | loss: 0.99884| constrast_loss: 3.89777| div_loss: 0.97577| %_mask_idx: 0.38534| ppl: 15.50532| %_neg_is_pos: 0.22946| lr: 0.0| temp: 1.95287 | loss: 0.99269| constrast_loss: 3.87292| div_loss: 0.97836| %_mask_idx: 0.38503| ppl: 13.84696| %_neg_is_pos: 0.20816| lr: 0.0| temp: 1.95285 | loss: 0.98497| constrast_loss: 3.84208| div_loss: 0.97791| %_mask_idx: 0.37641| ppl: 14.13925| %_neg_is_pos: 0.22638| lr: 0.0| temp: 1.95285 | loss: 1.01384| constrast_loss: 3.95774| div_loss: 0.97601| %_mask_idx: 0.40273| ppl: 15.35591| %_neg_is_pos: 0.2162| lr: 0.0| temp: 1.95284 | loss: 1.01913| constrast_loss: 3.97886| div_loss: 0.97654| %_mask_idx: 0.41385| ppl: 15.01753| %_neg_is_pos: 0.21113| lr: 0.0| temp: 1.95284 | loss: 0.99662| constrast_loss: 3.88847| div_loss: 0.98| %_mask_idx: 0.44236| ppl: 12.79778| %_neg_is_pos: 0.25213| lr: 0.0| temp: 1.95283 | loss: 1.03972| constrast_loss: 4.06158| div_loss: 0.97304| %_mask_idx: 0.37798| ppl: 17.25725| %_neg_is_pos: 0.19018| lr: 0.0| temp: 1.95283 | loss: 0.96054| constrast_loss: 3.74418| div_loss: 0.97958| %_mask_idx: 0.38581| ppl: 13.06968| %_neg_is_pos: 0.22547| lr: 0.0| temp: 1.95283 | loss: 0.99355| constrast_loss: 3.87647| div_loss: 0.97737| %_mask_idx: 0.38831| ppl: 14.482| 
%_neg_is_pos: 0.21262| lr: 0.0| temp: 1.95283 | loss: 1.01885| constrast_loss: 3.97787| div_loss: 0.97538| %_mask_idx: 0.43813| ppl: 15.75542| %_neg_is_pos: 0.2022| lr: 0.0| temp: 1.95281 | loss: 1.02605| constrast_loss: 4.00647| div_loss: 0.97724| %_mask_idx: 0.38174| ppl: 14.56921| %_neg_is_pos: 0.2091| lr: 0.0| temp: 1.95281 | loss: 1.0016| constrast_loss: 3.90874| div_loss: 0.9765| %_mask_idx: 0.35041| ppl: 15.04134| %_neg_is_pos: 0.19969| lr: 0.0| temp: 1.9528 | loss: 0.98744| constrast_loss: 3.85195| div_loss: 0.97809| %_mask_idx: 0.36341| ppl: 14.02395| %_neg_is_pos: 0.21374| lr: 0.0| temp: 1.9528 | loss: 0.97782| constrast_loss: 3.81333| div_loss: 0.9796| %_mask_idx: 0.43625| ppl: 13.05326| %_neg_is_pos: 0.23982| lr: 0.0| temp: 1.95279 | loss: 1.02414| constrast_loss: 3.99898| div_loss: 0.97566| %_mask_idx: 0.40727| ppl: 15.5801| %_neg_is_pos: 0.19212| lr: 0.0| temp: 1.95279 | loss: 1.01503| constrast_loss: 3.96262| div_loss: 0.97513| %_mask_idx: 0.41385| ppl: 15.91477| %_neg_is_pos: 0.212| lr: 0.0| temp: 1.95278 | loss: 0.99547| constrast_loss: 3.88408| div_loss: 0.9782| %_mask_idx: 0.43139| ppl: 13.95116| %_neg_is_pos: 0.23141| lr: 0.0| temp: 1.95278 | loss: 1.01952| constrast_loss: 3.98048| div_loss: 0.97588| %_mask_idx: 0.39051| ppl: 15.43796| %_neg_is_pos: 0.20209| lr: 0.0| temp: 1.95276 | loss: 0.99005| constrast_loss: 3.86239| div_loss: 0.97825| %_mask_idx: 0.3667| ppl: 13.9217| %_neg_is_pos: 0.21307| lr: 0.0| temp: 1.95276 | loss: 1.01482| constrast_loss: 3.96166| div_loss: 0.97613| %_mask_idx: 0.39991| ppl: 15.27652| %_neg_is_pos: 0.2048| lr: 0.0| temp: 1.95275 | loss: 0.99509| constrast_loss: 3.88249| div_loss: 0.97882| %_mask_idx: 0.38957| ppl: 13.55706| %_neg_is_pos: 0.21945| lr: 0.0| temp: 1.95275 | loss: 0.99023| constrast_loss: 3.86321| div_loss: 0.97708| %_mask_idx: 0.36889| ppl: 14.66677| %_neg_is_pos: 0.21514| lr: 0.0| temp: 1.95274 | loss: 0.98434| constrast_loss: 3.83956| div_loss: 0.97805| %_mask_idx: 0.3631| ppl: 14.04838| 
%_neg_is_pos: 0.21467| lr: 0.0| temp: 1.95274 | loss: 0.97645| constrast_loss: 3.80779| div_loss: 0.98014| %_mask_idx: 0.36701| ppl: 12.71282| %_neg_is_pos: 0.25334| lr: 0.0| temp: 1.95273 | loss: 0.97722| constrast_loss: 3.81089| div_loss: 0.97968| %_mask_idx: 0.40523| ppl: 13.00741| %_neg_is_pos: 0.23881| lr: 0.0| temp: 1.95273 | loss: 1.03173| constrast_loss: 4.02946| div_loss: 0.97471| %_mask_idx: 0.3974| ppl: 16.18831| %_neg_is_pos: 0.20022| lr: 0.0| temp: 1.95271 | loss: 1.00153| constrast_loss: 3.90839| div_loss: 0.97732| %_mask_idx: 0.41761| ppl: 14.5163| %_neg_is_pos: 0.22476| lr: 0.0| temp: 1.95271 | loss: 1.01109| constrast_loss: 3.94674| div_loss: 0.97618| %_mask_idx: 0.41886| ppl: 15.24333| %_neg_is_pos: 0.22084| lr: 0.0| temp: 1.9527 | loss: 0.99579| constrast_loss: 3.88544| div_loss: 0.97738| %_mask_idx: 0.38581| ppl: 14.47517| %_neg_is_pos: 0.21482| lr: 0.0| temp: 1.9527 | loss: 1.0003| constrast_loss: 3.90345| div_loss: 0.97772| %_mask_idx: 0.3667| ppl: 14.26014| %_neg_is_pos: 0.21849| lr: 0.0| temp: 1.95268 | loss: 1.01005| constrast_loss: 3.94264| div_loss: 0.9757| %_mask_idx: 0.41118| ppl: 15.54901| %_neg_is_pos: 0.21543| lr: 0.0| temp: 1.95268 | loss: 0.99432| constrast_loss: 3.87955| div_loss: 0.97718| %_mask_idx: 0.37531| ppl: 14.60207| %_neg_is_pos: 0.2133| lr: 0.0| temp: 1.95267 | loss: 1.00509| constrast_loss: 3.92259| div_loss: 0.97791| %_mask_idx: 0.41541| ppl: 14.13961| %_neg_is_pos: 0.22756| lr: 0.0| temp: 1.95267 | loss: 0.98702| constrast_loss: 3.85008| div_loss: 0.97994| %_mask_idx: 0.34383| ppl: 12.83612| %_neg_is_pos: 0.2238| lr: 0.0| temp: 1.95266 | loss: 1.01426| constrast_loss: 3.95974| div_loss: 0.97318| %_mask_idx: 0.36795| ppl: 17.16541| %_neg_is_pos: 0.2004| lr: 0.0| temp: 1.95266 | loss: 1.00492| constrast_loss: 3.92184| div_loss: 0.97822| %_mask_idx: 0.40429| ppl: 13.93705| %_neg_is_pos: 0.24309| lr: 0.0| temp: 1.95265 | loss: 0.97482| constrast_loss: 3.80139| div_loss: 0.97909| %_mask_idx: 0.37061| ppl: 13.38084| 
%_neg_is_pos: 0.23212| lr: 0.0| temp: 1.95265 | loss: 0.98788| constrast_loss: 3.85375| div_loss: 0.9776| %_mask_idx: 0.41886| ppl: 14.33809| %_neg_is_pos: 0.22319| lr: 0.0| temp: 1.95263 | loss: 0.9949| constrast_loss: 3.88169| div_loss: 0.97901| %_mask_idx: 0.41071| ppl: 13.4361| %_neg_is_pos: 0.21523| lr: 0.0| temp: 1.95263 | loss: 0.99122| constrast_loss: 3.86687| div_loss: 0.97997| %_mask_idx: 0.35855| ppl: 12.8201| %_neg_is_pos: 0.2182| lr: 0.0| temp: 1.95262 | loss: 1.01297| constrast_loss: 3.95421| div_loss: 0.97682| %_mask_idx: 0.38643| ppl: 14.83401| %_neg_is_pos: 0.21642| lr: 0.0| temp: 1.95262 | loss: 1.01838| constrast_loss: 3.97601| div_loss: 0.97517| %_mask_idx: 0.35182| ppl: 15.88818| %_neg_is_pos: 0.19827| lr: 0.0| temp: 1.95261 | loss: 0.99469| constrast_loss: 3.88086| div_loss: 0.9792| %_mask_idx: 0.38534| ppl: 13.31514| %_neg_is_pos: 0.23526| lr: 0.0| temp: 1.95261 | loss: 1.00741| constrast_loss: 3.93221| div_loss: 0.97419| %_mask_idx: 0.33788| ppl: 16.5165| %_neg_is_pos: 0.21125| lr: 0.0| temp: 1.9526 | loss: 0.99499| constrast_loss: 3.88195| div_loss: 0.98003| %_mask_idx: 0.44236| ppl: 12.78383| %_neg_is_pos: 0.2517| lr: 0.0| temp: 1.9526 | loss: 1.00784| constrast_loss: 3.93385| div_loss: 0.97518| %_mask_idx: 0.37375| ppl: 15.88608| %_neg_is_pos: 0.2183| lr: 0.0| temp: 1.95258 | loss: 1.00418| constrast_loss: 3.91899| div_loss: 0.97744| %_mask_idx: 0.37688| ppl: 14.43703| %_neg_is_pos: 0.22075| lr: 0.0| temp: 1.95258 | loss: 1.00647| constrast_loss: 3.92832| div_loss: 0.97543| %_mask_idx: 0.34712| ppl: 15.72253| %_neg_is_pos: 0.21055| lr: 0.0| temp: 1.95257 | loss: 0.98047| constrast_loss: 3.82407| div_loss: 0.97812| %_mask_idx: 0.32628| ppl: 14.00587| %_neg_is_pos: 0.2226| lr: 0.0| temp: 1.95257 | loss: 0.99817| constrast_loss: 3.89492| div_loss: 0.97745| %_mask_idx: 0.4256| ppl: 14.43468| %_neg_is_pos: 0.21923| lr: 0.0| temp: 1.95256 | loss: 1.01826| constrast_loss: 3.97549| div_loss: 0.97531| %_mask_idx: 0.33756| ppl: 15.80392| 
%_neg_is_pos: 0.1968| lr: 0.0| temp: 1.95256 | loss: 0.98328| constrast_loss: 3.83536| div_loss: 0.97742| %_mask_idx: 0.39771| ppl: 14.45434| %_neg_is_pos: 0.22992| lr: 0.0| temp: 1.95255 | loss: 0.99524| constrast_loss: 3.88316| div_loss: 0.97805| %_mask_idx: 0.39959| ppl: 14.04628| %_neg_is_pos: 0.23654| lr: 0.0| temp: 1.95255 | loss: 0.97305| constrast_loss: 3.79422| div_loss: 0.9798| %_mask_idx: 0.38596| ppl: 12.92865| %_neg_is_pos: 0.25589| lr: 0.0| temp: 1.95253 | loss: 0.9968| constrast_loss: 3.88936| div_loss: 0.9782| %_mask_idx: 0.42638| ppl: 13.94944| %_neg_is_pos: 0.22649| lr: 0.0| temp: 1.95253 | loss: 0.95001| constrast_loss: 3.70188| div_loss: 0.98145| %_mask_idx: 0.36482| ppl: 11.87276| %_neg_is_pos: 0.25104| lr: 0.0| temp: 1.95252 | loss: 1.01911| constrast_loss: 3.97904| div_loss: 0.97386| %_mask_idx: 0.40226| ppl: 16.73249| %_neg_is_pos: 0.20791| lr: 0.0| temp: 1.95252 [2021-09-02 07:34:12,712] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 07:34:12,712] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. 
Attempted loss scale: 1, reducing to 1 | loss: 0.99644| constrast_loss: 3.88792| div_loss: 0.97818| %_mask_idx: 0.401| ppl: 13.96187| %_neg_is_pos: 0.20601| lr: 0.0| temp: 1.9525 | loss: 0.99111| constrast_loss: 3.8666| div_loss: 0.9784| %_mask_idx: 0.40351| ppl: 13.82244| %_neg_is_pos: 0.23268| lr: 0.0| temp: 1.9525 | loss: 1.0166| constrast_loss: 3.96867| div_loss: 0.97753| %_mask_idx: 0.38017| ppl: 14.3826| %_neg_is_pos: 0.22113| lr: 0.0| temp: 1.95249 | loss: 0.99602| constrast_loss: 3.88659| div_loss: 0.97483| %_mask_idx: 0.42372| ppl: 16.11147| %_neg_is_pos: 0.21024| lr: 0.0| temp: 1.95249 | loss: 1.00362| constrast_loss: 3.91679| div_loss: 0.97698| %_mask_idx: 0.40163| ppl: 14.73578| %_neg_is_pos: 0.22308| lr: 0.0| temp: 1.95248 | loss: 0.9729| constrast_loss: 3.79362| div_loss: 0.9796| %_mask_idx: 0.42763| ppl: 13.0571| %_neg_is_pos: 0.25597| lr: 0.0| temp: 1.95248 | loss: 0.97019| constrast_loss: 3.78292| div_loss: 0.97824| %_mask_idx: 0.40069| ppl: 13.92745| %_neg_is_pos: 0.23741| lr: 0.0| temp: 1.95247 | loss: 0.95024| constrast_loss: 3.70285| div_loss: 0.98118| %_mask_idx: 0.45207| ppl: 12.04211| %_neg_is_pos: 0.25271| lr: 0.0| temp: 1.95247 | loss: 0.9329| constrast_loss: 3.63346| div_loss: 0.98155| %_mask_idx: 0.42638| ppl: 11.80839| %_neg_is_pos: 0.30341| lr: 0.0| temp: 1.95245 | loss: 0.97652| constrast_loss: 3.80826| div_loss: 0.97816| %_mask_idx: 0.39787| ppl: 13.97746| %_neg_is_pos: 0.24722| lr: 0.0| temp: 1.95245 | loss: 0.96857| constrast_loss: 3.77638| div_loss: 0.97887| %_mask_idx: 0.37312| ppl: 13.52357| %_neg_is_pos: 0.24477| lr: 0.0| temp: 1.95244 | loss: 0.98977| constrast_loss: 3.86167| div_loss: 0.97412| %_mask_idx: 0.35464| ppl: 16.56568| %_neg_is_pos: 0.20443| lr: 0.0| temp: 1.95244 | loss: 0.97815| constrast_loss: 3.81505| div_loss: 0.97527| %_mask_idx: 0.34962| ppl: 15.82431| %_neg_is_pos: 0.23696| lr: 0.0| temp: 1.95243 | loss: 0.97369| constrast_loss: 3.79698| div_loss: 0.97766| %_mask_idx: 0.39223| ppl: 14.29468| %_neg_is_pos: 
0.24429| lr: 0.0| temp: 1.95243 | loss: 1.02375| constrast_loss: 3.99774| div_loss: 0.97256| %_mask_idx: 0.33349| ppl: 17.56358| %_neg_is_pos: 0.20804| lr: 0.0| temp: 1.95242 | loss: 1.00615| constrast_loss: 3.9274| div_loss: 0.97205| %_mask_idx: 0.38205| ppl: 17.88883| %_neg_is_pos: 0.20836| lr: 0.0| temp: 1.95242 | loss: 0.98892| constrast_loss: 3.85847| div_loss: 0.97219| %_mask_idx: 0.38628| ppl: 17.80094| %_neg_is_pos: 0.19258| lr: 0.0| temp: 1.95241| loss: 1.00808| constrast_loss: 3.93528| div_loss: 0.97022| %_mask_idx: 0.36263| ppl: 19.06103| %_neg_is_pos: 0.188| lr: 0.0| temp: 1.95241 | loss: 1.02872| constrast_loss: 4.01784| div_loss: 0.9704| %_mask_idx: 0.38737| ppl: 18.94341| %_neg_is_pos: 0.18138| lr: 0.0| temp: 1.9524 | loss: 1.06241| constrast_loss: 4.15305| div_loss: 0.96606| %_mask_idx: 0.42967| ppl: 21.72155| %_neg_is_pos: 0.15351| lr: 0.0| temp: 1.9524 | loss: 1.063| constrast_loss: 4.15573| div_loss: 0.9626| %_mask_idx: 0.43907| ppl: 23.93693| %_neg_is_pos: 0.13662| lr: 0.0| temp: 1.95239 | loss: 1.03591| constrast_loss: 4.04714| div_loss: 0.96491| %_mask_idx: 0.41714| ppl: 22.45753| %_neg_is_pos: 0.16819| lr: 0.0| temp: 1.95239 | loss: 1.08427| constrast_loss: 4.24134| div_loss: 0.95736| %_mask_idx: 0.3714| ppl: 27.28885| %_neg_is_pos: 0.14664| lr: 0.0| temp: 1.95238 | loss: 1.07803| constrast_loss: 4.21647| div_loss: 0.95636| %_mask_idx: 0.41385| ppl: 27.93211| %_neg_is_pos: 0.12167| lr: 0.0| temp: 1.95238 | loss: 1.08184| constrast_loss: 4.23182| div_loss: 0.95529| %_mask_idx: 0.40257| ppl: 28.61341| %_neg_is_pos: 0.11139| lr: 0.0| temp: 1.95236 | loss: 1.05467| constrast_loss: 4.12263| div_loss: 0.96068| %_mask_idx: 0.39207| ppl: 25.16663| %_neg_is_pos: 0.13165| lr: 0.0| temp: 1.95236 | loss: 1.05294| constrast_loss: 4.11599| div_loss: 0.95766| %_mask_idx: 0.33631| ppl: 27.09834| %_neg_is_pos: 0.13696| lr: 0.0| temp: 1.95235 | loss: 1.07559| constrast_loss: 4.20761| div_loss: 0.9477| %_mask_idx: 0.362| ppl: 33.47101| %_neg_is_pos: 0.12615| 
lr: 0.0| temp: 1.95235 | loss: 1.09263| constrast_loss: 4.27567| div_loss: 0.94869| %_mask_idx: 0.40539| ppl: 32.83943| %_neg_is_pos: 0.09927| lr: 0.0| temp: 1.95233 | loss: 1.0748| constrast_loss: 4.20361| div_loss: 0.95587| %_mask_idx: 0.4093| ppl: 28.244| %_neg_is_pos: 0.12275| lr: 0.0| temp: 1.95233 | loss: 1.07186| constrast_loss: 4.19214| div_loss: 0.95295| %_mask_idx: 0.40398| ppl: 30.10969| %_neg_is_pos: 0.12612| lr: 0.0| temp: 1.95232 | loss: 1.07545| constrast_loss: 4.20636| div_loss: 0.95446| %_mask_idx: 0.42575| ppl: 29.1455| %_neg_is_pos: 0.12955| lr: 0.0| temp: 1.95232 | loss: 1.10479| constrast_loss: 4.32422| div_loss: 0.94935| %_mask_idx: 0.3938| ppl: 32.41866| %_neg_is_pos: 0.10047| lr: 0.0| temp: 1.95231 | loss: 1.06044| constrast_loss: 4.14614| div_loss: 0.95604| %_mask_idx: 0.38659| ppl: 28.13373| %_neg_is_pos: 0.12515| lr: 0.0| temp: 1.95231 | loss: 1.07934| constrast_loss: 4.22192| div_loss: 0.95437| %_mask_idx: 0.39286| ppl: 29.20235| %_neg_is_pos: 0.11692| lr: 0.0| temp: 1.9523 | loss: 1.08415| constrast_loss: 4.24119| div_loss: 0.95405| %_mask_idx: 0.41729| ppl: 29.40866| %_neg_is_pos: 0.11296| lr: 0.0| temp: 1.9523 | loss: 1.051| constrast_loss: 4.10785| div_loss: 0.96139| %_mask_idx: 0.38205| ppl: 24.71113| %_neg_is_pos: 0.16543| lr: 0.0| temp: 1.95228 | loss: 1.10217| constrast_loss: 4.31374| div_loss: 0.94959| %_mask_idx: 0.39395| ppl: 32.26482| %_neg_is_pos: 0.11126| lr: 0.0| temp: 1.95228 | loss: 1.06622| constrast_loss: 4.16928| div_loss: 0.95595| %_mask_idx: 0.4068| ppl: 28.18898| %_neg_is_pos: 0.12598| lr: 0.0| temp: 1.95227 | loss: 1.0813| constrast_loss: 4.22992| div_loss: 0.95265| %_mask_idx: 0.38142| ppl: 30.30373| %_neg_is_pos: 0.12192| lr: 0.0| temp: 1.95227 | loss: 1.07356| constrast_loss: 4.19828| div_loss: 0.95963| %_mask_idx: 0.40774| ppl: 25.83467| %_neg_is_pos: 0.13269| lr: 0.0| temp: 1.95226 | loss: 1.07898| constrast_loss: 4.22097| div_loss: 0.9496| %_mask_idx: 0.37813| ppl: 32.25634| %_neg_is_pos: 0.11449| lr: 0.0| 
temp: 1.95226 | loss: 1.06343| constrast_loss: 4.1581| div_loss: 0.95618| %_mask_idx: 0.42027| ppl: 28.04786| %_neg_is_pos: 0.12178| lr: 0.0| temp: 1.95225 | loss: 1.07374| constrast_loss: 4.19957| div_loss: 0.95381| %_mask_idx: 0.40695| ppl: 29.55862| %_neg_is_pos: 0.11298| lr: 0.0| temp: 1.95225 | loss: 1.0773| constrast_loss: 4.21415| div_loss: 0.95058| %_mask_idx: 0.39301| ppl: 31.62984| %_neg_is_pos: 0.10191| lr: 0.0| temp: 1.95223 | loss: 1.03937| constrast_loss: 4.06168| div_loss: 0.95815| %_mask_idx: 0.37704| ppl: 26.78447| %_neg_is_pos: 0.13439| lr: 0.0| temp: 1.95223 | loss: 1.08115| constrast_loss: 4.22892| div_loss: 0.95694| %_mask_idx: 0.37704| ppl: 27.55928| %_neg_is_pos: 0.12559| lr: 0.0| temp: 1.95222 | loss: 1.07618| constrast_loss: 4.20897| div_loss: 0.95765| %_mask_idx: 0.42794| ppl: 27.10604| %_neg_is_pos: 0.13722| lr: 0.0| temp: 1.95222 | loss: 1.06089| constrast_loss: 4.14771| div_loss: 0.9584| %_mask_idx: 0.40492| ppl: 26.62426| %_neg_is_pos: 0.12851| lr: 0.0| temp: 1.95221 | loss: 1.08142| constrast_loss: 4.23017| div_loss: 0.95526| %_mask_idx: 0.40476| ppl: 28.63093| %_neg_is_pos: 0.1194| lr: 0.0| temp: 1.95221 | loss: 1.08876| constrast_loss: 4.26029| div_loss: 0.94736| %_mask_idx: 0.34853| ppl: 33.68691| %_neg_is_pos: 0.11104| lr: 0.0| temp: 1.9522 | loss: 1.05981| constrast_loss: 4.14343| div_loss: 0.95807| %_mask_idx: 0.38988| ppl: 26.83725| %_neg_is_pos: 0.14867| lr: 0.0| temp: 1.9522 | loss: 1.07902| constrast_loss: 4.22116| div_loss: 0.94937| %_mask_idx: 0.36717| ppl: 32.40224| %_neg_is_pos: 0.11044| lr: 0.0| temp: 1.95218 | loss: 1.05216| constrast_loss: 4.11297| div_loss: 0.95686| %_mask_idx: 0.37375| ppl: 27.61172| %_neg_is_pos: 0.13951| lr: 0.0| temp: 1.95218 | loss: 1.08218| constrast_loss: 4.23351| div_loss: 0.95214| %_mask_idx: 0.36482| ppl: 30.63231| %_neg_is_pos: 0.1086| lr: 0.0| temp: 1.95217 | loss: 1.08807| constrast_loss: 4.25706| div_loss: 0.95228| %_mask_idx: 0.41244| ppl: 30.5415| %_neg_is_pos: 0.08759| lr: 0.0| temp: 
1.95217 | loss: 1.09089| constrast_loss: 4.26796| div_loss: 0.9559| %_mask_idx: 0.41557| ppl: 28.22523| %_neg_is_pos: 0.11328| lr: 0.0| temp: 1.95215 | loss: 1.07852| constrast_loss: 4.21889| div_loss: 0.952| %_mask_idx: 0.39207| ppl: 30.72194| %_neg_is_pos: 0.12648| lr: 0.0| temp: 1.95215 | loss: 1.07453| constrast_loss: 4.2033| div_loss: 0.94809| %_mask_idx: 0.3891| ppl: 33.21933| %_neg_is_pos: 0.11719| lr: 0.0| temp: 1.95214 | loss: 1.10052| constrast_loss: 4.30758| div_loss: 0.94487| %_mask_idx: 0.40539| ppl: 35.28458| %_neg_is_pos: 0.09374| lr: 0.0| temp: 1.95214 | loss: 1.08855| constrast_loss: 4.25914| div_loss: 0.95063| %_mask_idx: 0.36842| ppl: 31.59443| %_neg_is_pos: 0.1058| lr: 0.0| temp: 1.95213 | loss: 1.10629| constrast_loss: 4.3303| div_loss: 0.94867| %_mask_idx: 0.3797| ppl: 32.85307| %_neg_is_pos: 0.10815| lr: 0.0| temp: 1.95213 | loss: 1.061| constrast_loss: 4.14848| div_loss: 0.95504| %_mask_idx: 0.37359| ppl: 28.77659| %_neg_is_pos: 0.13008| lr: 0.0| temp: 1.95212 | loss: 1.07549| constrast_loss: 4.20693| div_loss: 0.95013| %_mask_idx: 0.41761| ppl: 31.91793| %_neg_is_pos: 0.11386| lr: 0.0| temp: 1.95212 | loss: 1.11021| constrast_loss: 4.34578| div_loss: 0.95069| %_mask_idx: 0.38659| ppl: 31.55529| %_neg_is_pos: 0.11583| lr: 0.0| temp: 1.9521 | loss: 1.0728| constrast_loss: 4.19609| div_loss: 0.95112| %_mask_idx: 0.38737| ppl: 31.28432| %_neg_is_pos: 0.10876| lr: 0.0| temp: 1.9521 | loss: 1.09026| constrast_loss: 4.26584| div_loss: 0.95208| %_mask_idx: 0.40993| ppl: 30.66702| %_neg_is_pos: 0.11079| lr: 0.0| temp: 1.95209 | loss: 1.04158| constrast_loss: 4.07038| div_loss: 0.95925| %_mask_idx: 0.38549| ppl: 26.07758| %_neg_is_pos: 0.12988| lr: 0.0| temp: 1.95209 | loss: 1.0856| constrast_loss: 4.24745| div_loss: 0.94971| %_mask_idx: 0.34477| ppl: 32.18849| %_neg_is_pos: 0.11427| lr: 0.0| temp: 1.95208 | loss: 1.06416| constrast_loss: 4.16109| div_loss: 0.95562| %_mask_idx: 0.4292| ppl: 28.40601| %_neg_is_pos: 0.12801| lr: 0.0| temp: 1.95208 | 
loss: 1.10456| constrast_loss: 4.32375| div_loss: 0.94481| %_mask_idx: 0.37892| ppl: 35.32133| %_neg_is_pos: 0.10736| lr: 0.0| temp: 1.95207 | loss: 1.09043| constrast_loss: 4.2666| div_loss: 0.95116| %_mask_idx: 0.4245| ppl: 31.2574| %_neg_is_pos: 0.11526| lr: 0.0| temp: 1.95207 | loss: 1.08193| constrast_loss: 4.23237| div_loss: 0.95352| %_mask_idx: 0.40085| ppl: 29.74809| %_neg_is_pos: 0.11873| lr: 0.0| temp: 1.95205 | loss: 1.07511| constrast_loss: 4.20525| div_loss: 0.95195| %_mask_idx: 0.35417| ppl: 30.75326| %_neg_is_pos: 0.12809| lr: 0.0| temp: 1.95205 | loss: 1.07822| constrast_loss: 4.21772| div_loss: 0.95177| %_mask_idx: 0.33803| ppl: 30.866| %_neg_is_pos: 0.11652| lr: 0.0| temp: 1.95204 | loss: 1.09032| constrast_loss: 4.26586| div_loss: 0.95398| %_mask_idx: 0.40429| ppl: 29.45045| %_neg_is_pos: 0.10697| lr: 0.0| temp: 1.95204 | loss: 1.08479| constrast_loss: 4.24358| div_loss: 0.95579| %_mask_idx: 0.38456| ppl: 28.29337| %_neg_is_pos: 0.11733| lr: 0.0| temp: 1.95203 | loss: 1.08326| constrast_loss: 4.23744| div_loss: 0.95613| %_mask_idx: 0.38064| ppl: 28.07868| %_neg_is_pos: 0.11272| lr: 0.0| temp: 1.95203 | loss: 1.08344| constrast_loss: 4.23859| div_loss: 0.95185| %_mask_idx: 0.39646| ppl: 30.8174| %_neg_is_pos: 0.11757| lr: 0.0| temp: 1.95202 | loss: 1.0688| constrast_loss: 4.1797| div_loss: 0.95479| %_mask_idx: 0.40273| ppl: 28.93268| %_neg_is_pos: 0.12106| lr: 0.0| temp: 1.95202 | loss: 1.09568| constrast_loss: 4.2883| div_loss: 0.9441| %_mask_idx: 0.38863| ppl: 35.77723| %_neg_is_pos: 0.09367| lr: 0.0| temp: 1.95201 | loss: 1.08| constrast_loss: 4.22461| div_loss: 0.95394| %_mask_idx: 0.38252| ppl: 29.47672| %_neg_is_pos: 0.11686| lr: 0.0| temp: 1.95201 | loss: 1.06137| constrast_loss: 4.14982| div_loss: 0.95673| %_mask_idx: 0.39301| ppl: 27.69372| %_neg_is_pos: 0.11681| lr: 0.0| temp: 1.952 | loss: 1.0873| constrast_loss: 4.25361| div_loss: 0.95604| %_mask_idx: 0.36028| ppl: 28.13242| %_neg_is_pos: 0.12662| lr: 0.0| temp: 1.952 | loss: 1.0762| 
constrast_loss: 4.2095| div_loss: 0.95307| %_mask_idx: 0.43531| ppl: 30.03482| %_neg_is_pos: 0.1259| lr: 0.0| temp: 1.95198 | loss: 1.09163| constrast_loss: 4.27129| div_loss: 0.95214| %_mask_idx: 0.36811| ppl: 30.63104| %_neg_is_pos: 0.1081| lr: 0.0| temp: 1.95198 | loss: 1.06197| constrast_loss: 4.15221| div_loss: 0.95691| %_mask_idx: 0.31093| ppl: 27.57521| %_neg_is_pos: 0.12922| lr: 0.0| temp: 1.95197 | loss: 1.09363| constrast_loss: 4.27977| div_loss: 0.9476| %_mask_idx: 0.39834| ppl: 33.53481| %_neg_is_pos: 0.10374| lr: 0.0| temp: 1.95197 | loss: 1.08135| constrast_loss: 4.22983| div_loss: 0.95559| %_mask_idx: 0.38769| ppl: 28.42417| %_neg_is_pos: 0.13049| lr: 0.0| temp: 1.95196 | loss: 1.07592| constrast_loss: 4.20803| div_loss: 0.9565| %_mask_idx: 0.39552| ppl: 27.83705| %_neg_is_pos: 0.12184| lr: 0.0| temp: 1.95196 | loss: 1.07868| constrast_loss: 4.21938| div_loss: 0.95346| %_mask_idx: 0.37124| ppl: 29.78566| %_neg_is_pos: 0.11938| lr: 0.0| temp: 1.95195 | loss: 1.04994| constrast_loss: 4.10419| div_loss: 0.95553| %_mask_idx: 0.32801| ppl: 28.4594| %_neg_is_pos: 0.14127| lr: 0.0| temp: 1.95195 | loss: 1.08076| constrast_loss: 4.22744| div_loss: 0.95589| %_mask_idx: 0.41056| ppl: 28.23136| %_neg_is_pos: 0.12208| lr: 0.0| temp: 1.95193 | loss: 1.09199| constrast_loss: 4.2728| div_loss: 0.95142| %_mask_idx: 0.42262| ppl: 31.09366| %_neg_is_pos: 0.10872| lr: 0.0| temp: 1.95193 | loss: 1.07595| constrast_loss: 4.20846| div_loss: 0.95326| %_mask_idx: 0.41024| ppl: 29.91449| %_neg_is_pos: 0.12263| lr: 0.0| temp: 1.95192 | loss: 1.08844| constrast_loss: 4.25834| div_loss: 0.954| %_mask_idx: 0.39521| ppl: 29.44133| %_neg_is_pos: 0.11782| lr: 0.0| temp: 1.95192 | loss: 1.08045| constrast_loss: 4.22611| div_loss: 0.95694| %_mask_idx: 0.43296| ppl: 27.55638| %_neg_is_pos: 0.12256| lr: 0.0| temp: 1.95191 | loss: 1.06531| constrast_loss: 4.16605| div_loss: 0.95181| %_mask_idx: 0.35871| ppl: 30.8426| %_neg_is_pos: 0.12437| lr: 0.0| temp: 1.95191 | loss: 1.08739| 
constrast_loss: 4.25447| div_loss: 0.95091| %_mask_idx: 0.41134| ppl: 31.41965| %_neg_is_pos: 0.12667| lr: 0.0| temp: 1.9519 | loss: 1.06357| constrast_loss: 4.15839| div_loss: 0.95895| %_mask_idx: 0.4162| ppl: 26.26958| %_neg_is_pos: 0.12553| lr: 0.0| temp: 1.9519 | loss: 1.06385| constrast_loss: 4.16018| div_loss: 0.95226| %_mask_idx: 0.34508| ppl: 30.55104| %_neg_is_pos: 0.14092| lr: 0.0| temp: 1.95188 | loss: 1.07312| constrast_loss: 4.19742| div_loss: 0.95065| %_mask_idx: 0.35307| ppl: 31.58624| %_neg_is_pos: 0.10717| lr: 0.0| temp: 1.95188 | loss: 1.10216| constrast_loss: 4.31416| div_loss: 0.9446| %_mask_idx: 0.41541| ppl: 35.45889| %_neg_is_pos: 0.08998| lr: 0.0| temp: 1.95187 | loss: 1.11056| constrast_loss: 4.34723| div_loss: 0.95018| %_mask_idx: 0.37625| ppl: 31.88627| %_neg_is_pos: 0.11378| lr: 0.0| temp: 1.95187 | loss: 1.09436| constrast_loss: 4.28291| div_loss: 0.94531| %_mask_idx: 0.38612| ppl: 35.00117| %_neg_is_pos: 0.10994| lr: 0.0| temp: 1.95186 | loss: 1.07331| constrast_loss: 4.19787| div_loss: 0.95356| %_mask_idx: 0.40617| ppl: 29.72472| %_neg_is_pos: 0.11809| lr: 0.0| temp: 1.95186 | loss: 1.08483| constrast_loss: 4.24407| div_loss: 0.95271| %_mask_idx: 0.37234| ppl: 30.26277| %_neg_is_pos: 0.13242| lr: 0.0| temp: 1.95185 | loss: 1.09051| constrast_loss: 4.2665| div_loss: 0.95515| %_mask_idx: 0.38017| ppl: 28.7012| %_neg_is_pos: 0.11357| lr: 0.0| temp: 1.95185 | loss: 1.07933| constrast_loss: 4.22259| div_loss: 0.9472| %_mask_idx: 0.388| ppl: 33.79323| %_neg_is_pos: 0.11482| lr: 0.0| temp: 1.95183 | loss: 1.04251| constrast_loss: 4.0738| div_loss: 0.96228| %_mask_idx: 0.36043| ppl: 24.14233| %_neg_is_pos: 0.14159| lr: 0.0| temp: 1.95183 | loss: 1.08528| constrast_loss: 4.24591| div_loss: 0.95218| %_mask_idx: 0.37923| ppl: 30.60761| %_neg_is_pos: 0.11247| lr: 0.0| temp: 1.95182 | loss: 1.05605| constrast_loss: 4.12913| div_loss: 0.95075| %_mask_idx: 0.34821| ppl: 31.52205| %_neg_is_pos: 0.12775| lr: 0.0| temp: 1.95182 [2021-09-02 
07:43:25,470] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 07:43:25,470] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.09393| constrast_loss: 4.28069| div_loss: 0.95023| %_mask_idx: 0.34445| ppl: 31.85434| %_neg_is_pos: 0.10773| lr: 0.0| temp: 1.9518 | loss: 1.06434| constrast_loss: 4.16175| div_loss: 0.95613| %_mask_idx: 0.38972| ppl: 28.07411| %_neg_is_pos: 0.13505| lr: 0.0| temp: 1.9518 | loss: 1.10014| constrast_loss: 4.30515| div_loss: 0.954| %_mask_idx: 0.38753| ppl: 29.43952| %_neg_is_pos: 0.10518| lr: 0.0| temp: 1.95179 | loss: 1.09307| constrast_loss: 4.27684| div_loss: 0.95425| %_mask_idx: 0.40304| ppl: 29.27877| %_neg_is_pos: 0.12986| lr: 0.0| temp: 1.95179 | loss: 1.09057| constrast_loss: 4.26729| div_loss: 0.95004| %_mask_idx: 0.4057| ppl: 31.97296| %_neg_is_pos: 0.11327| lr: 0.0| temp: 1.95178 | loss: 1.0773| constrast_loss: 4.21392| div_loss: 0.95287| %_mask_idx: 0.43985| ppl: 30.1641| %_neg_is_pos: 0.11798| lr: 0.0| temp: 1.95178 | loss: 1.09384| constrast_loss: 4.28056| div_loss: 0.94808| %_mask_idx: 0.40883| ppl: 33.22989| %_neg_is_pos: 0.09765| lr: 0.0| temp: 1.95177 | loss: 1.08911| constrast_loss: 4.26163| div_loss: 0.94789| %_mask_idx: 0.42278| ppl: 33.34883| %_neg_is_pos: 0.09576| lr: 0.0| temp: 1.95177 | loss: 1.09864| constrast_loss: 4.30004| div_loss: 0.94536| %_mask_idx: 0.41244| ppl: 34.9674| %_neg_is_pos: 0.08328| lr: 0.0| temp: 1.95175 | loss: 1.07613| constrast_loss: 4.20921| div_loss: 0.95293| %_mask_idx: 0.39192| ppl: 30.12236| %_neg_is_pos: 0.10182| lr: 0.0| temp: 1.95175 | loss: 1.11248| constrast_loss: 4.35562| div_loss: 0.943| %_mask_idx: 0.34508| ppl: 36.47697| %_neg_is_pos: 0.08411| lr: 0.0| temp: 1.95174 | loss: 1.11972| constrast_loss: 4.38464| div_loss: 0.94232| %_mask_idx: 0.42058| ppl: 36.91313| %_neg_is_pos: 0.06924| lr: 
0.0| temp: 1.95174 | loss: 1.10609| constrast_loss: 4.32981| div_loss: 0.94544| %_mask_idx: 0.37061| ppl: 34.9193| %_neg_is_pos: 0.07723| lr: 0.0| temp: 1.95173 | loss: 1.08443| constrast_loss: 4.24287| div_loss: 0.9484| %_mask_idx: 0.41463| ppl: 33.02163| %_neg_is_pos: 0.08525| lr: 0.0| temp: 1.95173 | loss: 1.11172| constrast_loss: 4.35238| div_loss: 0.94511| %_mask_idx: 0.35949| ppl: 35.12699| %_neg_is_pos: 0.08904| lr: 0.0| temp: 1.95172 | loss: 1.09177| constrast_loss: 4.27253| div_loss: 0.94531| %_mask_idx: 0.36889| ppl: 34.99953| %_neg_is_pos: 0.10117| lr: 0.0| temp: 1.95172 | loss: 1.11281| constrast_loss: 4.35721| div_loss: 0.94053| %_mask_idx: 0.3739| ppl: 38.06326| %_neg_is_pos: 0.07395| lr: 0.0| temp: 1.9517 | loss: 1.10279| constrast_loss: 4.31684| div_loss: 0.94315| %_mask_idx: 0.3609| ppl: 36.38258| %_neg_is_pos: 0.10045| lr: 0.0| temp: 1.9517 | loss: 1.10217| constrast_loss: 4.31494| div_loss: 0.93721| %_mask_idx: 0.36889| ppl: 40.18486| %_neg_is_pos: 0.06616| lr: 0.0| temp: 1.95169 | loss: 1.10369| constrast_loss: 4.32103| div_loss: 0.9371| %_mask_idx: 0.40555| ppl: 40.25801| %_neg_is_pos: 0.06037| lr: 0.0| temp: 1.95169 | loss: 1.10774| constrast_loss: 4.33723| div_loss: 0.93707| %_mask_idx: 0.37876| ppl: 40.27262| %_neg_is_pos: 0.066| lr: 0.0| temp: 1.95168 | loss: 1.10624| constrast_loss: 4.33084| div_loss: 0.941| %_mask_idx: 0.38048| ppl: 37.76119| %_neg_is_pos: 0.08315| lr: 0.0| temp: 1.95168 | loss: 1.09637| constrast_loss: 4.29151| div_loss: 0.93976| %_mask_idx: 0.37719| ppl: 38.55236| %_neg_is_pos: 0.10198| lr: 0.0| temp: 1.95167 | loss: 1.10841| constrast_loss: 4.33936| div_loss: 0.94287| %_mask_idx: 0.36451| ppl: 36.56437| %_neg_is_pos: 0.07401| lr: 0.0| temp: 1.95167 | loss: 1.09298| constrast_loss: 4.27793| div_loss: 0.93985| %_mask_idx: 0.35949| ppl: 38.49686| %_neg_is_pos: 0.08528| lr: 0.0| temp: 1.95165| loss: 1.0921| constrast_loss: 4.27441| div_loss: 0.93991| %_mask_idx: 0.36873| ppl: 38.4567| %_neg_is_pos: 0.07876| lr: 0.0| temp: 
1.95165 | loss: 1.09335| constrast_loss: 4.27938| div_loss: 0.9401| %_mask_idx: 0.33866| ppl: 38.33803| %_neg_is_pos: 0.07834| lr: 0.0| temp: 1.95164 | loss: 1.09325| constrast_loss: 4.27898| div_loss: 0.94023| %_mask_idx: 0.41964| ppl: 38.2508| %_neg_is_pos: 0.07054| lr: 0.0| temp: 1.95164 | loss: 1.09207| constrast_loss: 4.2741| div_loss: 0.9419| %_mask_idx: 0.37375| ppl: 37.18455| %_neg_is_pos: 0.08338| lr: 0.0| temp: 1.95162 | loss: 1.10096| constrast_loss: 4.30926| div_loss: 0.94575| %_mask_idx: 0.35683| ppl: 34.71786| %_neg_is_pos: 0.09402| lr: 0.0| temp: 1.95162 | loss: 1.09569| constrast_loss: 4.28872| div_loss: 0.94033| %_mask_idx: 0.41729| ppl: 38.19147| %_neg_is_pos: 0.06375| lr: 0.0| temp: 1.95161 | loss: 1.09368| constrast_loss: 4.28044| div_loss: 0.943| %_mask_idx: 0.38017| ppl: 36.48289| %_neg_is_pos: 0.08387| lr: 0.0| temp: 1.95161 | loss: 1.09747| constrast_loss: 4.29606| div_loss: 0.9383| %_mask_idx: 0.40852| ppl: 39.48965| %_neg_is_pos: 0.07072| lr: 0.0| temp: 1.9516 | loss: 1.09903| constrast_loss: 4.30161| div_loss: 0.9451| %_mask_idx: 0.38393| ppl: 35.13292| %_neg_is_pos: 0.08544| lr: 0.0| temp: 1.9516 | loss: 1.11168| constrast_loss: 4.35272| div_loss: 0.94017| %_mask_idx: 0.33615| ppl: 38.29192| %_neg_is_pos: 0.08326| lr: 0.0| temp: 1.9516 | loss: 1.10289| constrast_loss: 4.3177| div_loss: 0.93862| %_mask_idx: 0.35573| ppl: 39.28259| %_neg_is_pos: 0.07829| lr: 0.0| temp: 1.9516 | loss: 1.12002| constrast_loss: 4.38638| div_loss: 0.93691| %_mask_idx: 0.41698| ppl: 40.37561| %_neg_is_pos: 0.05826| lr: 0.0| temp: 1.95158 | loss: 1.11487| constrast_loss: 4.36542| div_loss: 0.94054| %_mask_idx: 0.38753| ppl: 38.05455| %_neg_is_pos: 0.07544| lr: 0.0| temp: 1.95158 | loss: 1.07888| constrast_loss: 4.22099| div_loss: 0.94536| %_mask_idx: 0.35949| ppl: 34.96734| %_neg_is_pos: 0.08165| lr: 0.0| temp: 1.95157 | loss: 1.10976| constrast_loss: 4.3445| div_loss: 0.94543| %_mask_idx: 0.38095| ppl: 34.92467| %_neg_is_pos: 0.07885| lr: 0.0| temp: 1.95157 | 
loss: 1.11184| constrast_loss: 4.35333| div_loss: 0.94015| %_mask_idx: 0.34586| ppl: 38.30152| %_neg_is_pos: 0.07813| lr: 0.0| temp: 1.95156 | loss: 1.11224| constrast_loss: 4.35553| div_loss: 0.93424| %_mask_idx: 0.37312| ppl: 42.08426| %_neg_is_pos: 0.05893| lr: 0.0| temp: 1.95156 | loss: 1.10931| constrast_loss: 4.34295| div_loss: 0.94282| %_mask_idx: 0.37907| ppl: 36.59768| %_neg_is_pos: 0.05094| lr: 0.0| temp: 1.95155 | loss: 1.09645| constrast_loss: 4.29147| div_loss: 0.94336| %_mask_idx: 0.37359| ppl: 36.25098| %_neg_is_pos: 0.07464| lr: 0.0| temp: 1.95155 | loss: 1.11846| constrast_loss: 4.3799| div_loss: 0.93935| %_mask_idx: 0.42638| ppl: 38.81458| %_neg_is_pos: 0.04994| lr: 0.0| temp: 1.95153 | loss: 1.11731| constrast_loss: 4.37535| div_loss: 0.93901| %_mask_idx: 0.43531| ppl: 39.03575| %_neg_is_pos: 0.0672| lr: 0.0| temp: 1.95153 | loss: 1.09254| constrast_loss: 4.27608| div_loss: 0.94083| %_mask_idx: 0.3833| ppl: 37.87152| %_neg_is_pos: 0.05326| lr: 0.0| temp: 1.95152 | loss: 1.09726| constrast_loss: 4.29485| div_loss: 0.94203| %_mask_idx: 0.35636| ppl: 37.1039| %_neg_is_pos: 0.08404| lr: 0.0| temp: 1.95152 | loss: 1.08812| constrast_loss: 4.25803| div_loss: 0.94432| %_mask_idx: 0.41604| ppl: 35.63631| %_neg_is_pos: 0.06448| lr: 0.0| temp: 1.95151 | loss: 1.08515| constrast_loss: 4.24651| div_loss: 0.94073| %_mask_idx: 0.3468| ppl: 37.93328| %_neg_is_pos: 0.08954| lr: 0.0| temp: 1.95151 | loss: 1.10638| constrast_loss: 4.33123| div_loss: 0.94278| %_mask_idx: 0.40508| ppl: 36.61769| %_neg_is_pos: 0.07023| lr: 0.0| temp: 1.9515 | loss: 1.10775| constrast_loss: 4.33718| div_loss: 0.93838| %_mask_idx: 0.37578| ppl: 39.43494| %_neg_is_pos: 0.07437| lr: 0.0| temp: 1.9515 | loss: 1.12596| constrast_loss: 4.4104| div_loss: 0.9345| %_mask_idx: 0.35934| ppl: 41.91683| %_neg_is_pos: 0.07542| lr: 0.0| temp: 1.95148 | loss: 1.09483| constrast_loss: 4.28526| div_loss: 0.94048| %_mask_idx: 0.41447| ppl: 38.09324| %_neg_is_pos: 0.06968| lr: 0.0| temp: 1.95148 | loss: 
1.1131| constrast_loss: 4.35808| div_loss: 0.94322| %_mask_idx: 0.32926| ppl: 36.34198| %_neg_is_pos: 0.08785| lr: 0.0| temp: 1.95147 | loss: 1.10599| constrast_loss: 4.32984| div_loss: 0.94127| %_mask_idx: 0.40179| ppl: 37.58731| %_neg_is_pos: 0.08026| lr: 0.0| temp: 1.95147 | loss: 1.09994| constrast_loss: 4.3055| div_loss: 0.94264| %_mask_idx: 0.40398| ppl: 36.71119| %_neg_is_pos: 0.07165| lr: 0.0| temp: 1.95145 | loss: 1.08489| constrast_loss: 4.24516| div_loss: 0.94386| %_mask_idx: 0.36983| ppl: 35.9305| %_neg_is_pos: 0.08357| lr: 0.0| temp: 1.95145 | loss: 1.11011| constrast_loss: 4.34629| div_loss: 0.94129| %_mask_idx: 0.39333| ppl: 37.57621| %_neg_is_pos: 0.06554| lr: 0.0| temp: 1.95144 | loss: 1.09014| constrast_loss: 4.26603| div_loss: 0.94511| %_mask_idx: 0.4364| ppl: 35.12786| %_neg_is_pos: 0.0732| lr: 0.0| temp: 1.95144 | loss: 1.10881| constrast_loss: 4.34131| div_loss: 0.93938| %_mask_idx: 0.4151| ppl: 38.79858| %_neg_is_pos: 0.07363| lr: 0.0| temp: 1.95143 | loss: 1.10819| constrast_loss: 4.33877| div_loss: 0.93973| %_mask_idx: 0.41698| ppl: 38.57378| %_neg_is_pos: 0.05158| lr: 0.0| temp: 1.95143 | loss: 1.09667| constrast_loss: 4.29294| div_loss: 0.93753| %_mask_idx: 0.38064| ppl: 39.98287| %_neg_is_pos: 0.08317| lr: 0.0| temp: 1.95142 | loss: 1.11932| constrast_loss: 4.38343| div_loss: 0.93861| %_mask_idx: 0.38503| ppl: 39.29194| %_neg_is_pos: 0.07156| lr: 0.0| temp: 1.95142 | loss: 1.10064| constrast_loss: 4.30856| div_loss: 0.94012| %_mask_idx: 0.35667| ppl: 38.3218| %_neg_is_pos: 0.07201| lr: 0.0| temp: 1.9514 | loss: 1.1076| constrast_loss: 4.33667| div_loss: 0.93715| %_mask_idx: 0.30874| ppl: 40.22277| %_neg_is_pos: 0.06366| lr: 0.0| temp: 1.9514 | loss: 1.11277| constrast_loss: 4.35718| div_loss: 0.93905| %_mask_idx: 0.37343| ppl: 39.01096| %_neg_is_pos: 0.07581| lr: 0.0| temp: 1.95139 | loss: 1.08986| constrast_loss: 4.26543| div_loss: 0.94013| %_mask_idx: 0.37124| ppl: 38.31889| %_neg_is_pos: 0.07092| lr: 0.0| temp: 1.95139 | loss: 
1.11053| constrast_loss: 4.34788| div_loss: 0.94242| %_mask_idx: 0.39348| ppl: 36.85425| %_neg_is_pos: 0.07035| lr: 0.0| temp: 1.95138 | loss: 1.10446| constrast_loss: 4.3236| div_loss: 0.94226| %_mask_idx: 0.35009| ppl: 36.95493| %_neg_is_pos: 0.08107| lr: 0.0| temp: 1.95138 | loss: 1.08157| constrast_loss: 4.23185| div_loss: 0.94436| %_mask_idx: 0.42888| ppl: 35.60757| %_neg_is_pos: 0.10466| lr: 0.0| temp: 1.95137 | loss: 1.10671| constrast_loss: 4.33314| div_loss: 0.93687| %_mask_idx: 0.36654| ppl: 40.40533| %_neg_is_pos: 0.0674| lr: 0.0| temp: 1.95137 | loss: 1.10649| constrast_loss: 4.33152| div_loss: 0.94445| %_mask_idx: 0.42857| ppl: 35.54891| %_neg_is_pos: 0.07724| lr: 0.0| temp: 1.95135 | loss: 1.09778| constrast_loss: 4.2968| div_loss: 0.94327| %_mask_idx: 0.40977| ppl: 36.30821| %_neg_is_pos: 0.07605| lr: 0.0| temp: 1.95135 | loss: 1.10753| constrast_loss: 4.33632| div_loss: 0.93777| %_mask_idx: 0.40445| ppl: 39.82451| %_neg_is_pos: 0.07637| lr: 0.0| temp: 1.95134 | loss: 1.10858| constrast_loss: 4.34037| div_loss: 0.93949| %_mask_idx: 0.40351| ppl: 38.72563| %_neg_is_pos: 0.06735| lr: 0.0| temp: 1.95134 | loss: 1.09069| constrast_loss: 4.26836| div_loss: 0.94406| %_mask_idx: 0.38409| ppl: 35.80252| %_neg_is_pos: 0.07246| lr: 0.0| temp: 1.95133 | loss: 1.13208| constrast_loss: 4.43435| div_loss: 0.93951| %_mask_idx: 0.38675| ppl: 38.71212| %_neg_is_pos: 0.06875| lr: 0.0| temp: 1.95133 | loss: 1.1047| constrast_loss: 4.32444| div_loss: 0.94358| %_mask_idx: 0.37234| ppl: 36.10933| %_neg_is_pos: 0.07839| lr: 0.0| temp: 1.95132 | loss: 1.11049| constrast_loss: 4.34778| div_loss: 0.94167| %_mask_idx: 0.40132| ppl: 37.33387| %_neg_is_pos: 0.05896| lr: 0.0| temp: 1.95132 | loss: 1.10571| constrast_loss: 4.32934| div_loss: 0.93511| %_mask_idx: 0.34665| ppl: 41.53088| %_neg_is_pos: 0.07369| lr: 0.0| temp: 1.9513 | loss: 1.12933| constrast_loss: 4.42338| div_loss: 0.93963| %_mask_idx: 0.40727| ppl: 38.63905| %_neg_is_pos: 0.06147| lr: 0.0| temp: 1.9513 | loss: 
1.09969| constrast_loss: 4.30475| div_loss: 0.94025| %_mask_idx: 0.38565| ppl: 38.23875| %_neg_is_pos: 0.06723| lr: 0.0| temp: 1.95129 | loss: 1.10464| constrast_loss: 4.3245| div_loss: 0.94062| %_mask_idx: 0.42466| ppl: 38.00274| %_neg_is_pos: 0.06897| lr: 0.0| temp: 1.95129 | loss: 1.12278| constrast_loss: 4.39737| div_loss: 0.93746| %_mask_idx: 0.43374| ppl: 40.02829| %_neg_is_pos: 0.05035| lr: 0.0| temp: 1.95127 | loss: 1.08157| constrast_loss: 4.23205| div_loss: 0.94235| %_mask_idx: 0.37077| ppl: 36.89338| %_neg_is_pos: 0.10361| lr: 0.0| temp: 1.95127 | loss: 1.05082| constrast_loss: 4.1086| div_loss: 0.94686| %_mask_idx: 0.31297| ppl: 34.01031| %_neg_is_pos: 0.11586| lr: 0.0| temp: 1.95126 | loss: 1.10152| constrast_loss: 4.31187| div_loss: 0.94201| %_mask_idx: 0.43108| ppl: 37.11333| %_neg_is_pos: 0.06119| lr: 0.0| temp: 1.95126 | loss: 1.12251| constrast_loss: 4.39598| div_loss: 0.94073| %_mask_idx: 0.39489| ppl: 37.93523| %_neg_is_pos: 0.05831| lr: 0.0| temp: 1.95125 | loss: 1.07096| constrast_loss: 4.18928| div_loss: 0.94558| %_mask_idx: 0.35542| ppl: 34.82615| %_neg_is_pos: 0.10254| lr: 0.0| temp: 1.95125 | loss: 1.10942| constrast_loss: 4.34368| div_loss: 0.93981| %_mask_idx: 0.40711| ppl: 38.52136| %_neg_is_pos: 0.06567| lr: 0.0| temp: 1.95124 | loss: 1.12417| constrast_loss: 4.40319| div_loss: 0.93499| %_mask_idx: 0.40226| ppl: 41.60627| %_neg_is_pos: 0.05852| lr: 0.0| temp: 1.95124 | loss: 1.10086| constrast_loss: 4.309| div_loss: 0.94422| %_mask_idx: 0.41447| ppl: 35.69734| %_neg_is_pos: 0.08699| lr: 0.0| temp: 1.95122 | loss: 1.11247| constrast_loss: 4.35553| div_loss: 0.94338| %_mask_idx: 0.38158| ppl: 36.23886| %_neg_is_pos: 0.07283| lr: 0.0| temp: 1.95122 | loss: 1.09576| constrast_loss: 4.28836| div_loss: 0.94691| %_mask_idx: 0.41917| ppl: 33.97764| %_neg_is_pos: 0.06962| lr: 0.0| temp: 1.95121 | loss: 1.09364| constrast_loss: 4.28035| div_loss: 0.94222| %_mask_idx: 0.36936| ppl: 36.97998| %_neg_is_pos: 0.07039| lr: 0.0| temp: 1.95121 | loss: 
1.10825| constrast_loss: 4.33926| div_loss: 0.93746| %_mask_idx: 0.41698| ppl: 40.02451| %_neg_is_pos: 0.0732| lr: 0.0| temp: 1.95121 | loss: 1.09836| constrast_loss: 4.29902| div_loss: 0.94439| %_mask_idx: 0.43766| ppl: 35.5914| %_neg_is_pos: 0.06937| lr: 0.0| temp: 1.95121 | loss: 1.07989| constrast_loss: 4.22514| div_loss: 0.94435| %_mask_idx: 0.33145| ppl: 35.61803| %_neg_is_pos: 0.09173| lr: 0.0| temp: 1.9512 | loss: 1.10211| constrast_loss: 4.31432| div_loss: 0.94132| %_mask_idx: 0.42857| ppl: 37.55602| %_neg_is_pos: 0.06228| lr: 0.0| temp: 1.9512 | loss: 1.08927| constrast_loss: 4.26274| div_loss: 0.94349| %_mask_idx: 0.46695| ppl: 36.16786| %_neg_is_pos: 0.05953| lr: 0.0| temp: 1.95118 | loss: 1.13299| constrast_loss: 4.43844| div_loss: 0.93536| %_mask_idx: 0.36779| ppl: 41.37092| %_neg_is_pos: 0.04392| lr: 0.0| temp: 1.95118 | loss: 1.10042| constrast_loss: 4.30778| div_loss: 0.93885| %_mask_idx: 0.40445| ppl: 39.13901| %_neg_is_pos: 0.04332| lr: 0.0| temp: 1.95117 | loss: 1.09946| constrast_loss: 4.3038| div_loss: 0.94052| %_mask_idx: 0.35307| ppl: 38.06824| %_neg_is_pos: 0.06913| lr: 0.0| temp: 1.95117 | loss: 1.10704| constrast_loss: 4.33405| div_loss: 0.94118| %_mask_idx: 0.37265| ppl: 37.64223| %_neg_is_pos: 0.06842| lr: 0.0| temp: 1.95116 | loss: 1.08738| constrast_loss: 4.25492| div_loss: 0.94604| %_mask_idx: 0.39771| ppl: 34.53624| %_neg_is_pos: 0.07449| lr: 0.0| temp: 1.95116 | loss: 1.11044| constrast_loss: 4.3474| div_loss: 0.94379| %_mask_idx: 0.36325| ppl: 35.97273| %_neg_is_pos: 0.06917| lr: 0.0| temp: 1.95115 | loss: 1.1035| constrast_loss: 4.32002| div_loss: 0.93971| %_mask_idx: 0.35605| ppl: 38.58403| %_neg_is_pos: 0.06536| lr: 0.0| temp: 1.95115 | loss: 1.10919| constrast_loss: 4.34238| div_loss: 0.94359| %_mask_idx: 0.45175| ppl: 36.10209| %_neg_is_pos: 0.05588| lr: 0.0| temp: 1.95113 | loss: 1.07938| constrast_loss: 4.22274| div_loss: 0.94789| %_mask_idx: 0.38424| ppl: 33.35181| %_neg_is_pos: 0.09822| lr: 0.0| temp: 1.95113 | loss: 
1.09981| constrast_loss: 4.30526| div_loss: 0.93988| %_mask_idx: 0.40602| ppl: 38.47792| %_neg_is_pos: 0.07557| lr: 0.0| temp: 1.95112 | loss: 1.1167| constrast_loss: 4.37289| div_loss: 0.93904| %_mask_idx: 0.3656| ppl: 39.01392| %_neg_is_pos: 0.07254| lr: 0.0| temp: 1.95112 [2021-09-02 07:52:39,147] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 07:52:39,147] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.10024| constrast_loss: 4.30669| div_loss: 0.94269| %_mask_idx: 0.38957| ppl: 36.6764| %_neg_is_pos: 0.07201| lr: 0.0| temp: 1.9511 | loss: 1.10305| constrast_loss: 4.31775| div_loss: 0.94454| %_mask_idx: 0.38612| ppl: 35.49506| %_neg_is_pos: 0.0925| lr: 0.0| temp: 1.9511 | loss: 1.0789| constrast_loss: 4.22088| div_loss: 0.947| %_mask_idx: 0.38064| ppl: 33.92209| %_neg_is_pos: 0.11608| lr: 0.0| temp: 1.95109 | loss: 1.11285| constrast_loss: 4.35725| div_loss: 0.94145| %_mask_idx: 0.41917| ppl: 37.47367| %_neg_is_pos: 0.06545| lr: 0.0| temp: 1.95109 | loss: 1.09673| constrast_loss: 4.29291| div_loss: 0.9402| %_mask_idx: 0.39348| ppl: 38.27492| %_neg_is_pos: 0.07422| lr: 0.0| temp: 1.95108 | loss: 1.09107| constrast_loss: 4.26972| div_loss: 0.94572| %_mask_idx: 0.40648| ppl: 34.73808| %_neg_is_pos: 0.06766| lr: 0.0| temp: 1.95108 | loss: 1.09944| constrast_loss: 4.30343| div_loss: 0.94341| %_mask_idx: 0.35244| ppl: 36.21675| %_neg_is_pos: 0.09211| lr: 0.0| temp: 1.95107 | loss: 1.10978| constrast_loss: 4.34457| div_loss: 0.94548| %_mask_idx: 0.43296| ppl: 34.89092| %_neg_is_pos: 0.05341| lr: 0.0| temp: 1.95107 | loss: 1.104| constrast_loss: 4.32158| div_loss: 0.94419| %_mask_idx: 0.39442| ppl: 35.72144| %_neg_is_pos: 0.07102| lr: 0.0| temp: 1.95105 | loss: 1.08449| constrast_loss: 4.24297| div_loss: 0.94994| %_mask_idx: 0.35573| ppl: 32.0369| %_neg_is_pos: 0.08173| lr: 
0.0| temp: 1.95105 | loss: 1.0787| constrast_loss: 4.21992| div_loss: 0.94878| %_mask_idx: 0.36059| ppl: 32.77889| %_neg_is_pos: 0.10067| lr: 0.0| temp: 1.95104 | loss: 1.09904| constrast_loss: 4.30155| div_loss: 0.94594| %_mask_idx: 0.34273| ppl: 34.60043| %_neg_is_pos: 0.08769| lr: 0.0| temp: 1.95104 | loss: 1.1086| constrast_loss: 4.33978| div_loss: 0.94635| %_mask_idx: 0.38424| ppl: 34.33736| %_neg_is_pos: 0.06887| lr: 0.0| temp: 1.95103 | loss: 1.1129| constrast_loss: 4.35684| div_loss: 0.9474| %_mask_idx: 0.38111| ppl: 33.66284| %_neg_is_pos: 0.07706| lr: 0.0| temp: 1.95103 | loss: 1.13263| constrast_loss: 4.43534| div_loss: 0.95185| %_mask_idx: 0.39145| ppl: 30.81341| %_neg_is_pos: 0.05572| lr: 0.0| temp: 1.95102 | loss: 1.11508| constrast_loss: 4.36476| div_loss: 0.95547| %_mask_idx: 0.41604| ppl: 28.49606| %_neg_is_pos: 0.07625| lr: 0.0| temp: 1.95102 | loss: 1.0972| constrast_loss: 4.29346| div_loss: 0.95335| %_mask_idx: 0.39897| ppl: 29.85857| %_neg_is_pos: 0.09198| lr: 0.0| temp: 1.951| loss: 1.10359| constrast_loss: 4.3192| div_loss: 0.95182| %_mask_idx: 0.40022| ppl: 30.83347| %_neg_is_pos: 0.07883| lr: 0.0| temp: 1.951 | loss: 1.09923| constrast_loss: 4.30079| div_loss: 0.9615| %_mask_idx: 0.37265| ppl: 24.63749| %_neg_is_pos: 0.10046| lr: 0.0| temp: 1.95099 | loss: 1.07775| constrast_loss: 4.21476| div_loss: 0.96232| %_mask_idx: 0.39333| ppl: 24.11725| %_neg_is_pos: 0.11669| lr: 0.0| temp: 1.95099 | loss: 1.09082| constrast_loss: 4.26697| div_loss: 0.96303| %_mask_idx: 0.40977| ppl: 23.66199| %_neg_is_pos: 0.09744| lr: 0.0| temp: 1.95098 | loss: 1.04375| constrast_loss: 4.07845| div_loss: 0.96542| %_mask_idx: 0.33897| ppl: 22.13367| %_neg_is_pos: 0.12888| lr: 0.0| temp: 1.95098 | loss: 1.09696| constrast_loss: 4.29174| div_loss: 0.96084| %_mask_idx: 0.35887| ppl: 25.06341| %_neg_is_pos: 0.09943| lr: 0.0| temp: 1.95097 | loss: 1.08761| constrast_loss: 4.25423| div_loss: 0.96213| %_mask_idx: 0.42622| ppl: 24.2338| %_neg_is_pos: 0.10914| lr: 0.0| temp: 
1.95097 | loss: 1.08271| constrast_loss: 4.23451| div_loss: 0.96319| %_mask_idx: 0.41557| ppl: 23.55732| %_neg_is_pos: 0.10215| lr: 0.0| temp: 1.95095 | loss: 1.07693| constrast_loss: 4.21127| div_loss: 0.96469| %_mask_idx: 0.39912| ppl: 22.59685| %_neg_is_pos: 0.1109| lr: 0.0| temp: 1.95095 | loss: 1.06594| constrast_loss: 4.16674| div_loss: 0.96999| %_mask_idx: 0.40476| ppl: 19.2087| %_neg_is_pos: 0.15432| lr: 0.0| temp: 1.95094 | loss: 1.06904| constrast_loss: 4.17907| div_loss: 0.97091| %_mask_idx: 0.4364| ppl: 18.6145| %_neg_is_pos: 0.13537| lr: 0.0| temp: 1.95094 | loss: 1.04034| constrast_loss: 4.06425| div_loss: 0.97107| %_mask_idx: 0.38988| ppl: 18.51314| %_neg_is_pos: 0.1553| lr: 0.0| temp: 1.95092 | loss: 1.02241| constrast_loss: 3.99249| div_loss: 0.97154| %_mask_idx: 0.37954| ppl: 18.21458| %_neg_is_pos: 0.14621| lr: 0.0| temp: 1.95092 | loss: 1.02658| constrast_loss: 4.00877| div_loss: 0.97535| %_mask_idx: 0.39771| ppl: 15.77646| %_neg_is_pos: 0.18736| lr: 0.0| temp: 1.95091 | loss: 1.05258| constrast_loss: 4.11297| div_loss: 0.97352| %_mask_idx: 0.37892| ppl: 16.94908| %_neg_is_pos: 0.16155| lr: 0.0| temp: 1.95091 | loss: 1.03843| constrast_loss: 4.05655| div_loss: 0.97181| %_mask_idx: 0.43609| ppl: 18.04243| %_neg_is_pos: 0.16705| lr: 0.0| temp: 1.9509 | loss: 1.05903| constrast_loss: 4.13915| div_loss: 0.96966| %_mask_idx: 0.39724| ppl: 19.41516| %_neg_is_pos: 0.14758| lr: 0.0| temp: 1.9509 | loss: 1.06378| constrast_loss: 4.15833| div_loss: 0.96787| %_mask_idx: 0.4093| ppl: 20.56138| %_neg_is_pos: 0.13836| lr: 0.0| temp: 1.95089 | loss: 1.06267| constrast_loss: 4.15379| div_loss: 0.969| %_mask_idx: 0.35981| ppl: 19.83998| %_neg_is_pos: 0.13928| lr: 0.0| temp: 1.95089 | loss: 1.03208| constrast_loss: 4.03113| div_loss: 0.9719| %_mask_idx: 0.39004| ppl: 17.98233| %_neg_is_pos: 0.17123| lr: 0.0| temp: 1.95087 | loss: 1.03861| constrast_loss: 4.05708| div_loss: 0.97375| %_mask_idx: 0.40836| ppl: 16.80062| %_neg_is_pos: 0.16441| lr: 0.0| temp: 1.95087 
| loss: 1.05012| constrast_loss: 4.1031| div_loss: 0.97362| %_mask_idx: 0.37641| ppl: 16.88351| %_neg_is_pos: 0.17019| lr: 0.0| temp: 1.95086 | loss: 1.03591| constrast_loss: 4.04615| div_loss: 0.97484| %_mask_idx: 0.42199| ppl: 16.10138| %_neg_is_pos: 0.16891| lr: 0.0| temp: 1.95086 | loss: 1.04684| constrast_loss: 4.09028| div_loss: 0.97073| %_mask_idx: 0.39865| ppl: 18.73527| %_neg_is_pos: 0.1495| lr: 0.0| temp: 1.95085 | loss: 1.03224| constrast_loss: 4.03151| div_loss: 0.97465| %_mask_idx: 0.41244| ppl: 16.22152| %_neg_is_pos: 0.16655| lr: 0.0| temp: 1.95085 | loss: 1.04377| constrast_loss: 4.07786| div_loss: 0.97206| %_mask_idx: 0.3739| ppl: 17.88033| %_neg_is_pos: 0.14303| lr: 0.0| temp: 1.95084 | loss: 1.06343| constrast_loss: 4.15644| div_loss: 0.97267| %_mask_idx: 0.39192| ppl: 17.48996| %_neg_is_pos: 0.16857| lr: 0.0| temp: 1.95084 | loss: 1.05964| constrast_loss: 4.14136| div_loss: 0.97185| %_mask_idx: 0.37939| ppl: 18.01767| %_neg_is_pos: 0.14677| lr: 0.0| temp: 1.95082 | loss: 1.05655| constrast_loss: 4.12903| div_loss: 0.97183| %_mask_idx: 0.42372| ppl: 18.02643| %_neg_is_pos: 0.1524| lr: 0.0| temp: 1.95082 | loss: 1.0792| constrast_loss: 4.21986| div_loss: 0.96947| %_mask_idx: 0.39975| ppl: 19.5419| %_neg_is_pos: 0.12913| lr: 0.0| temp: 1.95081 | loss: 1.05583| constrast_loss: 4.12633| div_loss: 0.9699| %_mask_idx: 0.35605| ppl: 19.26377| %_neg_is_pos: 0.15773| lr: 0.0| temp: 1.95081 | loss: 1.06231| constrast_loss: 4.15207| div_loss: 0.97188| %_mask_idx: 0.39677| ppl: 17.99572| %_neg_is_pos: 0.15573| lr: 0.0| temp: 1.95081 | loss: 1.04037| constrast_loss: 4.06402| div_loss: 0.97474| %_mask_idx: 0.38784| ppl: 16.16825| %_neg_is_pos: 0.18459| lr: 0.0| temp: 1.95081 | loss: 1.04746| constrast_loss: 4.0929| div_loss: 0.96952| %_mask_idx: 0.36043| ppl: 19.50798| %_neg_is_pos: 0.16398| lr: 0.0| temp: 1.9508 | loss: 1.04599| constrast_loss: 4.08691| div_loss: 0.97059| %_mask_idx: 0.36028| ppl: 18.82023| %_neg_is_pos: 0.14534| lr: 0.0| temp: 1.9508 | loss: 
1.05476| constrast_loss: 4.12174| div_loss: 0.97298| %_mask_idx: 0.40022| ppl: 17.2949| %_neg_is_pos: 0.16016| lr: 0.0| temp: 1.95078 | loss: 1.03124| constrast_loss: 4.02764| div_loss: 0.97304| %_mask_idx: 0.37359| ppl: 17.25723| %_neg_is_pos: 0.1534| lr: 0.0| temp: 1.95078 | loss: 1.06778| constrast_loss: 4.17379| div_loss: 0.97326| %_mask_idx: 0.43343| ppl: 17.1121| %_neg_is_pos: 0.16104| lr: 0.0| temp: 1.95077 | loss: 1.06112| constrast_loss: 4.14732| div_loss: 0.97182| %_mask_idx: 0.38315| ppl: 18.0378| %_neg_is_pos: 0.15542| lr: 0.0| temp: 1.95077 | loss: 1.04997| constrast_loss: 4.10275| div_loss: 0.97125| %_mask_idx: 0.36247| ppl: 18.40153| %_neg_is_pos: 0.14611| lr: 0.0| temp: 1.95075 | loss: 1.04092| constrast_loss: 4.06622| div_loss: 0.97469| %_mask_idx: 0.37986| ppl: 16.19554| %_neg_is_pos: 0.17406| lr: 0.0| temp: 1.95075 | loss: 1.05768| constrast_loss: 4.13363| div_loss: 0.9709| %_mask_idx: 0.39928| ppl: 18.62562| %_neg_is_pos: 0.14364| lr: 0.0| temp: 1.95074 | loss: 1.05968| constrast_loss: 4.14134| div_loss: 0.97387| %_mask_idx: 0.36983| ppl: 16.72286| %_neg_is_pos: 0.14005| lr: 0.0| temp: 1.95074 | loss: 1.05127| constrast_loss: 4.10821| div_loss: 0.96858| %_mask_idx: 0.38299| ppl: 20.11193| %_neg_is_pos: 0.13307| lr: 0.0| temp: 1.95073 | loss: 1.06738| constrast_loss: 4.17271| div_loss: 0.96798| %_mask_idx: 0.37187| ppl: 20.49312| %_neg_is_pos: 0.14239| lr: 0.0| temp: 1.95073 | loss: 1.04599| constrast_loss: 4.08672| div_loss: 0.9725| %_mask_idx: 0.4021| ppl: 17.60215| %_neg_is_pos: 0.17131| lr: 0.0| temp: 1.95072 | loss: 1.06334| constrast_loss: 4.15624| div_loss: 0.97123| %_mask_idx: 0.36873| ppl: 18.41026| %_neg_is_pos: 0.14346| lr: 0.0| temp: 1.95072 | loss: 1.05837| constrast_loss: 4.13653| div_loss: 0.96961| %_mask_idx: 0.40382| ppl: 19.4506| %_neg_is_pos: 0.14071| lr: 0.0| temp: 1.9507 | loss: 1.05186| constrast_loss: 4.11017| div_loss: 0.97263| %_mask_idx: 0.38957| ppl: 17.51605| %_neg_is_pos: 0.15327| lr: 0.0| temp: 1.9507 | loss: 
1.02947| constrast_loss: 4.02057| div_loss: 0.97318| %_mask_idx: 0.34305| ppl: 17.1673| %_neg_is_pos: 0.16712| lr: 0.0| temp: 1.95069 | loss: 1.03764| constrast_loss: 4.05319| div_loss: 0.97368| %_mask_idx: 0.35793| ppl: 16.84342| %_neg_is_pos: 0.16431| lr: 0.0| temp: 1.95069 | loss: 1.02259| constrast_loss: 3.99302| div_loss: 0.97357| %_mask_idx: 0.38095| ppl: 16.91479| %_neg_is_pos: 0.16977| lr: 0.0| temp: 1.95068 | loss: 1.05804| constrast_loss: 4.13478| div_loss: 0.97369| %_mask_idx: 0.4198| ppl: 16.83646| %_neg_is_pos: 0.14997| lr: 0.0| temp: 1.95068 | loss: 1.07488| constrast_loss: 4.20237| div_loss: 0.97143| %_mask_idx: 0.39787| ppl: 18.28442| %_neg_is_pos: 0.13479| lr: 0.0| temp: 1.95067 | loss: 1.04986| constrast_loss: 4.10208| div_loss: 0.97348| %_mask_idx: 0.37093| ppl: 16.9737| %_neg_is_pos: 0.15468| lr: 0.0| temp: 1.95067 | loss: 1.05058| constrast_loss: 4.10508| div_loss: 0.9724| %_mask_idx: 0.43233| ppl: 17.66674| %_neg_is_pos: 0.16217| lr: 0.0| temp: 1.95065 | loss: 1.04203| constrast_loss: 4.0708| div_loss: 0.97313| %_mask_idx: 0.35965| ppl: 17.19758| %_neg_is_pos: 0.17023| lr: 0.0| temp: 1.95065 | loss: 1.06477| constrast_loss: 4.16196| div_loss: 0.97115| %_mask_idx: 0.39474| ppl: 18.46177| %_neg_is_pos: 0.15369| lr: 0.0| temp: 1.95064 | loss: 1.0654| constrast_loss: 4.16454| div_loss: 0.97077| %_mask_idx: 0.39818| ppl: 18.70499| %_neg_is_pos: 0.15303| lr: 0.0| temp: 1.95064 | loss: 1.07203| constrast_loss: 4.19127| div_loss: 0.96858| %_mask_idx: 0.37782| ppl: 20.10857| %_neg_is_pos: 0.13166| lr: 0.0| temp: 1.95063 | loss: 0.99666| constrast_loss: 3.88908| div_loss: 0.9757| %_mask_idx: 0.37437| ppl: 15.55387| %_neg_is_pos: 0.17893| lr: 0.0| temp: 1.95063 | loss: 1.04664| constrast_loss: 4.08884| div_loss: 0.97725| %_mask_idx: 0.45175| ppl: 14.55797| %_neg_is_pos: 0.18649| lr: 0.0| temp: 1.95062 | loss: 1.04446| constrast_loss: 4.08064| div_loss: 0.97212| %_mask_idx: 0.3869| ppl: 17.84566| %_neg_is_pos: 0.17336| lr: 0.0| temp: 1.95062 | loss: 
1.03459| constrast_loss: 4.04122| div_loss: 0.97139| %_mask_idx: 0.37359| ppl: 18.31285| %_neg_is_pos: 0.15609| lr: 1e-05| temp: 1.9506 | loss: 1.02513| constrast_loss: 4.00304| div_loss: 0.97499| %_mask_idx: 0.36497| ppl: 16.00957| %_neg_is_pos: 0.19417| lr: 1e-05| temp: 1.9506 | loss: 1.0367| constrast_loss: 4.0495| div_loss: 0.97289| %_mask_idx: 0.36388| ppl: 17.34876| %_neg_is_pos: 0.13581| lr: 1e-05| temp: 1.95059 | loss: 1.04166| constrast_loss: 4.06927| div_loss: 0.97379| %_mask_idx: 0.41087| ppl: 16.77377| %_neg_is_pos: 0.16343| lr: 1e-05| temp: 1.95059 | loss: 1.03552| constrast_loss: 4.04492| div_loss: 0.97145| %_mask_idx: 0.37484| ppl: 18.27008| %_neg_is_pos: 0.15731| lr: 1e-05| temp: 1.95057 | loss: 1.04376| constrast_loss: 4.07789| div_loss: 0.97138| %_mask_idx: 0.36028| ppl: 18.31713| %_neg_is_pos: 0.15864| lr: 1e-05| temp: 1.95057 | loss: 1.0764| constrast_loss: 4.20857| div_loss: 0.97037| %_mask_idx: 0.39317| ppl: 18.96472| %_neg_is_pos: 0.1437| lr: 1e-05| temp: 1.95056 | loss: 1.03706| constrast_loss: 4.05079| div_loss: 0.97469| %_mask_idx: 0.44063| ppl: 16.19698| %_neg_is_pos: 0.15724| lr: 1e-05| temp: 1.95056 | loss: 1.03254| constrast_loss: 4.03294| div_loss: 0.97236| %_mask_idx: 0.33349| ppl: 17.6924| %_neg_is_pos: 0.17509| lr: 1e-05| temp: 1.95055 | loss: 1.05004| constrast_loss: 4.10283| div_loss: 0.9733| %_mask_idx: 0.35432| ppl: 17.08589| %_neg_is_pos: 0.13408| lr: 1e-05| temp: 1.95055 | loss: 1.06149| constrast_loss: 4.14881| div_loss: 0.97157| %_mask_idx: 0.40006| ppl: 18.1981| %_neg_is_pos: 0.14572| lr: 1e-05| temp: 1.95054 | loss: 1.03358| constrast_loss: 4.03697| div_loss: 0.9734| %_mask_idx: 0.39395| ppl: 17.02307| %_neg_is_pos: 0.16374| lr: 1e-05| temp: 1.95054 | loss: 1.04865| constrast_loss: 4.09743| div_loss: 0.97158| %_mask_idx: 0.38628| ppl: 18.18563| %_neg_is_pos: 0.13792| lr: 1e-05| temp: 1.95052 | loss: 1.05766| constrast_loss: 4.13352| div_loss: 0.97118| %_mask_idx: 0.40335| ppl: 18.44278| %_neg_is_pos: 0.1479| lr: 1e-05| 
temp: 1.95052 | loss: 1.0451| constrast_loss: 4.08323| div_loss: 0.97171| %_mask_idx: 0.44204| ppl: 18.10354| %_neg_is_pos: 0.15557| lr: 1e-05| temp: 1.95051 | loss: 1.05786| constrast_loss: 4.13418| div_loss: 0.97248| %_mask_idx: 0.41338| ppl: 17.61525| %_neg_is_pos: 0.16011| lr: 1e-05| temp: 1.95051 | loss: 1.06294| constrast_loss: 4.15449| div_loss: 0.97249| %_mask_idx: 0.43421| ppl: 17.60379| %_neg_is_pos: 0.15875| lr: 1e-05| temp: 1.9505 | loss: 1.04258| constrast_loss: 4.07298| div_loss: 0.97355| %_mask_idx: 0.3573| ppl: 16.92984| %_neg_is_pos: 0.16982| lr: 1e-05| temp: 1.9505 | loss: 1.02752| constrast_loss: 4.01254| div_loss: 0.97547| %_mask_idx: 0.44001| ppl: 15.70036| %_neg_is_pos: 0.17496| lr: 1e-05| temp: 1.95049 | loss: 1.05627| constrast_loss: 4.1277| div_loss: 0.97374| %_mask_idx: 0.42434| ppl: 16.80761| %_neg_is_pos: 0.17906| lr: 1e-05| temp: 1.95049 | loss: 1.07137| constrast_loss: 4.18852| div_loss: 0.96974| %_mask_idx: 0.41353| ppl: 19.3645| %_neg_is_pos: 0.14198| lr: 1e-05| temp: 1.95047 | loss: 1.03625| constrast_loss: 4.04761| div_loss: 0.9738| %_mask_idx: 0.38033| ppl: 16.76806| %_neg_is_pos: 0.15863| lr: 1e-05| temp: 1.95047 | loss: 1.04296| constrast_loss: 4.07462| div_loss: 0.97239| %_mask_idx: 0.3609| ppl: 17.66826| %_neg_is_pos: 0.15021| lr: 1e-05| temp: 1.95046 | loss: 1.03231| constrast_loss: 4.03199| div_loss: 0.97263| %_mask_idx: 0.40257| ppl: 17.51927| %_neg_is_pos: 0.15942| lr: 1e-05| temp: 1.95046 | loss: 1.04048| constrast_loss: 4.06467| div_loss: 0.97236| %_mask_idx: 0.38894| ppl: 17.69059| %_neg_is_pos: 0.16128| lr: 1e-05| temp: 1.95045 | loss: 1.02713| constrast_loss: 4.01103| div_loss: 0.97479| %_mask_idx: 0.38581| ppl: 16.13439| %_neg_is_pos: 0.16193| lr: 1e-05| temp: 1.95045 | loss: 1.07267| constrast_loss: 4.19329| div_loss: 0.9739| %_mask_idx: 0.35464| ppl: 16.7056| %_neg_is_pos: 0.12843| lr: 1e-05| temp: 1.95044 | loss: 1.0569| constrast_loss: 4.13046| div_loss: 0.97154| %_mask_idx: 0.39395| ppl: 18.21526| %_neg_is_pos: 
0.14625| lr: 1e-05| temp: 1.95044 | loss: 1.04946| constrast_loss: 4.10066| div_loss: 0.97195| %_mask_idx: 0.42372| ppl: 17.95351| %_neg_is_pos: 0.15709| lr: 1e-05| temp: 1.95042 | loss: 1.04233| constrast_loss: 4.07191| div_loss: 0.97408| %_mask_idx: 0.37813| ppl: 16.58602| %_neg_is_pos: 0.16629| lr: 1e-05| temp: 1.95042 | loss: 1.05437| constrast_loss: 4.1205| div_loss: 0.9699| %_mask_idx: 0.37437| ppl: 19.26444| %_neg_is_pos: 0.1578| lr: 1e-05| temp: 1.95041 | loss: 1.04501| constrast_loss: 4.08306| div_loss: 0.96982| %_mask_idx: 0.41385| ppl: 19.31623| %_neg_is_pos: 0.15183| lr: 1e-05| temp: 1.95041 [2021-09-02 08:01:53,330] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1 [2021-09-02 08:01:53,330] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1 | loss: 1.06823| constrast_loss: 4.17565| div_loss: 0.97255| %_mask_idx: 0.43139| ppl: 17.56756| %_neg_is_pos: 0.14415| lr: 1e-05| temp: 1.9504 | loss: 1.03577| constrast_loss: 4.04562| div_loss: 0.97456| %_mask_idx: 0.3797| ppl: 16.28374| %_neg_is_pos: 0.15876| lr: 1e-05| temp: 1.9504 | loss: 1.04083| constrast_loss: 4.06592| div_loss: 0.97412| %_mask_idx: 0.41526| ppl: 16.56181| %_neg_is_pos: 0.17231| lr: 1e-05| temp: 1.95039 | loss: 1.03082| constrast_loss: 4.0259| div_loss: 0.97367| %_mask_idx: 0.41776| ppl: 16.84828| %_neg_is_pos: 0.18531| lr: 1e-05| temp: 1.95039 | loss: 1.00628| constrast_loss: 3.92748| div_loss: 0.97657| %_mask_idx: 0.36216| ppl: 14.99781| %_neg_is_pos: 0.20519| lr: 1e-05| temp: 1.95038 | loss: 1.04325| constrast_loss: 4.07556| div_loss: 0.97428| %_mask_idx: 0.40367| ppl: 16.45788| %_neg_is_pos: 0.18195| lr: 1e-05| temp: 1.95038 | loss: 1.03732| constrast_loss: 4.05174| div_loss: 0.9754| %_mask_idx: 0.41353| ppl: 15.74267| %_neg_is_pos: 0.18719| lr: 1e-05| temp: 1.95037 | loss: 1.03705| constrast_loss: 4.05074| div_loss: 
0.97467| %_mask_idx: 0.39552| ppl: 16.21045| %_neg_is_pos: 0.18406| lr: 1e-05| temp: 1.95037 | loss: 0.99535| constrast_loss: 3.88355| div_loss: 0.97835| %_mask_idx: 0.375| ppl: 13.85695| %_neg_is_pos: 0.20757| lr: 1e-05| temp: 1.95035 | loss: 1.00095| constrast_loss: 3.90602| div_loss: 0.9776| %_mask_idx: 0.3526| ppl: 14.33679| %_neg_is_pos: 0.19691| lr: 1e-05| temp: 1.95035 | loss: 0.99039| constrast_loss: 3.86362| div_loss: 0.97927| %_mask_idx: 0.36513| ppl: 13.26418| %_neg_is_pos: 0.20537| lr: 1e-05| temp: 1.95034 | loss: 1.01011| constrast_loss: 3.94264| div_loss: 0.97778| %_mask_idx: 0.3302| ppl: 14.22278| %_neg_is_pos: 0.19951| lr: 1e-05| temp: 1.95034 | loss: 0.98907| constrast_loss: 3.85841| div_loss: 0.97876| %_mask_idx: 0.41291| ppl: 13.59431| %_neg_is_pos: 0.22971| lr: 1e-05| temp: 1.95033 | loss: 0.9711| constrast_loss: 3.78645| div_loss: 0.97964| %_mask_idx: 0.37187| ppl: 13.03148| %_neg_is_pos: 0.24468| lr: 1e-05| temp: 1.95033 | loss: 0.97754| constrast_loss: 3.81213| div_loss: 0.98019| %_mask_idx: 0.43844| ppl: 12.67614| %_neg_is_pos: 0.25023| lr: 1e-05| temp: 1.95032 | loss: 0.98668| constrast_loss: 3.84902| div_loss: 0.97691| %_mask_idx: 0.36059| ppl: 14.77615| %_neg_is_pos: 0.20649| lr: 1e-05| temp: 1.95032 | loss: 0.99905| constrast_loss: 3.89841| div_loss: 0.97789| %_mask_idx: 0.38753| ppl: 14.14925| %_neg_is_pos: 0.20537| lr: 1e-05| temp: 1.9503 | loss: 1.00972| constrast_loss: 3.94134| div_loss: 0.9752| %_mask_idx: 0.39677| ppl: 15.87461| %_neg_is_pos: 0.2162| lr: 1e-05| temp: 1.9503 | loss: 1.02147| constrast_loss: 3.98826| div_loss: 0.97624| %_mask_idx: 0.42325| ppl: 15.20533| %_neg_is_pos: 0.21391| lr: 1e-05| temp: 1.95029 | loss: 1.01608| constrast_loss: 3.96681| div_loss: 0.9752| %_mask_idx: 0.37751| ppl: 15.87014| %_neg_is_pos: 0.19707| lr: 1e-05| temp: 1.95029 | loss: 1.00897| constrast_loss: 3.9383| div_loss: 0.97568| %_mask_idx: 0.38377| ppl: 15.56567| %_neg_is_pos: 0.19744| lr: 1e-05| temp: 1.95028 | loss: 1.01607| constrast_loss: 
3.96675| div_loss: 0.97515| %_mask_idx: 0.42826| ppl: 15.90107| %_neg_is_pos: 0.20277| lr: 1e-05| temp: 1.95028 | loss: 1.00125| constrast_loss: 3.90753| div_loss: 0.97456| %_mask_idx: 0.38534| ppl: 16.28032| %_neg_is_pos: 0.20956| lr: 1e-05| temp: 1.95027 | loss: 0.99343| constrast_loss: 3.87604| div_loss: 0.97682| %_mask_idx: 0.39521| ppl: 14.83406| %_neg_is_pos: 0.19173| lr: 1e-05| temp: 1.95027 | loss: 0.9686| constrast_loss: 3.77678| div_loss: 0.97603| %_mask_idx: 0.35902| ppl: 15.34222| %_neg_is_pos: 0.20644| lr: 1e-05| temp: 1.95025 | loss: 1.06471| constrast_loss: 4.1617| div_loss: 0.9714| %_mask_idx: 0.40179| ppl: 18.30111| %_neg_is_pos: 0.14777| lr: 1e-05| temp: 1.95025 | loss: 1.02703| constrast_loss: 4.01088| div_loss: 0.97231| %_mask_idx: 0.35981| ppl: 17.72168| %_neg_is_pos: 0.17593| lr: 1e-05| temp: 1.95024 | loss: 1.04966| constrast_loss: 4.10167| div_loss: 0.96957| %_mask_idx: 0.4256| ppl: 19.47602| %_neg_is_pos: 0.16073| lr: 1e-05| temp: 1.95024 | loss: 1.03247| constrast_loss: 4.03272| div_loss: 0.9716| %_mask_idx: 0.37829| ppl: 18.17327| %_neg_is_pos: 0.16596| lr: 1e-05| temp: 1.95022 | loss: 1.01651| constrast_loss: 3.96889| div_loss: 0.9715| %_mask_idx: 0.41729| ppl: 18.23845| %_neg_is_pos: 0.16654| lr: 1e-05| temp: 1.95022 | loss: 1.02822| constrast_loss: 4.01573| div_loss: 0.97156| %_mask_idx: 0.39881| ppl: 18.19983| %_neg_is_pos: 0.17963| lr: 1e-05| temp: 1.95021 | loss: 1.08105| constrast_loss: 4.22772| div_loss: 0.96471| %_mask_idx: 0.36341| ppl: 22.58374| %_neg_is_pos: 0.11896| lr: 1e-05| temp: 1.95021 | loss: 1.05587| constrast_loss: 4.12641| div_loss: 0.97069| %_mask_idx: 0.35981| ppl: 18.75949| %_neg_is_pos: 0.1451| lr: 1e-05| temp: 1.9502 | loss: 1.0176| constrast_loss: 3.97303| div_loss: 0.97361| %_mask_idx: 0.34508| ppl: 16.88703| %_neg_is_pos: 0.17361| lr: 1e-05| temp: 1.9502 | loss: 1.00565| constrast_loss: 3.9255| div_loss: 0.97113| %_mask_idx: 0.39129| ppl: 18.47683| %_neg_is_pos: 0.16977| lr: 1e-05| temp: 1.95019 | loss: 
1.03355| constrast_loss: 4.03713| div_loss: 0.97077| %_mask_idx: 0.36842| ppl: 18.71029| %_neg_is_pos: 0.16445| lr: 1e-05| temp: 1.95019 | loss: 1.05597| constrast_loss: 4.12721| div_loss: 0.96654| %_mask_idx: 0.35918| ppl: 21.41388| %_neg_is_pos: 0.12817| lr: 1e-05| temp: 1.95017 | loss: 1.04849| constrast_loss: 4.09684| div_loss: 0.9711| %_mask_idx: 0.40116| ppl: 18.49405| %_neg_is_pos: 0.15809| lr: 1e-05| temp: 1.95017 | loss: 1.04246| constrast_loss: 4.07292| div_loss: 0.969| %_mask_idx: 0.37876| ppl: 19.83949| %_neg_is_pos: 0.17245| lr: 1e-05| temp: 1.95016 | loss: 1.03934| constrast_loss: 4.06009| div_loss: 0.97261| %_mask_idx: 0.43546| ppl: 17.52864| %_neg_is_pos: 0.17274| lr: 1e-05| temp: 1.95016 | loss: 1.04776| constrast_loss: 4.09406| div_loss: 0.96962| %_mask_idx: 0.38847| ppl: 19.44011| %_neg_is_pos: 0.17404| lr: 1e-05| temp: 1.95015 | loss: 1.04091| constrast_loss: 4.06647| div_loss: 0.97167| %_mask_idx: 0.38221| ppl: 18.13388| %_neg_is_pos: 0.16664| lr: 1e-05| temp: 1.95015 | loss: 1.04328| constrast_loss: 4.07608| div_loss: 0.97055| %_mask_idx: 0.40883| ppl: 18.84693| %_neg_is_pos: 0.1667| lr: 1e-05| temp: 1.95014 | loss: 1.03956| constrast_loss: 4.06123| div_loss: 0.97011| %_mask_idx: 0.41259| ppl: 19.12948| %_neg_is_pos: 0.16288| lr: 1e-05| temp: 1.95014 | loss: 1.01905| constrast_loss: 3.97897| div_loss: 0.97249| %_mask_idx: 0.42575| ppl: 17.60874| %_neg_is_pos: 0.16028| lr: 1e-05| temp: 1.95012 | loss: 1.04612| constrast_loss: 4.08755| div_loss: 0.96928| %_mask_idx: 0.33239| ppl: 19.65786| %_neg_is_pos: 0.1451| lr: 1e-05| temp: 1.95012 | loss: 1.04439| constrast_loss: 4.08033| div_loss: 0.97223| %_mask_idx: 0.41369| ppl: 17.77347| %_neg_is_pos: 0.17595| lr: 1e-05| temp: 1.95011 | loss: 1.04741| constrast_loss: 4.09269| div_loss: 0.96953| %_mask_idx: 0.38549| ppl: 19.49951| %_neg_is_pos: 0.16291| lr: 1e-05| temp: 1.95011 | loss: 1.04514| constrast_loss: 4.08309| div_loss: 0.97471| %_mask_idx: 0.37077| ppl: 16.1864| %_neg_is_pos: 0.17245| lr: 
1e-05| temp: 1.9501 | loss: 1.02993| constrast_loss: 4.0226| div_loss: 0.97106| %_mask_idx: 0.39254| ppl: 18.51924| %_neg_is_pos: 0.16499| lr: 1e-05| temp: 1.9501 | loss: 1.04481| constrast_loss: 4.08226| div_loss: 0.96992| %_mask_idx: 0.39834| ppl: 19.24966| %_neg_is_pos: 0.15367| lr: 1e-05| temp: 1.95009 | loss: 1.05769| constrast_loss: 4.13406| div_loss: 0.9669| %_mask_idx: 0.37641| ppl: 21.18437| %_neg_is_pos: 0.15159| lr: 1e-05| temp: 1.95009 | loss: 1.05318| constrast_loss: 4.11606| div_loss: 0.96676| %_mask_idx: 0.37672| ppl: 21.27197| %_neg_is_pos: 0.1296| lr: 1e-05| temp: 1.95007 | loss: 1.02899| constrast_loss: 4.01891| div_loss: 0.97061| %_mask_idx: 0.33976| ppl: 18.80902| %_neg_is_pos: 0.1817| lr: 1e-05| temp: 1.95007 | loss: 1.00137| constrast_loss: 3.90812| div_loss: 0.97349| %_mask_idx: 0.38001| ppl: 16.96768| %_neg_is_pos: 0.18039| lr: 1e-05| temp: 1.95006 | loss: 1.02323| constrast_loss: 3.99552| div_loss: 0.97407| %_mask_idx: 0.38158| ppl: 16.59444| %_neg_is_pos: 0.18085| lr: 1e-05| temp: 1.95006 | loss: 1.04181| constrast_loss: 4.07018| div_loss: 0.97077| %_mask_idx: 0.37265| ppl: 18.7052| %_neg_is_pos: 0.17705| lr: 1e-05| temp: 1.95004 | loss: 1.0524| constrast_loss: 4.11285| div_loss: 0.96757| %_mask_idx: 0.40069| ppl: 20.75546| %_neg_is_pos: 0.13731| lr: 1e-05| temp: 1.95004 | loss: 1.03009| constrast_loss: 4.02322| div_loss: 0.97142| %_mask_idx: 0.36842| ppl: 18.29364| %_neg_is_pos: 0.17833| lr: 1e-05| temp: 1.95003 | loss: 1.03449| constrast_loss: 4.04087| div_loss: 0.97091| %_mask_idx: 0.43311| ppl: 18.61915| %_neg_is_pos: 0.17827| lr: 1e-05| temp: 1.95003 | loss: 1.03659| constrast_loss: 4.04948| div_loss: 0.96888| %_mask_idx: 0.39756| ppl: 19.91731| %_neg_is_pos: 0.16016| lr: 1e-05| temp: 1.95002 | loss: 1.03868| constrast_loss: 4.05747| div_loss: 0.97231| %_mask_idx: 0.4057| ppl: 17.72167| %_neg_is_pos: 0.17211| lr: 1e-05| temp: 1.95002 | loss: 1.02063| constrast_loss: 3.98528| div_loss: 0.97242| %_mask_idx: 0.40836| ppl: 17.64818| 
%_neg_is_pos: 0.18911| lr: 1e-05| temp: 1.95002 | loss: 1.05746| constrast_loss: 4.13268| div_loss: 0.97166| %_mask_idx: 0.43875| ppl: 18.14| %_neg_is_pos: 0.172| lr: 1e-05| temp: 1.95002 | loss: 1.06407| constrast_loss: 4.15958| div_loss: 0.96721| %_mask_idx: 0.44815| ppl: 20.98663| %_neg_is_pos: 0.15178| lr: 1e-05| temp: 1.95 | loss: 1.02719| constrast_loss: 4.01167| div_loss: 0.97104| %_mask_idx: 0.37046| ppl: 18.53447| %_neg_is_pos: 0.17346| lr: 1e-05| temp: 1.95 | loss: 1.03025| constrast_loss: 4.02379| div_loss: 0.97203| %_mask_idx: 0.34336| ppl: 17.9| %_neg_is_pos: 0.14572| lr: 1e-05| temp: 1.94999 | loss: 1.04002| constrast_loss: 4.06284| div_loss: 0.97224| %_mask_idx: 0.44063| ppl: 17.76526| %_neg_is_pos: 0.17047| lr: 1e-05| temp: 1.94999 | loss: 1.04319| constrast_loss: 4.07579| div_loss: 0.96973| %_mask_idx: 0.34994| ppl: 19.37226| %_neg_is_pos: 0.17627| lr: 1e-05| temp: 1.94998 | loss: 1.04489| constrast_loss: 4.08244| div_loss: 0.97126| %_mask_idx: 0.38158| ppl: 18.39319| %_neg_is_pos: 0.14938| lr: 1e-05| temp: 1.94998 | loss: 1.03576| constrast_loss: 4.04603| div_loss: 0.97029| %_mask_idx: 0.38017| ppl: 19.01414| %_neg_is_pos: 0.16103| lr: 1e-05| temp: 1.94997 | loss: 1.03812| constrast_loss: 4.05554| div_loss: 0.96932| %_mask_idx: 0.31908| ppl: 19.63706| %_neg_is_pos: 0.14786| lr: 1e-05| temp: 1.94997 | loss: 1.03803| constrast_loss: 4.05509| div_loss: 0.97042| %_mask_idx: 0.39787| ppl: 18.93291| %_neg_is_pos: 0.16006| lr: 1e-05| temp: 1.94995 | loss: 1.01256| constrast_loss: 3.95304| div_loss: 0.972| %_mask_idx: 0.38675| ppl: 17.9183| %_neg_is_pos: 0.17582| lr: 1e-05| temp: 1.94995 | loss: 1.04134| constrast_loss: 4.06845| div_loss: 0.96907| %_mask_idx: 0.41259| ppl: 19.79759| %_neg_is_pos: 0.15379| lr: 1e-05| temp: 1.94994 | loss: 1.0306| constrast_loss: 4.02525| div_loss: 0.97156| %_mask_idx: 0.35401| ppl: 18.2034| %_neg_is_pos: 0.17712| lr: 1e-05| temp: 1.94994 | loss: 1.04795| constrast_loss: 4.09473| div_loss: 0.97056| %_mask_idx: 0.40116| ppl: 
18.84385| %_neg_is_pos: 0.16675| lr: 1e-05| temp: 1.94993 | loss: 1.02477| constrast_loss: 4.00187| div_loss: 0.97209| %_mask_idx: 0.38409| ppl: 17.86408| %_neg_is_pos: 0.17187| lr: 1e-05| temp: 1.94993 | loss: 1.03514| constrast_loss: 4.0435| div_loss: 0.97072| %_mask_idx: 0.40038| ppl: 18.74058| %_neg_is_pos: 0.16275| lr: 1e-05| temp: 1.94992 | loss: 1.05463| constrast_loss: 4.12124| div_loss: 0.97281| %_mask_idx: 0.38596| ppl: 17.39942| %_neg_is_pos: 0.16924| lr: 1e-05| temp: 1.94992 | loss: 1.01453| constrast_loss: 3.96067| div_loss: 0.97434| %_mask_idx: 0.32973| ppl: 16.42389| %_neg_is_pos: 0.16663| lr: 1e-05| temp: 1.9499 | loss: 1.03451| constrast_loss: 4.04102| div_loss: 0.97027| %_mask_idx: 0.40899| ppl: 19.02926| %_neg_is_pos: 0.15995| lr: 1e-05| temp: 1.9499 | loss: 1.02493| constrast_loss: 4.0027| div_loss: 0.97029| %_mask_idx: 0.39834| ppl: 19.01617| %_neg_is_pos: 0.17459| lr: 1e-05| temp: 1.94989 | loss: 1.04207| constrast_loss: 4.07109| div_loss: 0.97182| %_mask_idx: 0.38111| ppl: 18.03456| %_neg_is_pos: 0.17714| lr: 1e-05| temp: 1.94989 | loss: 1.03154| constrast_loss: 4.02903| div_loss: 0.97127| %_mask_idx: 0.43844| ppl: 18.38757| %_neg_is_pos: 0.17618| lr: 1e-05| temp: 1.94987 | loss: 1.04968| constrast_loss: 4.10194| div_loss: 0.9677| %_mask_idx: 0.38957| ppl: 20.67153| %_neg_is_pos: 0.13486| lr: 1e-05| temp: 1.94987 | loss: 1.01362| constrast_loss: 3.95715| div_loss: 0.97335| %_mask_idx: 0.39458| ppl: 17.05414| %_neg_is_pos: 0.17655| lr: 1e-05| temp: 1.94986 | loss: 1.02857| constrast_loss: 4.01716| div_loss: 0.97143| %_mask_idx: 0.37547| ppl: 18.28489| %_neg_is_pos: 0.16133| lr: 1e-05| temp: 1.94986 | loss: 1.0355| constrast_loss: 4.0451| div_loss: 0.969| %_mask_idx: 0.40695| ppl: 19.83905| %_neg_is_pos: 0.16702| lr: 1e-05| temp: 1.94985 | loss: 1.05728| constrast_loss: 4.13224| div_loss: 0.96897| %_mask_idx: 0.38205| ppl: 19.85859| %_neg_is_pos: 0.13833| lr: 1e-05| temp: 1.94985 | loss: 1.04748| constrast_loss: 4.09309| div_loss: 0.96845| 
%_mask_idx: 0.38205| ppl: 20.1918| %_neg_is_pos: 0.15098| lr: 1e-05| temp: 1.94984 | loss: 1.01841| constrast_loss: 3.97651| div_loss: 0.97114| %_mask_idx: 0.3161| ppl: 18.47043| %_neg_is_pos: 0.16882| lr: 1e-05| temp: 1.94984 | loss: 1.01814| constrast_loss: 3.97527| div_loss: 0.97298| %_mask_idx: 0.33255| ppl: 17.29419| %_neg_is_pos: 0.19175| lr: 1e-05| temp: 1.94982 | loss: 1.0301| constrast_loss: 4.02329| div_loss: 0.97096| %_mask_idx: 0.41682| ppl: 18.58733| %_neg_is_pos: 0.16469| lr: 1e-05| temp: 1.94982 | loss: 1.05824| constrast_loss: 4.13626| div_loss: 0.96696| %_mask_idx: 0.39834| ppl: 21.14761| %_neg_is_pos: 0.1587| lr: 1e-05| temp: 1.94981 | loss: 1.04514| constrast_loss: 4.08359| div_loss: 0.96971| %_mask_idx: 0.38878| ppl: 19.3866| %_neg_is_pos: 0.16423| lr: 1e-05| temp: 1.94981 | loss: 1.05621| constrast_loss: 4.12788| div_loss: 0.96953| %_mask_idx: 0.38268| ppl: 19.50147| %_neg_is_pos: 0.16295| lr: 1e-05| temp: 1.9498 | loss: 1.03567| constrast_loss: 4.04568| div_loss: 0.97008| %_mask_idx: 0.38111| ppl: 19.14935| %_neg_is_pos: 0.15695| lr: 1e-05| temp: 1.9498 | loss: 1.0252| constrast_loss: 4.0036| div_loss: 0.97182| %_mask_idx: 0.35855| ppl: 18.0367| %_neg_is_pos: 0.18478| lr: 1e-05| temp: 1.94979 | loss: 1.03767| constrast_loss: 4.05372| div_loss: 0.96957| %_mask_idx: 0.44518| ppl: 19.47738| %_neg_is_pos: 0.16615| lr: 1e-05| temp: 1.94979 | loss: 1.02746| constrast_loss: 4.01253| div_loss: 0.97299| %_mask_idx: 0.38221| ppl: 17.28587| %_neg_is_pos: 0.16952| lr: 1e-05| temp: 1.94977 | loss: 1.03851| constrast_loss: 4.05695| div_loss: 0.97082| %_mask_idx: 0.44189| ppl: 18.67425| %_neg_is_pos: 0.14696| lr: 1e-05| temp: 1.94977 | loss: 1.00534| constrast_loss: 3.92405| div_loss: 0.9732| %_mask_idx: 0.38362| ppl: 17.15307| %_neg_is_pos: 0.19927| lr: 1e-05| temp: 1.94976 | loss: 1.01899| constrast_loss: 3.97862| div_loss: 0.97343| %_mask_idx: 0.41729| ppl: 17.00361| %_neg_is_pos: 0.17008| lr: 1e-05| temp: 1.94976 | loss: 1.03169| constrast_loss: 4.02969| 
div_loss: 0.97074| %_mask_idx: 0.42419| ppl: 18.72792| %_neg_is_pos: 0.16531| lr: 1e-05| temp: 1.94975 | loss: 1.0511| constrast_loss: 4.10765| div_loss: 0.96767| %_mask_idx: 0.43437| ppl: 20.69376| %_neg_is_pos: 0.15035| lr: 1e-05| temp: 1.94975 | loss: 0.99606| constrast_loss: 3.88674| div_loss: 0.97507| %_mask_idx: 0.39098| ppl: 15.95383| %_neg_is_pos: 0.19534| lr: 1e-05| temp: 1.94974 | loss: 1.02021| constrast_loss: 3.98358| div_loss: 0.97259| %_mask_idx: 0.36106| ppl: 17.54514| %_neg_is_pos: 0.16361| lr: 1e-05| temp: 1.94974 | loss: 1.03577| constrast_loss: 4.04578| div_loss: 0.97275| %_mask_idx: 0.43891| ppl: 17.43964| %_neg_is_pos: 0.18487| lr: 1e-05| temp: 1.94972 | loss: 1.02643| constrast_loss: 4.00839| div_loss: 0.97339| %_mask_idx: 0.41745| ppl: 17.02784| %_neg_is_pos: 0.20329| lr: 1e-05| temp: 1.94972 | loss: 1.03666| constrast_loss: 4.04967| div_loss: 0.96981| %_mask_idx: 0.39129| ppl: 19.32107| %_neg_is_pos: 0.15982| lr: 1e-05| temp: 1.94971 | loss: 1.0617| constrast_loss: 4.14981| div_loss: 0.97| %_mask_idx: 0.37672| ppl: 19.19983| %_neg_is_pos: 0.15062| lr: 1e-05| temp: 1.94971
[2021-09-02 08:11:05,972] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-02 08:11:05,972] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 1.03936| constrast_loss: 4.06022| div_loss: 0.9723| %_mask_idx: 0.43499| ppl: 17.72589| %_neg_is_pos: 0.18503| lr: 1e-05| temp: 1.94969 | loss: 1.04272| constrast_loss: 4.07393| div_loss: 0.96966| %_mask_idx: 0.40586| ppl: 19.41984| %_neg_is_pos: 0.16184| lr: 1e-05| temp: 1.94969 | loss: 1.04128| constrast_loss: 4.06828| div_loss: 0.96839| %_mask_idx: 0.36607| ppl: 20.2289| %_neg_is_pos: 0.16121| lr: 1e-05| temp: 1.94968 | loss: 1.0331| constrast_loss: 4.03555| div_loss: 0.9684| %_mask_idx: 0.38925| ppl: 20.22102| %_neg_is_pos: 0.15917| lr: 1e-05| temp: 1.94968 | loss: 1.02138| constrast_loss: 3.98839| div_loss: 0.97119| %_mask_idx: 0.31516| ppl: 18.43843| %_neg_is_pos: 0.15053| lr: 1e-05| temp: 1.94967 | loss: 0.95723| constrast_loss: 3.73127| div_loss: 0.97641| %_mask_idx: 0.31877| ppl: 15.09899| %_neg_is_pos: 0.22296| lr: 1e-05| temp: 1.94967 | loss: 0.98935| constrast_loss: 3.86041| div_loss: 0.97001| %_mask_idx: 0.39145| ppl: 19.1963| %_neg_is_pos: 0.19325| lr: 1e-05| temp: 1.94966 | loss: 0.99651| constrast_loss: 3.88887| div_loss: 0.97175| %_mask_idx: 0.41745| ppl: 18.08156| %_neg_is_pos: 0.19564| lr: 1e-05| temp: 1.94966 | loss: 1.01574| constrast_loss: 3.96575| div_loss: 0.97225| %_mask_idx: 0.35714| ppl: 17.75793| %_neg_is_pos: 0.15777| lr: 1e-05| temp: 1.94964 | loss: 1.00088| constrast_loss: 3.90628| div_loss: 0.9722| %_mask_idx: 0.32879| ppl: 17.78972| %_neg_is_pos: 0.16939| lr: 1e-05| temp: 1.94964 | loss: 0.99429| constrast_loss: 3.87981| div_loss: 0.97346| %_mask_idx: 0.44721| ppl: 16.98297| %_neg_is_pos: 0.2067| lr: 1e-05| temp: 1.94963 | loss: 0.99877| constrast_loss: 3.89796| div_loss: 0.97116| %_mask_idx: 0.35009| ppl: 18.45839| %_neg_is_pos: 0.19203| lr: 1e-05| temp: 1.94963 | loss: 0.99121| constrast_loss: 3.8674| div_loss: 0.97423| %_mask_idx: 0.36607| ppl: 16.49341| %_neg_is_pos: 0.20886| lr: 1e-05| temp: 1.94963 | loss: 1.04766| constrast_loss: 4.09402| div_loss: 0.96614| %_mask_idx: 0.388| 
ppl: 21.66907| %_neg_is_pos: 0.14518| lr: 1e-05| temp: 1.94963 | loss: 1.01591| constrast_loss: 3.9665| div_loss: 0.97142| %_mask_idx: 0.41353| ppl: 18.29237| %_neg_is_pos: 0.1978| lr: 1e-05| temp: 1.94962 | loss: 1.04042| constrast_loss: 4.06485| div_loss: 0.96839| %_mask_idx: 0.39129| ppl: 20.23012| %_neg_is_pos: 0.16188| lr: 1e-05| temp: 1.94962 | loss: 1.01031| constrast_loss: 3.9442| div_loss: 0.97044| %_mask_idx: 0.39364| ppl: 18.91865| %_neg_is_pos: 0.17006| lr: 1e-05| temp: 1.9496 | loss: 1.02452| constrast_loss: 4.00096| div_loss: 0.97116| %_mask_idx: 0.42372| ppl: 18.45482| %_neg_is_pos: 0.21519| lr: 1e-05| temp: 1.9496 | loss: 1.0461| constrast_loss: 4.08803| div_loss: 0.9639| %_mask_idx: 0.38048| ppl: 23.1014| %_neg_is_pos: 0.14768| lr: 1e-05| temp: 1.94959 | loss: 1.03073| constrast_loss: 4.02633| div_loss: 0.96593| %_mask_idx: 0.38878| ppl: 21.80555| %_neg_is_pos: 0.16722| lr: 1e-05| temp: 1.94959 | loss: 1.05327| constrast_loss: 4.11714| div_loss: 0.95926| %_mask_idx: 0.40742| ppl: 26.076| %_neg_is_pos: 0.12143| lr: 1e-05| temp: 1.94958 | loss: 1.0718| constrast_loss: 4.191| div_loss: 0.96205| %_mask_idx: 0.39803| ppl: 24.28952| %_neg_is_pos: 0.13676| lr: 1e-05| temp: 1.94958 | loss: 1.08287| constrast_loss: 4.23572| div_loss: 0.95739| %_mask_idx: 0.39944| ppl: 27.2697| %_neg_is_pos: 0.12695| lr: 1e-05| temp: 1.94957 | loss: 1.07571| constrast_loss: 4.20661| div_loss: 0.96226| %_mask_idx: 0.4021| ppl: 24.15678| %_neg_is_pos: 0.13723| lr: 1e-05| temp: 1.94957 | loss: 1.05531| constrast_loss: 4.12495| div_loss: 0.96281| %_mask_idx: 0.41056| ppl: 23.80392| %_neg_is_pos: 0.14439| lr: 1e-05| temp: 1.94955 | loss: 1.07442| constrast_loss: 4.2019| div_loss: 0.95786| %_mask_idx: 0.41541| ppl: 26.96822| %_neg_is_pos: 0.13309| lr: 1e-05| temp: 1.94955 | loss: 1.08426| constrast_loss: 4.24137| div_loss: 0.95651| %_mask_idx: 0.33349| ppl: 27.83304| %_neg_is_pos: 0.11665| lr: 1e-05| temp: 1.94954 | loss: 1.05553| constrast_loss: 4.1262| div_loss: 0.95938| 
%_mask_idx: 0.35432| ppl: 25.99604| %_neg_is_pos: 0.13779| lr: 1e-05| temp: 1.94954 | loss: 1.06809| constrast_loss: 4.17647| div_loss: 0.95902| %_mask_idx: 0.40132| ppl: 26.22881| %_neg_is_pos: 0.1193| lr: 1e-05| temp: 1.94952 | loss: 1.04563| constrast_loss: 4.08683| div_loss: 0.9569| %_mask_idx: 0.33083| ppl: 27.58515| %_neg_is_pos: 0.13854| lr: 1e-05| temp: 1.94952 | loss: 1.03718| constrast_loss: 4.05237| div_loss: 0.96369| %_mask_idx: 0.36372| ppl: 23.2391| %_neg_is_pos: 0.1597| lr: 1e-05| temp: 1.94951 | loss: 1.06695| constrast_loss: 4.17242| div_loss: 0.95384| %_mask_idx: 0.39098| ppl: 29.5398| %_neg_is_pos: 0.13375| lr: 1e-05| temp: 1.94951 | loss: 1.07106| constrast_loss: 4.18849| div_loss: 0.95753| %_mask_idx: 0.41118| ppl: 27.18163| %_neg_is_pos: 0.11528| lr: 1e-05| temp: 1.9495 | loss: 1.07058| constrast_loss: 4.18651| div_loss: 0.95828| %_mask_idx: 0.37641| ppl: 26.69906| %_neg_is_pos: 0.13408| lr: 1e-05| temp: 1.9495 | loss: 1.08098| constrast_loss: 4.22817| div_loss: 0.95742| %_mask_idx: 0.39098| ppl: 27.25105| %_neg_is_pos: 0.1171| lr: 1e-05| temp: 1.94949 | loss: 1.05316| constrast_loss: 4.11691| div_loss: 0.9573| %_mask_idx: 0.34038| ppl: 27.33076| %_neg_is_pos: 0.13516| lr: 1e-05| temp: 1.94949 | loss: 1.07284| constrast_loss: 4.1959| div_loss: 0.95448| %_mask_idx: 0.40132| ppl: 29.1318| %_neg_is_pos: 0.11989| lr: 1e-05| temp: 1.94947 | loss: 1.05743| constrast_loss: 4.1341| div_loss: 0.95611| %_mask_idx: 0.34571| ppl: 28.08797| %_neg_is_pos: 0.12919| lr: 1e-05| temp: 1.94947 | loss: 1.06798| constrast_loss: 4.17569| div_loss: 0.9621| %_mask_idx: 0.43609| ppl: 24.25288| %_neg_is_pos: 0.13781| lr: 1e-05| temp: 1.94946 | loss: 1.07983| constrast_loss: 4.22321| div_loss: 0.96097| %_mask_idx: 0.38127| ppl: 24.97803| %_neg_is_pos: 0.14164| lr: 1e-05| temp: 1.94946 | loss: 1.04103| constrast_loss: 4.06788| div_loss: 0.9622| %_mask_idx: 0.34759| ppl: 24.19016| %_neg_is_pos: 0.13033| lr: 1e-05| temp: 1.94945 | loss: 1.07541| constrast_loss: 4.20589| 
div_loss: 0.95746| %_mask_idx: 0.39192| ppl: 27.22274| %_neg_is_pos: 0.12247| lr: 1e-05| temp: 1.94945 | loss: 1.06525| constrast_loss: 4.16545| div_loss: 0.95553| %_mask_idx: 0.3562| ppl: 28.4584| %_neg_is_pos: 0.14936| lr: 1e-05| temp: 1.94944 | loss: 1.02| constrast_loss: 3.98388| div_loss: 0.96122| %_mask_idx: 0.3656| ppl: 24.82013| %_neg_is_pos: 0.16507| lr: 1e-05| temp: 1.94944 | loss: 1.08971| constrast_loss: 4.26325| div_loss: 0.95599| %_mask_idx: 0.45614| ppl: 28.16726| %_neg_is_pos: 0.12137| lr: 1e-05| temp: 1.94942 | loss: 1.06289| constrast_loss: 4.15577| div_loss: 0.95781| %_mask_idx: 0.36967| ppl: 27.00444| %_neg_is_pos: 0.12861| lr: 1e-05| temp: 1.94942 | loss: 1.0684| constrast_loss: 4.17769| div_loss: 0.95901| %_mask_idx: 0.4057| ppl: 26.23577| %_neg_is_pos: 0.12812| lr: 1e-05| temp: 1.94941 | loss: 1.03792| constrast_loss: 4.05531| div_loss: 0.96372| %_mask_idx: 0.4093| ppl: 23.21635| %_neg_is_pos: 0.15418| lr: 1e-05| temp: 1.94941 | loss: 1.0675| constrast_loss: 4.17405| div_loss: 0.95948| %_mask_idx: 0.40648| ppl: 25.93196| %_neg_is_pos: 0.1261| lr: 1e-05| temp: 1.9494 | loss: 1.07837| constrast_loss: 4.2177| div_loss: 0.95769| %_mask_idx: 0.3844| ppl: 27.08089| %_neg_is_pos: 0.10944| lr: 1e-05| temp: 1.9494 | loss: 1.07371| constrast_loss: 4.19906| div_loss: 0.95786| %_mask_idx: 0.40429| ppl: 26.96786| %_neg_is_pos: 0.13278| lr: 1e-05| temp: 1.94939 | loss: 1.07601| constrast_loss: 4.20838| div_loss: 0.95645| %_mask_idx: 0.3963| ppl: 27.86973| %_neg_is_pos: 0.13459| lr: 1e-05| temp: 1.94939 | loss: 1.04267| constrast_loss: 4.07473| div_loss: 0.95946| %_mask_idx: 0.38503| ppl: 25.94491| %_neg_is_pos: 0.14254| lr: 1e-05| temp: 1.94937 | loss: 1.05596| constrast_loss: 4.12792| div_loss: 0.95936| %_mask_idx: 0.34978| ppl: 26.01237| %_neg_is_pos: 0.12426| lr: 1e-05| temp: 1.94937 | loss: 1.08386| constrast_loss: 4.23982| div_loss: 0.95621| %_mask_idx: 0.46225| ppl: 28.02808| %_neg_is_pos: 0.11396| lr: 1e-05| temp: 1.94936 | loss: 1.06423| 
constrast_loss: 4.16114| div_loss: 0.95776| %_mask_idx: 0.39881| ppl: 27.03566| %_neg_is_pos: 0.14196| lr: 1e-05| temp: 1.94936 | loss: 1.04349| constrast_loss: 4.07775| div_loss: 0.96209| %_mask_idx: 0.36983| ppl: 24.2643| %_neg_is_pos: 0.14502| lr: 1e-05| temp: 1.94934 | loss: 1.08374| constrast_loss: 4.23963| div_loss: 0.95335| %_mask_idx: 0.40492| ppl: 29.85897| %_neg_is_pos: 0.1078| lr: 1e-05| temp: 1.94934 | loss: 1.07787| constrast_loss: 4.21614| div_loss: 0.95329| %_mask_idx: 0.42763| ppl: 29.8915| %_neg_is_pos: 0.1181| lr: 1e-05| temp: 1.94933 | loss: 1.0584| constrast_loss: 4.13787| div_loss: 0.95713| %_mask_idx: 0.35464| ppl: 27.43594| %_neg_is_pos: 0.11326| lr: 1e-05| temp: 1.94933 | loss: 1.06219| constrast_loss: 4.15275| div_loss: 0.96029| %_mask_idx: 0.33553| ppl: 25.41497| %_neg_is_pos: 0.12253| lr: 1e-05| temp: 1.94932 | loss: 1.06664| constrast_loss: 4.17079| div_loss: 0.95763| %_mask_idx: 0.45724| ppl: 27.11911| %_neg_is_pos: 0.11911| lr: 1e-05| temp: 1.94932 | loss: 1.06395| constrast_loss: 4.15998| div_loss: 0.95825| %_mask_idx: 0.39536| ppl: 26.72009| %_neg_is_pos: 0.10963| lr: 1e-05| temp: 1.94931 | loss: 1.06925| constrast_loss: 4.18123| div_loss: 0.95772| %_mask_idx: 0.40586| ppl: 27.05682| %_neg_is_pos: 0.12606| lr: 1e-05| temp: 1.94931 | loss: 1.05239| constrast_loss: 4.11348| div_loss: 0.96086| %_mask_idx: 0.40648| ppl: 25.04763| %_neg_is_pos: 0.13978| lr: 1e-05| temp: 1.94929 | loss: 1.06968| constrast_loss: 4.18287| div_loss: 0.95867| %_mask_idx: 0.32613| ppl: 26.45304| %_neg_is_pos: 0.10567| lr: 1e-05| temp: 1.94929 | loss: 1.03764| constrast_loss: 4.05436| div_loss: 0.96208| %_mask_idx: 0.38252| ppl: 24.26573| %_neg_is_pos: 0.13361| lr: 1e-05| temp: 1.94928 | loss: 1.04639| constrast_loss: 4.08995| div_loss: 0.95607| %_mask_idx: 0.3761| ppl: 28.11635| %_neg_is_pos: 0.13475| lr: 1e-05| temp: 1.94928 | loss: 1.03819| constrast_loss: 4.05659| div_loss: 0.9616| %_mask_idx: 0.38643| ppl: 24.57488| %_neg_is_pos: 0.15929| lr: 1e-05| temp: 
1.94927 | loss: 1.06973| constrast_loss: 4.18303| div_loss: 0.9588| %_mask_idx: 0.4151| ppl: 26.36494| %_neg_is_pos: 0.11612| lr: 1e-05| temp: 1.94927 | loss: 1.06241| constrast_loss: 4.15385| div_loss: 0.95774| %_mask_idx: 0.42184| ppl: 27.04943| %_neg_is_pos: 0.12403| lr: 1e-05| temp: 1.94926 | loss: 1.02323| constrast_loss: 3.99686| div_loss: 0.96058| %_mask_idx: 0.32425| ppl: 25.22593| %_neg_is_pos: 0.15448| lr: 1e-05| temp: 1.94926 | loss: 1.05767| constrast_loss: 4.13516| div_loss: 0.95535| %_mask_idx: 0.38456| ppl: 28.57819| %_neg_is_pos: 0.1318| lr: 1e-05| temp: 1.94924 | loss: 1.04484| constrast_loss: 4.08306| div_loss: 0.96283| %_mask_idx: 0.38816| ppl: 23.79151| %_neg_is_pos: 0.15302| lr: 1e-05| temp: 1.94924 | loss: 1.04046| constrast_loss: 4.06608| div_loss: 0.95758| %_mask_idx: 0.37171| ppl: 27.14937| %_neg_is_pos: 0.15172| lr: 1e-05| temp: 1.94924 | loss: 1.05568| constrast_loss: 4.12684| div_loss: 0.959| %_mask_idx: 0.34007| ppl: 26.23953| %_neg_is_pos: 0.13062| lr: 1e-05| temp: 1.94924 | loss: 1.05835| constrast_loss: 4.13782| div_loss: 0.95587| %_mask_idx: 0.4115| ppl: 28.24511| %_neg_is_pos: 0.13161| lr: 1e-05| temp: 1.94923 | loss: 1.04759| constrast_loss: 4.09418| div_loss: 0.96187| %_mask_idx: 0.38816| ppl: 24.40326| %_neg_is_pos: 0.13754| lr: 1e-05| temp: 1.94923 | loss: 1.08354| constrast_loss: 4.23867| div_loss: 0.95477| %_mask_idx: 0.39615| ppl: 28.9492| %_neg_is_pos: 0.13477| lr: 1e-05| temp: 1.94922 | loss: 1.06963| constrast_loss: 4.18279| div_loss: 0.95712| %_mask_idx: 0.40774| ppl: 27.44399| %_neg_is_pos: 0.1212| lr: 1e-05| temp: 1.94922 | loss: 1.05466| constrast_loss: 4.12294| div_loss: 0.95685| %_mask_idx: 0.41917| ppl: 27.61504| %_neg_is_pos: 0.11723| lr: 1e-05| temp: 1.9492 | loss: 1.06031| constrast_loss: 4.14539| div_loss: 0.95841| %_mask_idx: 0.36294| ppl: 26.62066| %_neg_is_pos: 0.12601| lr: 1e-05| temp: 1.9492 | loss: 1.06279| constrast_loss: 4.15531| div_loss: 0.95837| %_mask_idx: 0.38393| ppl: 26.64159| %_neg_is_pos: 
0.12949| lr: 1e-05| temp: 1.94919 | loss: 1.09399| constrast_loss: 4.28031| div_loss: 0.95656| %_mask_idx: 0.43374| ppl: 27.8004| %_neg_is_pos: 0.13265| lr: 1e-05| temp: 1.94919 | loss: 1.05649| constrast_loss: 4.13008| div_loss: 0.95893| %_mask_idx: 0.41306| ppl: 26.28493| %_neg_is_pos: 0.14132| lr: 1e-05| temp: 1.94917 | loss: 1.07079| constrast_loss: 4.18729| div_loss: 0.95861| %_mask_idx: 0.38612| ppl: 26.49089| %_neg_is_pos: 0.1238| lr: 1e-05| temp: 1.94917 | loss: 1.05288| constrast_loss: 4.11585| div_loss: 0.95684| %_mask_idx: 0.37578| ppl: 27.62337| %_neg_is_pos: 0.13289| lr: 1e-05| temp: 1.94916 | loss: 1.05253| constrast_loss: 4.11412| div_loss: 0.96003| %_mask_idx: 0.40648| ppl: 25.58182| %_neg_is_pos: 0.13229| lr: 1e-05| temp: 1.94916 | loss: 1.02328| constrast_loss: 3.99678| div_loss: 0.9633| %_mask_idx: 0.37688| ppl: 23.48686| %_neg_is_pos: 0.16592| lr: 1e-05| temp: 1.94915 | loss: 1.09175| constrast_loss: 4.27177| div_loss: 0.95232| %_mask_idx: 0.40586| ppl: 30.51833| %_neg_is_pos: 0.1197| lr: 1e-05| temp: 1.94915 | loss: 1.08016| constrast_loss: 4.22498| div_loss: 0.95673| %_mask_idx: 0.36404| ppl: 27.69421| %_neg_is_pos: 0.13548| lr: 1e-05| temp: 1.94914 | loss: 1.06311| constrast_loss: 4.15676| div_loss: 0.95673| %_mask_idx: 0.36263| ppl: 27.6908| %_neg_is_pos: 0.14333| lr: 1e-05| temp: 1.94914 | loss: 1.06145| constrast_loss: 4.15004| div_loss: 0.95783| %_mask_idx: 0.41573| ppl: 26.98747| %_neg_is_pos: 0.1238| lr: 1e-05| temp: 1.94912 | loss: 1.07023| constrast_loss: 4.18541| div_loss: 0.95507| %_mask_idx: 0.37187| ppl: 28.75484| %_neg_is_pos: 0.12028| lr: 1e-05| temp: 1.94912 | loss: 1.05416| constrast_loss: 4.12072| div_loss: 0.95905| %_mask_idx: 0.37531| ppl: 26.20888| %_neg_is_pos: 0.12733| lr: 1e-05| temp: 1.94911 | loss: 1.08894| constrast_loss: 4.26051| div_loss: 0.95229| %_mask_idx: 0.40414| ppl: 30.53486| %_neg_is_pos: 0.11576| lr: 1e-05| temp: 1.94911 | loss: 1.05594| constrast_loss: 4.12787| div_loss: 0.95877| %_mask_idx: 0.31375| ppl: 
26.38639| %_neg_is_pos: 0.13812| lr: 1e-05| temp: 1.9491 | loss: 1.06716| constrast_loss: 4.17256| div_loss: 0.96077| %_mask_idx: 0.36576| ppl: 25.10472| %_neg_is_pos: 0.11605| lr: 1e-05| temp: 1.9491 | loss: 1.04952| constrast_loss: 4.10197| div_loss: 0.96097| %_mask_idx: 0.32644| ppl: 24.97738| %_neg_is_pos: 0.14276| lr: 1e-05| temp: 1.94909 | loss: 1.03658| constrast_loss: 4.05012| div_loss: 0.96205| %_mask_idx: 0.32895| ppl: 24.28862| %_neg_is_pos: 0.15456| lr: 1e-05| temp: 1.94909 | loss: 1.04827| constrast_loss: 4.09724| div_loss: 0.95839| %_mask_idx: 0.45363| ppl: 26.63153| %_neg_is_pos: 0.15016| lr: 1e-05| temp: 1.94907 | loss: 1.0774| constrast_loss: 4.21397| div_loss: 0.95623| %_mask_idx: 0.41244| ppl: 28.01455| %_neg_is_pos: 0.11751| lr: 1e-05| temp: 1.94907 | loss: 1.08457| constrast_loss: 4.24228| div_loss: 0.95988| %_mask_idx: 0.44956| ppl: 25.67867| %_neg_is_pos: 0.1252| lr: 1e-05| temp: 1.94906 | loss: 1.06683| constrast_loss: 4.17167| div_loss: 0.95648| %_mask_idx: 0.43249| ppl: 27.85495| %_neg_is_pos: 0.12959| lr: 1e-05| temp: 1.94906 | loss: 1.09239| constrast_loss: 4.27416| div_loss: 0.95419| %_mask_idx: 0.40586| ppl: 29.31709| %_neg_is_pos: 0.10346| lr: 1e-05| temp: 1.94905 | loss: 1.09143| constrast_loss: 4.27026| div_loss: 0.95474| %_mask_idx: 0.38503| ppl: 28.96465| %_neg_is_pos: 0.11345| lr: 1e-05| temp: 1.94905 | loss: 1.04825| constrast_loss: 4.09673| div_loss: 0.96263| %_mask_idx: 0.37766| ppl: 23.91865| %_neg_is_pos: 0.12674| lr: 1e-05| temp: 1.94904 | loss: 1.05929| constrast_loss: 4.14149| div_loss: 0.95669| %_mask_idx: 0.35417| ppl: 27.71745| %_neg_is_pos: 0.13626| lr: 1e-05| temp: 1.94904 | loss: 1.04915| constrast_loss: 4.10068| div_loss: 0.95909| %_mask_idx: 0.40742| ppl: 26.18124| %_neg_is_pos: 0.12425| lr: 1e-05| temp: 1.94902 | loss: 1.07532| constrast_loss: 4.20548| div_loss: 0.95782| %_mask_idx: 0.3609| ppl: 26.99738| %_neg_is_pos: 0.13207| lr: 1e-05| temp: 1.94902 | loss: 1.05693| constrast_loss: 4.13176| div_loss: 0.95959| 
%_mask_idx: 0.33976| ppl: 25.86223| %_neg_is_pos: 0.14492| lr: 1e-05| temp: 1.94901 | loss: 1.04903| constrast_loss: 4.10021| div_loss: 0.95893| %_mask_idx: 0.34336| ppl: 26.28347| %_neg_is_pos: 0.11992| lr: 1e-05| temp: 1.94901
[2021-09-02 08:20:18,576] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 0 Skipping step. Attempted loss scale: 1, reducing to 1
[2021-09-02 08:20:18,576] [INFO] [stage2.py:1517:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 1 Skipping step. Attempted loss scale: 1, reducing to 1
| loss: 1.05831| constrast_loss: 4.13726| div_loss: 0.95969| %_mask_idx: 0.39677| ppl: 25.79947| %_neg_is_pos: 0.12535| lr: 1e-05| temp: 1.94899 | loss: 1.05281| constrast_loss: 4.11576| div_loss: 0.95489| %_mask_idx: 0.38236| ppl: 28.86925| %_neg_is_pos: 0.13053| lr: 1e-05| temp: 1.94899 | loss: 1.05926| constrast_loss: 4.14091| div_loss: 0.96121| %_mask_idx: 0.41588| ppl: 24.82537| %_neg_is_pos: 0.14386| lr: 1e-05| temp: 1.94898 | loss: 1.06944| constrast_loss: 4.18228| div_loss: 0.95472| %_mask_idx: 0.41745| ppl: 28.97851| %_neg_is_pos: 0.11843| lr: 1e-05| temp: 1.94898 | loss: 1.05426| constrast_loss: 4.1208| div_loss: 0.96257| %_mask_idx: 0.39254| ppl: 23.95463| %_neg_is_pos: 0.1465| lr: 1e-05| temp: 1.94897 | loss: 1.06133| constrast_loss: 4.1493| div_loss: 0.96031| %_mask_idx: 0.35871| ppl: 25.3987| %_neg_is_pos: 0.1271| lr: 1e-05| temp: 1.94897 | loss: 1.06909| constrast_loss: 4.18085| div_loss: 0.95488| %_mask_idx: 0.35119| ppl: 28.87741| %_neg_is_pos: 0.13375| lr: 1e-05| temp: 1.94896 | loss: 1.02983| constrast_loss: 4.02299| div_loss: 0.96347| %_mask_idx: 0.38252| ppl: 23.37786| %_neg_is_pos: 0.15088| lr: 1e-05| temp: 1.94896 | loss: 1.08459| constrast_loss: 4.24289| div_loss: 0.95446| %_mask_idx: 0.42857| ppl: 29.14549| %_neg_is_pos: 0.10352| lr: 1e-05| temp: 1.94894 | loss: 1.0793| constrast_loss: 4.2215| div_loss: 0.95687| %_mask_idx: 0.36967| ppl: 27.6053| %_neg_is_pos: 0.11388| lr: 1e-05| temp: 1.94894 | 
loss: 1.0467| constrast_loss: 4.09081| div_loss: 0.95979| %_mask_idx: 0.4115| ppl: 25.73587| %_neg_is_pos: 0.14318| lr: 1e-05| temp: 1.94893 | loss: 1.04349| constrast_loss: 4.07763| div_loss: 0.96349| %_mask_idx: 0.39223| ppl: 23.36807| %_neg_is_pos: 0.1335| lr: 1e-05| temp: 1.94893 | loss: 1.04227| constrast_loss: 4.07288| div_loss: 0.9619| %_mask_idx: 0.37108| ppl: 24.38681| %_neg_is_pos: 0.12568| lr: 1e-05| temp: 1.94892 | loss: 1.05114| constrast_loss: 4.1086| div_loss: 0.9597| %_mask_idx: 0.37594| ppl: 25.79041| %_neg_is_pos: 0.13528| lr: 1e-05| temp: 1.94892 | loss: 1.07215| constrast_loss: 4.19292| div_loss: 0.95658| %_mask_idx: 0.37892| ppl: 27.79025| %_neg_is_pos: 0.11447| lr: 1e-05| temp: 1.94891 | loss: 1.06486| constrast_loss: 4.16329| div_loss: 0.9617| %_mask_idx: 0.44956| ppl: 24.51478| %_neg_is_pos: 0.13222| lr: 1e-05| temp: 1.94891 | loss: 1.04569| constrast_loss: 4.08689| div_loss: 0.95868| %_mask_idx: 0.34461| ppl: 26.44169| %_neg_is_pos: 0.15423| lr: 1e-05| temp: 1.94889 | loss: 1.08313| constrast_loss: 4.23671| div_loss: 0.95817| %_mask_idx: 0.42466| ppl: 26.77153| %_neg_is_pos: 0.11786| lr: 1e-05| temp: 1.94889 | loss: 1.06001| constrast_loss: 4.14415| div_loss: 0.95879| %_mask_idx: 0.39568| ppl: 26.37348| %_neg_is_pos: 0.1317| lr: 1e-05| temp: 1.94888 | loss: 1.0384| constrast_loss: 4.05715| div_loss: 0.96438| %_mask_idx: 0.40617| ppl: 22.79714| %_neg_is_pos: 0.14735| lr: 1e-05| temp: 1.94888 | loss: 0.99958| constrast_loss: 3.90145| div_loss: 0.96859| %_mask_idx: 0.3385| ppl: 20.10317| %_neg_is_pos: 0.18454| lr: 1e-05| temp: 1.94887 | loss: 1.05944| constrast_loss: 4.14127| div_loss: 0.96466| %_mask_idx: 0.3761| ppl: 22.61503| %_neg_is_pos: 0.15644| lr: 1e-05| temp: 1.94887 | loss: 1.07327| constrast_loss: 4.1968| div_loss: 0.96268| %_mask_idx: 0.41463| ppl: 23.88408| %_neg_is_pos: 0.14744| lr: 1e-05| temp: 1.94886 | loss: 1.04087| constrast_loss: 4.06725| div_loss: 0.96235| %_mask_idx: 0.36826| ppl: 24.09878| %_neg_is_pos: 0.13249| lr: 
1e-05| temp: 1.94886 | loss: 1.03973| constrast_loss: 4.06277| div_loss: 0.96154| %_mask_idx: 0.42481| ppl: 24.61436| %_neg_is_pos: 0.15347| lr: 1e-05| temp: 1.94885 | loss: 1.05095| constrast_loss: 4.10723| div_loss: 0.96575| %_mask_idx: 0.3786| ppl: 21.91718| %_neg_is_pos: 0.16006| lr: 1e-05| temp: 1.94885 | loss: 1.02652| constrast_loss: 4.00937| div_loss: 0.96705| %_mask_idx: 0.40868| ppl: 21.0901| %_neg_is_pos: 0.17908| lr: 1e-05| temp: 1.94884 | loss: 1.04835| constrast_loss: 4.09675| div_loss: 0.96644| %_mask_idx: 0.39442| ppl: 21.47915| %_neg_is_pos: 0.15285| lr: 1e-05| temp: 1.94884 | loss: 1.06253| constrast_loss: 4.15352| div_loss: 0.96598| %_mask_idx: 0.40069| ppl: 21.77239| %_neg_is_pos: 0.16039| lr: 1e-05| temp: 1.94882 | loss: 1.0509| constrast_loss: 4.10711| div_loss: 0.96479| %_mask_idx: 0.38643| ppl: 22.53139| %_neg_is_pos: 0.16271| lr: 1e-05| temp: 1.94882