Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:29,654] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7: setting --include=localhost:0,1,2,3,4,5,6,7
[2023-07-01 08:03:29,722] [INFO] [runner.py:555:main] cmd = /home/zhaiyuanzhao/anaconda3/envs/RLHF/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path /home/zhaiyuanzhao/llm/dataset/rm-static/data --data_split 2,4,4 --actor_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples-4datasets/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/output-1.3b --critic_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/output-RM --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 1 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output-1.3b-RM_350m-nokl --kl_ctl 0
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:32,113] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-01 08:03:32,113] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-01 08:03:32,114] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-01 08:03:32,114] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-01 08:03:32,114] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:57,552] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,552] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,583] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,584] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,584] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-01 08:03:57,629] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,629] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,661] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,661] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,682] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,682] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,697] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,697] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,708] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,708] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,710] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,710] [INFO] [comm.py:594:init_distributed] cdb=None
Found cached dataset parquet (/home/zhaiyuanzhao/.cache/huggingface/datasets/parquet/default-d09980a08a1dbd7c/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
0%| | 0/2 [00:00
[2023-07-01 08:05:28,577] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.581378698348999 seconds
Time to load utils op: 0.5792138576507568 seconds
Time to load utils op: 0.5809986591339111 seconds
Time to load utils op: 0.5794942378997803 seconds
Time to load utils op: 0.5797784328460693 seconds
Time to load utils op: 0.5806789398193359 seconds
Time to load utils op: 0.5814478397369385 seconds
Time to load utils op: 0.5814688205718994 seconds
Rank: 1 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 6 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 7 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 2 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 5 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 3 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 4 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Rank: 0 partition count [8, 8] and sizes[(164401920, False), (67840, False)]
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
Time to load utils op: 0.0009922981262207031 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0007748603820800781 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008537769317626953 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.0007731914520263672 seconds
Time to load utils op: 0.0008482933044433594 seconds
Time to load utils op: 0.0007557868957519531 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002641916275024414 seconds
[2023-07-01 08:05:38,577] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-01 08:05:38,578] [INFO] [utils.py:786:see_memory_usage] MA 3.06 GB Max_MA 3.06 GB CA 3.07 GB Max_CA 3 GB
[2023-07-01 08:05:38,578] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,729] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-01 08:05:38,730] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB Max_MA 4.91 GB CA 4.91 GB Max_CA 5 GB
[2023-07-01 08:05:38,730] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,730] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-07-01 08:05:38,872] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-01 08:05:38,873] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB Max_MA 4.29 GB CA 4.91 GB Max_CA 5 GB
[2023-07-01 08:05:38,873] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:05:38,876] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print] amp_enabled .................. False
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print] amp_params ................... False
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] bfloat16_enabled ............. False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] comms_config .................
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] communication_data_type ...... None
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] curriculum_params_legacy ..... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] data_efficiency_enabled ...... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] dataloader_drop_last ......... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] disable_allgather ............ False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] dump_state ................... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_enabled ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] eigenvalue_verbose ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] elasticity_enabled ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] fp16_auto_cast ............... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] fp16_enabled ................. True
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] global_rank .................. 0
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] grad_accum_dtype ............. None
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] gradient_clipping ............ 1.0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] load_universal_checkpoint .... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] loss_scale ................... 0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] memory_breakdown ............. False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] mics_hierarchial_params_gather False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] mics_shard_size .............. -1
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] optimizer_name ............... None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] optimizer_params ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] pld_enabled .................. False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] pld_params ................... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] prescale_gradients ........... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] scheduler_name ............... None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] scheduler_params ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] sparse_attention ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] steps_per_print .............. 10
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] train_batch_size ............. 32
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] use_node_local_storage ....... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] wall_clock_breakdown ......... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] world_size ................... 8
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] zero_allow_untested_optimizer False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print] zero_enabled ................. True
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print] zero_optimization_stage ...... 2
[2023-07-01 08:05:38,879] [INFO] [config.py:950:print_user_config] json = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 4,
    "steps_per_print": 10,
    "zero_optimization": {
        "stage": 2,
        "offload_param": {
            "device": "none"
        },
        "offload_optimizer": {
            "device": "none"
        },
        "stage3_param_persistence_threshold": 1.000000e+04,
        "stage3_max_live_parameters": 3.000000e+07,
        "stage3_prefetch_bucket_size": 3.000000e+07,
        "memory_efficient_linear": false
    },
    "fp16": {
        "enabled": true,
        "loss_scale_window": 100
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": false,
    "wall_clock_breakdown": false,
    "hybrid_engine": {
        "enabled": true,
        "max_out_tokens": 512,
        "inference_tp_size": 1,
        "release_inference_cache": false,
        "pin_parameters": true,
        "tp_gather_partition_size": 8
    }
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009009838104248047 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Time to load transformer_inference op: 1.1420021057128906 seconds
Time to load transformer_inference op: 1.1416473388671875 seconds
Time to load transformer_inference op: 1.1401557922363281 seconds
Time to load transformer_inference op: 1.128908395767212 seconds
Time to load transformer_inference op: 1.1238932609558105 seconds
Time to load transformer_inference op: 1.1305203437805176 seconds
Time to load transformer_inference op: 1.1278560161590576 seconds
Time to load transformer_inference op: 1.1280443668365479 seconds
[2023-07-01 08:05:40,500] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': , 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': , 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True}
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.04882502555847168 seconds
Time to load transformer_inference op: 0.04806041717529297 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05038714408874512 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05112457275390625 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05184316635131836 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05165362358093262 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05492138862609863 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05556344985961914 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Time to load transformer_inference op: 0.04999732971191406 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.04800009727478027 seconds No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... 
Time to load transformer_inference op: 0.0471189022064209 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.04591679573059082 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05129432678222656 secondsUsing /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05254411697387695 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05318403244018555 seconds ******************[end] Initialized Actor Model [end] (duration: 49.50s)****************** *************************[start] Initializing Ref Model [start] ************************** Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... 
No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.055281877517700195 seconds model loaded [2023-07-01 08:05:57,456] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:06:08,170] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:06:08,172] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:06:08,173] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] dump_state ................... 
False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:06:08,174] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] pld_params ................... 
False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:06:08,175] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:06:08,176] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:06:08,176] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.002164125442504883 seconds *******************[end] Initialized Ref Model [end] (duration: 27.32s)******************* *************************[start] Initializing EMA Model [start] ************************** Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.001653432846069336 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.002496004104614258 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0014553070068359375 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0012822151184082031 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011990070343017578 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0014603137969970703 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.002300739288330078 seconds model loaded [2023-07-01 08:06:24,453] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:06:34,732] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:06:34,733] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] bfloat16_enabled ............. 
False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:06:34,734] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] data_efficiency_config ....... 
{'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] dump_state ................... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] fp16_enabled ................. 
True [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:06:34,735] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... 
False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] pld_params ................... False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:06:34,736] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:06:34,737] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.006189823150634766 seconds *******************[end] Initialized EMA Model [end] (duration: 26.56s)******************* ************************[start] Initializing Critic Model [start] ************************ Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0012662410736083984 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0014331340789794922 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011889934539794922 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0016870498657226562 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011091232299804688 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0018308162689208984 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.0019044876098632812 seconds model loaded model loaded model loaded model loaded model loaded model loaded model loaded model loaded Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0026984214782714844 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0019664764404296875 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... 
Time to load fused_adam op: 0.009085655212402344 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.008561134338378906 seconds No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.009094715118408203 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0236053466796875 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.023058176040649414 seconds [2023-07-01 08:06:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown No modifications detected for re-loaded extension module fused_adam, skipping build step... 
Loading extension module fused_adam... Time to load fused_adam op: 0.03352189064025879 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.008983373641967773 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0023987293243408203 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0010488033294677734 seconds [2023-07-01 08:07:01,952] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:07:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [2023-07-01 08:07:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0007925033569335938 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.002661466598510742 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.0010764598846435547 seconds [2023-07-01 08:07:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2023-07-01 08:07:01,970] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2023-07-01 08:07:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000 [2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000 [2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False [2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009701251983642578 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.0022344589233398438 seconds Rank: 4 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 7 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 5 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 2 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 1 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 0 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 6 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 3 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Time to load utils op: 0.001149892807006836 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step... Time to load utils op: 0.0009431838989257812 seconds Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Time to load utils op: 0.0007071495056152344 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Time to load utils op: 0.0010597705841064453 seconds No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.00122833251953125 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0017001628875732422 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0030956268310546875 seconds [2023-07-01 08:07:10,098] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states [2023-07-01 08:07:10,099] [INFO] [utils.py:786:see_memory_usage] MA 10.58 GB Max_MA 10.58 GB CA 10.97 GB Max_CA 11 GB [2023-07-01 08:07:10,099] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 59.17 GB, percent = 5.9% [2023-07-01 08:07:10,364] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states [2023-07-01 08:07:10,364] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB Max_MA 11.05 GB CA 11.43 GB Max_CA 11 GB [2023-07-01 08:07:10,365] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 60.12 GB, percent = 6.0% [2023-07-01 08:07:10,365] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized [2023-07-01 08:07:10,627] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer [2023-07-01 08:07:10,628] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB Max_MA 10.89 GB CA 11.43 GB Max_CA 11 GB [2023-07-01 08:07:10,628] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 61.06 GB, percent = 6.1% [2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, 
skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:07:10,631] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] comms_config ................. 
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:07:10,631] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] dataloader_drop_last ......... 
False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] dump_state ................... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] gradient_accumulation_steps .. 
1 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:07:10,632] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] pld_params ................... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] zero_config .................. 
stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] zero_enabled ................. True [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:07:10,633] [INFO] [config.py:964:print] zero_optimization_stage ...... 
2 [2023-07-01 08:07:10,633] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009529590606689453 seconds *****************[end] Initialized Critic Model [end] (duration: 35.89s)****************** ************************[start] Initializing Reward Model [start] ************************ model loaded model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:07:23,866] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0057489871978759766 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.0014493465423583984 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0013382434844970703 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0013589859008789062 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0016677379608154297 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011448860168457031 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011296272277832031 seconds [2023-07-01 08:07:31,957] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:07:31,958] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:07:31,958] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:07:31,958] [INFO] [config.py:964:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:07:31,958] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:07:31,958] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] dump_state ................... 
False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:07:31,959] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] pld_params ................... 
False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:07:31,960] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:07:31,961] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:07:31,961] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.0013725757598876953 seconds *****************[end] Initialized Reward Model [end] (duration: 21.33s)****************** ***** Running training ***** Beginning of Epoch 1/1, Total Generation Batches 954 ------------------------------------------------------ Free memory : 26.453308 (GigaBytes) Total memory: 39.586121 (GigaBytes) Requested memory: 1.031250 (GigaBytes) Setting maximum total tokens (input + output) to 512 WorkSpace: 0x2b4ed0000000 ------------------------------------------------------ [2023-07-01 08:07:36,757] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 [2023-07-01 08:07:36,921] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.00479888916015625|cri_loss: 0.201416015625|unsuper_loss: 0.0 average reward score: -1.482421875 ------------------------------------------------------------------------------------- |E2E latency=4.93s |Gather latency=0.00s (0.00%) |Generate time=3.91s (79.35%) |Training time=0.83s (16.80%) |Others=0.19 (3.85%)|CurSamplesPerSec=6.49 |AvgSamplesPerSec=6.49 [2023-07-01 08:07:39,092] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 [2023-07-01 08:07:39,251] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 epoch: 0|step: 1|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: 1.76953125|unsuper_loss: 0.0 average reward score: -3.720703125 ------------------------------------------------------------------------------------- |E2E latency=2.33s |Gather latency=0.00s (0.00%) |Generate time=1.52s (65.40%) |Training time=0.62s (26.43%) |Others=0.19 (8.18%)|CurSamplesPerSec=13.73 |AvgSamplesPerSec=8.82 [2023-07-01 08:07:41,420] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 [2023-07-01 08:07:41,581] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 epoch: 0|step: 2|ppo_ep: 1|act_loss: -0.11614990234375|cri_loss: 0.6455078125|unsuper_loss: 0.0 average reward score: -1.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.33s |Gather latency=0.00s (0.00%) |Generate time=1.53s (65.56%) |Training time=0.62s (26.51%) |Others=0.18 (7.93%)|CurSamplesPerSec=13.73 |AvgSamplesPerSec=10.01 [2023-07-01 08:07:44,100] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 3|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.2236328125|unsuper_loss: 0.0 average reward score: 0.70947265625 ------------------------------------------------------------------------------------- |E2E latency=2.52s |Gather latency=0.00s (0.00%) |Generate time=1.53s (60.77%) |Training time=0.80s (31.96%) |Others=0.18 (7.27%)|CurSamplesPerSec=12.72 |AvgSamplesPerSec=10.57 [2023-07-01 08:07:46,255] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 4|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.200439453125|unsuper_loss: 0.0 average reward score: -0.22509765625 ------------------------------------------------------------------------------------- |E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.52s (64.56%) |Training time=0.61s (25.79%) |Others=0.23 (9.66%)|CurSamplesPerSec=13.56 |AvgSamplesPerSec=11.06 epoch: 0|step: 5|ppo_ep: 1|act_loss: -0.345458984375|cri_loss: 1.0078125|unsuper_loss: 0.0 average reward score: -0.45458984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.63%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=11.28 [2023-07-01 08:07:51,512] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096 epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.09063720703125|cri_loss: 0.2022705078125|unsuper_loss: 0.0 average reward score: 0.46240234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.52s (60.65%) |Training time=0.80s (31.99%) |Others=0.18 (7.36%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=11.48 epoch: 0|step: 7|ppo_ep: 1|act_loss: 0.14013671875|cri_loss: 0.08990478515625|unsuper_loss: 0.0 average reward score: -1.755859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=11.61 [2023-07-01 08:07:56,194] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 8192, reducing to 4096 epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.61572265625|unsuper_loss: 0.0 average reward score: -0.556640625 ------------------------------------------------------------------------------------- |E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.14%) |Training time=0.62s (26.23%) |Others=0.23 (9.64%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=11.80 [2023-07-01 08:07:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[4.825e-07, 4.825e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:07:58,731] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=57.05384728645054, CurrSamplesPerSec=50.21911950796253, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:07:58,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[2.5000000000000004e-07, 2.5000000000000004e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 9|ppo_ep: 1|act_loss: 0.199462890625|cri_loss: 0.1419677734375|unsuper_loss: 0.0 average reward score: -0.7958984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.87%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=11.88 epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.123046875|cri_loss: 0.1575927734375|unsuper_loss: 0.0 average reward score: -0.1348876953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.46%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=11.94 epoch: 0|step: 11|ppo_ep: 1|act_loss: 0.0229949951171875|cri_loss: 0.1405029296875|unsuper_loss: 0.0 average reward score: -1.232421875 ------------------------------------------------------------------------------------- |E2E 
latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.14%) |Training time=0.81s (31.83%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=11.99 [2023-07-01 08:08:06,190] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048 epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.11627197265625|cri_loss: 1.2509765625|unsuper_loss: 0.0 average reward score: -2.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.07%) |Training time=0.62s (26.34%) |Others=0.23 (9.59%)|CurSamplesPerSec=13.56 |AvgSamplesPerSec=12.09 [2023-07-01 08:08:08,886] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048 epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.10198974609375|cri_loss: 0.263427734375|unsuper_loss: 0.0 average reward score: -0.035888671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.51s (60.59%) |Training time=0.80s (32.22%) |Others=0.18 (7.19%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.14 epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.367919921875|cri_loss: 0.2203369140625|unsuper_loss: 0.0 average reward score: -0.9765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.59%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.17 epoch: 0|step: 15|ppo_ep: 1|act_loss: 0.12371826171875|cri_loss: 0.16064453125|unsuper_loss: 0.0 average reward score: 0.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) 
|Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.20 epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.037841796875|cri_loss: 0.1976318359375|unsuper_loss: 0.0 average reward score: -0.326904296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.78%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.22 epoch: 0|step: 17|ppo_ep: 1|act_loss: 0.03997802734375|cri_loss: 0.0546875|unsuper_loss: 0.0 average reward score: 0.403076171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.24 epoch: 0|step: 18|ppo_ep: 1|act_loss: -0.11065673828125|cri_loss: 0.1983642578125|unsuper_loss: 0.0 average reward score: -0.39208984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.81%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.25 [2023-07-01 08:08:23,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[1.3510000000000003e-06, 1.3510000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:08:23,961] [INFO] [timer.py:215:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=54.095903560254364, CurrSamplesPerSec=50.21105986335567, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:08:24,128] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[7.000000000000001e-07, 7.000000000000001e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 19|ppo_ep: 1|act_loss: -0.1505126953125|cri_loss: 0.1583251953125|unsuper_loss: 0.0 average reward score: 0.42919921875 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.72%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.27 epoch: 0|step: 20|ppo_ep: 1|act_loss: 0.05194091796875|cri_loss: 0.10504150390625|unsuper_loss: 0.0 average reward score: -0.5322265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.28 epoch: 0|step: 21|ppo_ep: 1|act_loss: -0.157958984375|cri_loss: 1.16015625|unsuper_loss: 0.0 average reward score: -0.0382080078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.29 epoch: 0|step: 22|ppo_ep: 1|act_loss: 0.057037353515625|cri_loss: 0.112060546875|unsuper_loss: 0.0 average reward score: -0.424072265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.49%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.31 epoch: 0|step: 23|ppo_ep: 1|act_loss: -0.10540771484375|cri_loss: 0.46435546875|unsuper_loss: 0.0 average reward score: -0.78173828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.32 epoch: 0|step: 24|ppo_ep: 1|act_loss: 0.038909912109375|cri_loss: 0.07330322265625|unsuper_loss: 0.0 average reward score: -0.0738525390625 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.46%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.33 epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.037322998046875|cri_loss: 0.1531982421875|unsuper_loss: 0.0 average reward score: 0.059814453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.47%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.34 epoch: 0|step: 26|ppo_ep: 1|act_loss: 0.1767578125|cri_loss: 0.11053466796875|unsuper_loss: 0.0 average reward score: -1.0439453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.50%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.35 epoch: 0|step: 27|ppo_ep: 1|act_loss: 0.07501220703125|cri_loss: 0.1719970703125|unsuper_loss: 0.0 average reward score: 1.2197265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.84%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.36 epoch: 0|step: 28|ppo_ep: 1|act_loss: 0.2171630859375|cri_loss: 0.1456298828125|unsuper_loss: 0.0 average reward score: -0.366943359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.60%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.36 [2023-07-01 08:08:49,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[2.316e-06, 2.316e-06], mom=[(0.9, 0.95), (0.9, 0.95)] 
[2023-07-01 08:08:49,387] [INFO] [timer.py:215:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=52.846617835000465, CurrSamplesPerSec=50.29139602390737, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:08:49,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[1.2000000000000002e-06, 1.2000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 29|ppo_ep: 1|act_loss: 0.052337646484375|cri_loss: 0.166015625|unsuper_loss: 0.0 average reward score: -1.0771484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.37 epoch: 0|step: 30|ppo_ep: 1|act_loss: 0.053802490234375|cri_loss: 0.1561279296875|unsuper_loss: 0.0 average reward score: 0.409423828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.77%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.38 epoch: 0|step: 31|ppo_ep: 1|act_loss: 0.2149658203125|cri_loss: 0.2462158203125|unsuper_loss: 0.0 average reward score: 0.31982421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.38 epoch: 0|step: 32|ppo_ep: 1|act_loss: -0.01303863525390625|cri_loss: 0.0921630859375|unsuper_loss: 0.0 average reward score: 1.552734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.55%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.39 epoch: 0|step: 
33|ppo_ep: 1|act_loss: -0.08642578125|cri_loss: 0.1295166015625|unsuper_loss: 0.0 average reward score: 1.685546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.87%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.39 epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.1895751953125|cri_loss: 0.1927490234375|unsuper_loss: 0.0 average reward score: -0.654296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.78%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.40 epoch: 0|step: 35|ppo_ep: 1|act_loss: 0.12469482421875|cri_loss: 0.1187744140625|unsuper_loss: 0.0 average reward score: -0.56787109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.80%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.40 epoch: 0|step: 36|ppo_ep: 1|act_loss: 0.0865478515625|cri_loss: 0.06805419921875|unsuper_loss: 0.0 average reward score: 0.978515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.50%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.41 epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.1055908203125|cri_loss: 0.4501953125|unsuper_loss: 0.0 average reward score: -0.71044921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.41 epoch: 0|step: 38|ppo_ep: 
1|act_loss: 0.07806396484375|cri_loss: 0.384521484375|unsuper_loss: 0.0 average reward score: 0.85986328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.41 [2023-07-01 08:09:14,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[3.2810000000000004e-06, 3.2810000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:09:14,872] [INFO] [timer.py:215:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=52.19390649599056, CurrSamplesPerSec=50.73494643104871, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:09:15,039] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[1.7000000000000002e-06, 1.7000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 39|ppo_ep: 1|act_loss: -0.156982421875|cri_loss: 0.2034912109375|unsuper_loss: 0.0 average reward score: -0.13427734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.42 epoch: 0|step: 40|ppo_ep: 1|act_loss: -0.037841796875|cri_loss: 0.12188720703125|unsuper_loss: 0.0 average reward score: -0.04388427734375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.42 epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.2890625|unsuper_loss: 0.0 average reward score: -0.447265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate 
time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.43 epoch: 0|step: 42|ppo_ep: 1|act_loss: -0.0902099609375|cri_loss: 0.1630859375|unsuper_loss: 0.0 average reward score: 0.9951171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.43 epoch: 0|step: 43|ppo_ep: 1|act_loss: -0.083251953125|cri_loss: 0.0677490234375|unsuper_loss: 0.0 average reward score: 0.8994140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.43 epoch: 0|step: 44|ppo_ep: 1|act_loss: 0.0496826171875|cri_loss: 0.2196044921875|unsuper_loss: 0.0 average reward score: -0.120361328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.86%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.43 epoch: 0|step: 45|ppo_ep: 1|act_loss: -0.08050537109375|cri_loss: 0.154541015625|unsuper_loss: 0.0 average reward score: 1.1318359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.77%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.44 epoch: 0|step: 46|ppo_ep: 1|act_loss: 0.0667724609375|cri_loss: 0.10546875|unsuper_loss: 0.0 average reward score: -1.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) 
|Training time=0.80s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.44 epoch: 0|step: 47|ppo_ep: 1|act_loss: -0.1798095703125|cri_loss: 0.11456298828125|unsuper_loss: 0.0 average reward score: 0.80078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.44 epoch: 0|step: 48|ppo_ep: 1|act_loss: -0.0260009765625|cri_loss: 0.1363525390625|unsuper_loss: 0.0 average reward score: 1.5859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.45 [2023-07-01 08:09:40,151] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[4.2460000000000005e-06, 4.2460000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:09:40,333] [INFO] [timer.py:215:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=51.811458544558285, CurrSamplesPerSec=49.8596639560758, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:09:40,498] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[2.2e-06, 2.2e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 49|ppo_ep: 1|act_loss: -0.0413818359375|cri_loss: 0.21044921875|unsuper_loss: 0.0 average reward score: 1.5859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.82s (31.93%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.45 epoch: 0|step: 50|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: 0.12091064453125|unsuper_loss: 0.0 average reward score: -0.29931640625 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.45 epoch: 0|step: 51|ppo_ep: 1|act_loss: -0.1781005859375|cri_loss: 0.267333984375|unsuper_loss: 0.0 average reward score: 0.7529296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.87%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.45 epoch: 0|step: 52|ppo_ep: 1|act_loss: -0.1932373046875|cri_loss: 0.1837158203125|unsuper_loss: 0.0 average reward score: 1.6982421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.62%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.45 epoch: 0|step: 53|ppo_ep: 1|act_loss: -0.09027099609375|cri_loss: 0.09649658203125|unsuper_loss: 0.0 average reward score: 1.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.82%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.46 epoch: 0|step: 54|ppo_ep: 1|act_loss: -0.307861328125|cri_loss: 0.280029296875|unsuper_loss: 0.0 average reward score: 0.8291015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.45%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.46 epoch: 0|step: 55|ppo_ep: 1|act_loss: -0.11029052734375|cri_loss: 0.0491943359375|unsuper_loss: 0.0 average reward score: 1.361328125 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.48%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.46 epoch: 0|step: 56|ppo_ep: 1|act_loss: -0.061004638671875|cri_loss: 0.06158447265625|unsuper_loss: 0.0 average reward score: 0.578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.46 epoch: 0|step: 57|ppo_ep: 1|act_loss: -0.04693603515625|cri_loss: 0.1094970703125|unsuper_loss: 0.0 average reward score: 0.85546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.46 epoch: 0|step: 58|ppo_ep: 1|act_loss: -0.057037353515625|cri_loss: 0.2431640625|unsuper_loss: 0.0 average reward score: 1.8388671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.47 [2023-07-01 08:10:05,603] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[5.211000000000001e-06, 5.211000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:10:05,781] [INFO] [timer.py:215:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=51.608427254494195, CurrSamplesPerSec=51.24631138292929, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:10:05,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[2.7000000000000004e-06, 2.7000000000000004e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 59|ppo_ep: 1|act_loss: -0.01324462890625|cri_loss: 0.06976318359375|unsuper_loss: 0.0 average reward score: 1.5302734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.68%) |Training time=0.80s (31.46%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.47 epoch: 0|step: 60|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.05938720703125|unsuper_loss: 0.0 average reward score: 1.580078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.47 epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.0312042236328125|cri_loss: 0.0911865234375|unsuper_loss: 0.0 average reward score: 0.83642578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.12%) |Training time=0.81s (31.95%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.47 epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.1256103515625|cri_loss: 0.08343505859375|unsuper_loss: 0.0 average reward score: 0.404296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.01%) |Training time=0.82s (32.03%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.47 epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.142333984375|cri_loss: 0.1456298828125|unsuper_loss: 0.0 average reward score: 1.8056640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.47 
epoch: 0|step: 64|ppo_ep: 1|act_loss: 0.151611328125|cri_loss: 0.078125|unsuper_loss: 0.0 average reward score: 0.9873046875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.48 epoch: 0|step: 65|ppo_ep: 1|act_loss: 0.259765625|cri_loss: 0.125244140625|unsuper_loss: 0.0 average reward score: -0.45849609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.48 epoch: 0|step: 66|ppo_ep: 1|act_loss: 0.1741943359375|cri_loss: 0.22265625|unsuper_loss: 0.0 average reward score: -1.7998046875 ------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.12%) |Training time=0.82s (31.93%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.48 epoch: 0|step: 67|ppo_ep: 1|act_loss: -0.0309600830078125|cri_loss: 0.15966796875|unsuper_loss: 0.0 average reward score: -1.15625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.48 epoch: 0|step: 68|ppo_ep: 1|act_loss: 0.06488037109375|cri_loss: 0.232177734375|unsuper_loss: 0.0 average reward score: -2.349609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.78%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.48 [2023-07-01 08:10:31,089] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[6.176000000000001e-06, 6.176000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:10:31,269] [INFO] [timer.py:215:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=51.409136051593045, CurrSamplesPerSec=50.844574422344415, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:10:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[3.2000000000000003e-06, 3.2000000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 69|ppo_ep: 1|act_loss: 0.16064453125|cri_loss: 0.2646484375|unsuper_loss: 0.0 average reward score: -3.220703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.60%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48 epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.211669921875|cri_loss: 0.4462890625|unsuper_loss: 0.0 average reward score: -4.57421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48 epoch: 0|step: 71|ppo_ep: 1|act_loss: -0.0789794921875|cri_loss: 0.3310546875|unsuper_loss: 0.0 average reward score: -3.51953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.59%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48 epoch: 0|step: 72|ppo_ep: 1|act_loss: -0.232177734375|cri_loss: 0.1922607421875|unsuper_loss: 0.0 average reward score: -4.6328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) 
|Training time=0.81s (31.65%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.48 epoch: 0|step: 73|ppo_ep: 1|act_loss: 0.09918212890625|cri_loss: 0.218017578125|unsuper_loss: 0.0 average reward score: -4.625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.50%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49 epoch: 0|step: 74|ppo_ep: 1|act_loss: -0.314208984375|cri_loss: 0.272705078125|unsuper_loss: 0.0 average reward score: -4.375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.46%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49 epoch: 0|step: 75|ppo_ep: 1|act_loss: 0.0401611328125|cri_loss: 0.1531982421875|unsuper_loss: 0.0 average reward score: -3.9375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.70%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.49 epoch: 0|step: 76|ppo_ep: 1|act_loss: -0.2626953125|cri_loss: 0.11065673828125|unsuper_loss: 0.0 average reward score: -4.96875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.49 epoch: 0|step: 77|ppo_ep: 1|act_loss: -0.06817626953125|cri_loss: 0.1427001953125|unsuper_loss: 0.0 average reward score: -4.08203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) 
|Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49 epoch: 0|step: 78|ppo_ep: 1|act_loss: -0.11895751953125|cri_loss: 0.13671875|unsuper_loss: 0.0 average reward score: -3.625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.56%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.49 [2023-07-01 08:10:56,546] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[7.141000000000001e-06, 7.141000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:10:56,728] [INFO] [timer.py:215:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=51.32700471811606, CurrSamplesPerSec=50.60044388296178, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:10:56,894] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[3.7e-06, 3.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 79|ppo_ep: 1|act_loss: -0.11212158203125|cri_loss: 0.263427734375|unsuper_loss: 0.0 average reward score: -3.41015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.59%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.49 epoch: 0|step: 80|ppo_ep: 1|act_loss: -0.10980224609375|cri_loss: 0.2310791015625|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.37%) |Training time=0.81s (31.64%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.49 epoch: 0|step: 81|ppo_ep: 1|act_loss: 0.03558349609375|cri_loss: 0.38037109375|unsuper_loss: 0.0 average reward score: -4.1328125 ------------------------------------------------------------------------------------- |E2E 
latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.34%) |Training time=0.81s (31.74%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.49 epoch: 0|step: 82|ppo_ep: 1|act_loss: 0.05450439453125|cri_loss: 0.372314453125|unsuper_loss: 0.0 average reward score: -6.484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.49 epoch: 0|step: 83|ppo_ep: 1|act_loss: 0.10955810546875|cri_loss: 0.173583984375|unsuper_loss: 0.0 average reward score: -3.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.49 epoch: 0|step: 84|ppo_ep: 1|act_loss: 0.1385498046875|cri_loss: 0.340576171875|unsuper_loss: 0.0 average reward score: -4.765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.50 epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.052093505859375|cri_loss: 0.161376953125|unsuper_loss: 0.0 average reward score: -4.9453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.62%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.50 epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.141357421875|cri_loss: 0.194091796875|unsuper_loss: 0.0 average reward score: -5.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.85%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.50 epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.1273193359375|cri_loss: 0.04815673828125|unsuper_loss: 0.0 average reward score: -5.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.50 epoch: 0|step: 88|ppo_ep: 1|act_loss: 0.0860595703125|cri_loss: 0.058013916015625|unsuper_loss: 0.0 average reward score: -5.4921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.50 [2023-07-01 08:11:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[8.106e-06, 8.106e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:11:22,195] [INFO] [timer.py:215:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=51.229823380821855, CurrSamplesPerSec=50.06663279595282, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:11:22,361] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[4.2000000000000004e-06, 4.2000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 89|ppo_ep: 1|act_loss: 0.07598876953125|cri_loss: 0.05743408203125|unsuper_loss: 0.0 average reward score: -6.0703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.92%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.50 epoch: 0|step: 90|ppo_ep: 1|act_loss: 0.0255126953125|cri_loss: 0.08709716796875|unsuper_loss: 0.0 average reward score: 
-3.384765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.50 epoch: 0|step: 91|ppo_ep: 1|act_loss: 0.039703369140625|cri_loss: 0.071044921875|unsuper_loss: 0.0 average reward score: -3.388671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.51%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.50 epoch: 0|step: 92|ppo_ep: 1|act_loss: 0.0927734375|cri_loss: 0.038482666015625|unsuper_loss: 0.0 average reward score: -4.67578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.77%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.50 epoch: 0|step: 93|ppo_ep: 1|act_loss: -0.034576416015625|cri_loss: 0.1512451171875|unsuper_loss: 0.0 average reward score: -6.234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.42%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.50 epoch: 0|step: 94|ppo_ep: 1|act_loss: -0.045196533203125|cri_loss: 0.120361328125|unsuper_loss: 0.0 average reward score: -5.26953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.60%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.50 epoch: 0|step: 95|ppo_ep: 1|act_loss: -0.1468505859375|cri_loss: 0.1895751953125|unsuper_loss: 0.0 average reward score: -4.4765625 
------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.50 [2023-07-01 08:11:39,836] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024 epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.2025146484375|cri_loss: 0.178466796875|unsuper_loss: 0.0 average reward score: -2.916015625 ------------------------------------------------------------------------------------- |E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.20%) |Training time=0.62s (26.18%) |Others=0.23 (9.62%)|CurSamplesPerSec=13.60 |AvgSamplesPerSec=12.51 epoch: 0|step: 97|ppo_ep: 1|act_loss: -0.09326171875|cri_loss: 0.11358642578125|unsuper_loss: 0.0 average reward score: -4.484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.81s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.51 epoch: 0|step: 98|ppo_ep: 1|act_loss: -0.183837890625|cri_loss: 0.23583984375|unsuper_loss: 0.0 average reward score: -4.40625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.44%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.51 [2023-07-01 08:11:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=7, lr=[8.974500000000002e-06, 8.974500000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:11:47,488] [INFO] [timer.py:215:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=51.32384275389724, CurrSamplesPerSec=50.546857564437786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 
08:11:47,653] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=6, lr=[4.7e-06, 4.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 99|ppo_ep: 1|act_loss: -0.1881103515625|cri_loss: 0.11492919921875|unsuper_loss: 0.0 average reward score: -3.041015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.51 epoch: 0|step: 100|ppo_ep: 1|act_loss: -0.1273193359375|cri_loss: 0.10101318359375|unsuper_loss: 0.0 average reward score: -4.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.81s (31.61%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.51 epoch: 0|step: 101|ppo_ep: 1|act_loss: -0.06329345703125|cri_loss: 0.048126220703125|unsuper_loss: 0.0 average reward score: -5.5 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.47%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 102|ppo_ep: 1|act_loss: 0.0202789306640625|cri_loss: 0.046173095703125|unsuper_loss: 0.0 average reward score: -4.0 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.44%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.52 epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.0049285888671875|cri_loss: 0.01212310791015625|unsuper_loss: 0.0 average reward score: -4.2109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate 
time=1.52s (59.60%) |Training time=0.80s (31.50%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 104|ppo_ep: 1|act_loss: 0.042816162109375|cri_loss: 0.0548095703125|unsuper_loss: 0.0 average reward score: -3.89453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.80s (31.56%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52 epoch: 0|step: 105|ppo_ep: 1|act_loss: -0.0083160400390625|cri_loss: 0.0223388671875|unsuper_loss: 0.0 average reward score: -4.953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.52 epoch: 0|step: 106|ppo_ep: 1|act_loss: 0.10870361328125|cri_loss: 0.0516357421875|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.47%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.52 epoch: 0|step: 107|ppo_ep: 1|act_loss: 0.1151123046875|cri_loss: 0.05731201171875|unsuper_loss: 0.0 average reward score: -4.47265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.48%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52 epoch: 0|step: 108|ppo_ep: 1|act_loss: 0.1324462890625|cri_loss: 0.06695556640625|unsuper_loss: 0.0 average reward score: -4.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s 
(59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 [2023-07-01 08:12:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=7, lr=[9.649706174538074e-06, 9.649706174538074e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:12:12,961] [INFO] [timer.py:215:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=51.28166560587494, CurrSamplesPerSec=50.821221532326284, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:12:13,127] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=6, lr=[4.999729351164122e-06, 4.999729351164122e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 109|ppo_ep: 1|act_loss: 0.06866455078125|cri_loss: 0.0279083251953125|unsuper_loss: 0.0 average reward score: -4.8359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.04827880859375|cri_loss: 0.019989013671875|unsuper_loss: 0.0 average reward score: -4.7890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.52 epoch: 0|step: 111|ppo_ep: 1|act_loss: 0.005802154541015625|cri_loss: 0.01776123046875|unsuper_loss: 0.0 average reward score: -5.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 112|ppo_ep: 1|act_loss: 0.027587890625|cri_loss: 0.01424407958984375|unsuper_loss: 0.0 average reward score: 
-3.73046875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.62%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 113|ppo_ep: 1|act_loss: -0.01116180419921875|cri_loss: 0.01424407958984375|unsuper_loss: 0.0 average reward score: -4.125 ------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.76%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.52 epoch: 0|step: 114|ppo_ep: 1|act_loss: -0.0208282470703125|cri_loss: 0.0262603759765625|unsuper_loss: 0.0 average reward score: -4.13671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.52 epoch: 0|step: 115|ppo_ep: 1|act_loss: -0.02191162109375|cri_loss: 0.0148773193359375|unsuper_loss: 0.0 average reward score: -4.265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52 epoch: 0|step: 116|ppo_ep: 1|act_loss: -0.08856201171875|cri_loss: 0.0667724609375|unsuper_loss: 0.0 average reward score: -6.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.76%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52 epoch: 0|step: 117|ppo_ep: 1|act_loss: -0.0440673828125|cri_loss: 0.05584716796875|unsuper_loss: 0.0 average reward score: 
-4.23046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.64%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.52 epoch: 0|step: 118|ppo_ep: 1|act_loss: 0.055267333984375|cri_loss: 0.0250244140625|unsuper_loss: 0.0 average reward score: -5.40625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.67%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 [2023-07-01 08:12:38,266] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=7, lr=[9.644483606235295e-06, 9.644483606235295e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:12:38,443] [INFO] [timer.py:215:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=51.22001477278903, CurrSamplesPerSec=51.161201298453776, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:12:38,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=6, lr=[4.996685224712077e-06, 4.996685224712077e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 119|ppo_ep: 1|act_loss: -0.058380126953125|cri_loss: 0.051177978515625|unsuper_loss: 0.0 average reward score: -4.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.43%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52 epoch: 0|step: 120|ppo_ep: 1|act_loss: 0.051849365234375|cri_loss: 0.0214385986328125|unsuper_loss: 0.0 average reward score: -4.51171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.71%) |Training time=0.80s (31.37%) |Others=0.23 
(8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52 epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.07427978515625|cri_loss: 0.0521240234375|unsuper_loss: 0.0 average reward score: -3.826171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.43%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 122|ppo_ep: 1|act_loss: 0.035064697265625|cri_loss: 0.04278564453125|unsuper_loss: 0.0 average reward score: -4.81640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.49%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52 epoch: 0|step: 123|ppo_ep: 1|act_loss: 0.066162109375|cri_loss: 0.0307159423828125|unsuper_loss: 0.0 average reward score: -3.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.81s (31.65%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.52 epoch: 0|step: 124|ppo_ep: 1|act_loss: 0.1104736328125|cri_loss: 0.08404541015625|unsuper_loss: 0.0 average reward score: -5.171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.63%) |Training time=0.80s (31.44%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.52 epoch: 0|step: 125|ppo_ep: 1|act_loss: 0.05743408203125|cri_loss: 0.034271240234375|unsuper_loss: 0.0 average reward score: -6.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.66%) |Others=0.23 
(8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53 epoch: 0|step: 126|ppo_ep: 1|act_loss: -0.002399444580078125|cri_loss: 0.039337158203125|unsuper_loss: 0.0 average reward score: -5.828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 127|ppo_ep: 1|act_loss: -0.0227508544921875|cri_loss: 0.00868988037109375|unsuper_loss: 0.0 average reward score: -4.015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.53 epoch: 0|step: 128|ppo_ep: 1|act_loss: -0.04278564453125|cri_loss: 0.042572021484375|unsuper_loss: 0.0 average reward score: -4.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.81s (31.57%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53 [2023-07-01 08:13:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=7, lr=[9.632739717588912e-06, 9.632739717588912e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:13:03,914] [INFO] [timer.py:215:stop] epoch=0/micro_step=130/global_step=130, RunningAvgSamplesPerSec=51.19020786678154, CurrSamplesPerSec=50.84000996968942, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:13:04,079] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=6, lr=[4.99026279355402e-06, 4.99026279355402e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 129|ppo_ep: 1|act_loss: 0.01258087158203125|cri_loss: 0.01308441162109375|unsuper_loss: 0.0 average reward score: -5.953125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53 epoch: 0|step: 130|ppo_ep: 1|act_loss: -0.0027561187744140625|cri_loss: 0.01007080078125|unsuper_loss: 0.0 average reward score: -6.265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.47%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 131|ppo_ep: 1|act_loss: -0.0166015625|cri_loss: 0.00750732421875|unsuper_loss: 0.0 average reward score: -4.7734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.70%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 132|ppo_ep: 1|act_loss: 0.039825439453125|cri_loss: 0.0167694091796875|unsuper_loss: 0.0 average reward score: -5.4921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.41%) |Training time=0.81s (31.65%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.53 epoch: 0|step: 133|ppo_ep: 1|act_loss: 0.0194854736328125|cri_loss: 0.00791168212890625|unsuper_loss: 0.0 average reward score: -5.390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53 epoch: 0|step: 134|ppo_ep: 1|act_loss: 0.0176239013671875|cri_loss: 0.018646240234375|unsuper_loss: 0.0 average reward score: -3.875 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.60%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53 epoch: 0|step: 135|ppo_ep: 1|act_loss: 0.01038360595703125|cri_loss: 0.0234375|unsuper_loss: 0.0 average reward score: -4.828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.80%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53 epoch: 0|step: 136|ppo_ep: 1|act_loss: 0.035369873046875|cri_loss: 0.005126953125|unsuper_loss: 0.0 average reward score: -3.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53 epoch: 0|step: 137|ppo_ep: 1|act_loss: 0.025634765625|cri_loss: 0.0036468505859375|unsuper_loss: 0.0 average reward score: -3.943359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.78%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53 epoch: 0|step: 138|ppo_ep: 1|act_loss: 0.0025787353515625|cri_loss: 0.0016832351684570312|unsuper_loss: 0.0 average reward score: -4.96875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.53 [2023-07-01 08:13:29,179] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=7, lr=[9.61449039944247e-06, 9.61449039944247e-06], mom=[(0.9, 
0.95), (0.9, 0.95)] [2023-07-01 08:13:29,356] [INFO] [timer.py:215:stop] epoch=0/micro_step=140/global_step=140, RunningAvgSamplesPerSec=51.15801467130955, CurrSamplesPerSec=51.184418805615664, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:13:29,522] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=6, lr=[4.980470747984265e-06, 4.980470747984265e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 139|ppo_ep: 1|act_loss: 0.023345947265625|cri_loss: 0.00701141357421875|unsuper_loss: 0.0 average reward score: -4.34375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.52%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53 epoch: 0|step: 140|ppo_ep: 1|act_loss: -0.0290374755859375|cri_loss: 0.0136871337890625|unsuper_loss: 0.0 average reward score: -4.765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53 epoch: 0|step: 141|ppo_ep: 1|act_loss: -0.04620361328125|cri_loss: 0.0184783935546875|unsuper_loss: 0.0 average reward score: -4.296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.55%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53 epoch: 0|step: 142|ppo_ep: 1|act_loss: -0.06597900390625|cri_loss: 0.0247344970703125|unsuper_loss: 0.0 average reward score: -3.38671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 
|AvgSamplesPerSec=12.53 epoch: 0|step: 143|ppo_ep: 1|act_loss: -0.005218505859375|cri_loss: 0.0203857421875|unsuper_loss: 0.0 average reward score: -4.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.60%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53 epoch: 0|step: 144|ppo_ep: 1|act_loss: -0.04986572265625|cri_loss: 0.0144195556640625|unsuper_loss: 0.0 average reward score: -4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53 epoch: 0|step: 145|ppo_ep: 1|act_loss: -0.057769775390625|cri_loss: 0.03375244140625|unsuper_loss: 0.0 average reward score: -4.15625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 146|ppo_ep: 1|act_loss: 0.0311126708984375|cri_loss: 0.0077056884765625|unsuper_loss: 0.0 average reward score: -5.078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.70%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.042816162109375|cri_loss: 0.009368896484375|unsuper_loss: 0.0 average reward score: -4.25 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 
|AvgSamplesPerSec=12.53 epoch: 0|step: 148|ppo_ep: 1|act_loss: 0.0704345703125|cri_loss: 0.0168609619140625|unsuper_loss: 0.0 average reward score: -4.3125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.58%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53 [2023-07-01 08:13:54,640] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=7, lr=[9.589760345240206e-06, 9.589760345240206e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:13:54,821] [INFO] [timer.py:215:stop] epoch=0/micro_step=150/global_step=150, RunningAvgSamplesPerSec=51.12761146442748, CurrSamplesPerSec=50.58425311398798, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:13:54,987] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=6, lr=[4.967322337776272e-06, 4.967322337776272e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 149|ppo_ep: 1|act_loss: 0.0423583984375|cri_loss: 0.01122283935546875|unsuper_loss: 0.0 average reward score: -5.63671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.59%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53 epoch: 0|step: 150|ppo_ep: 1|act_loss: 0.03692626953125|cri_loss: 0.010467529296875|unsuper_loss: 0.0 average reward score: -4.05078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.81s (31.52%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.53 epoch: 0|step: 151|ppo_ep: 1|act_loss: 0.0020465850830078125|cri_loss: 0.006305694580078125|unsuper_loss: 0.0 average reward score: -4.80078125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53 epoch: 0|step: 152|ppo_ep: 1|act_loss: -0.006404876708984375|cri_loss: 0.0030059814453125|unsuper_loss: 0.0 average reward score: -4.875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.73%) |Training time=0.80s (31.33%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.53 epoch: 0|step: 153|ppo_ep: 1|act_loss: -0.00732421875|cri_loss: 0.001430511474609375|unsuper_loss: 0.0 average reward score: -4.1171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.43%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53 epoch: 0|step: 154|ppo_ep: 1|act_loss: 0.016937255859375|cri_loss: 0.0187530517578125|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53 epoch: 0|step: 155|ppo_ep: 1|act_loss: 0.0008111000061035156|cri_loss: 0.0024356842041015625|unsuper_loss: 0.0 average reward score: -6.44140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53 epoch: 0|step: 156|ppo_ep: 1|act_loss: -0.01509857177734375|cri_loss: 0.002330780029296875|unsuper_loss: 0.0 average reward 
score: -5.44140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.70%) |Training time=0.80s (31.44%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53 epoch: 0|step: 157|ppo_ep: 1|act_loss: 0.00215911865234375|cri_loss: 0.007144927978515625|unsuper_loss: 0.0 average reward score: -4.046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.54%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.53 epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.016845703125|cri_loss: 0.003650665283203125|unsuper_loss: 0.0 average reward score: -4.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.67%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53 [2023-07-01 08:14:20,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=7, lr=[9.558583017613959e-06, 9.558583017613959e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:14:20,256] [INFO] [timer.py:215:stop] epoch=0/micro_step=160/global_step=160, RunningAvgSamplesPerSec=51.11356093341441, CurrSamplesPerSec=50.12746005738873, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:14:20,422] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=6, lr=[4.950835354254168e-06, 4.950835354254168e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 159|ppo_ep: 1|act_loss: 0.041961669921875|cri_loss: 0.01123809814453125|unsuper_loss: 0.0 average reward score: -4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.86%) |Others=0.23 
(8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53 epoch: 0|step: 160|ppo_ep: 1|act_loss: 0.045501708984375|cri_loss: 0.007793426513671875|unsuper_loss: 0.0 average reward score: -4.58984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.68%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53 epoch: 0|step: 161|ppo_ep: 1|act_loss: 0.056732177734375|cri_loss: 0.034698486328125|unsuper_loss: 0.0 average reward score: -5.8359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 162|ppo_ep: 1|act_loss: 0.0287017822265625|cri_loss: 0.0207672119140625|unsuper_loss: 0.0 average reward score: -3.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 163|ppo_ep: 1|act_loss: -0.0164031982421875|cri_loss: 0.0018892288208007812|unsuper_loss: 0.0 average reward score: -4.53125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 164|ppo_ep: 1|act_loss: -0.004817962646484375|cri_loss: 0.0056304931640625|unsuper_loss: 0.0 average reward score: -4.33984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.65%) 
|Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 165|ppo_ep: 1|act_loss: -0.1483154296875|cri_loss: 0.184814453125|unsuper_loss: 0.0 average reward score: -5.6796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.70%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 166|ppo_ep: 1|act_loss: -0.0347900390625|cri_loss: 0.00630950927734375|unsuper_loss: 0.0 average reward score: -4.4375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 167|ppo_ep: 1|act_loss: -0.017822265625|cri_loss: 0.03143310546875|unsuper_loss: 0.0 average reward score: -5.03125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.38%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 168|ppo_ep: 1|act_loss: -0.0017986297607421875|cri_loss: 0.0015411376953125|unsuper_loss: 0.0 average reward score: -4.30078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.57%) |Training time=0.83s (32.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 [2023-07-01 08:14:45,543] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=7, lr=[9.521000603104346e-06, 9.521000603104346e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:14:45,721] [INFO] [timer.py:215:stop] epoch=0/micro_step=170/global_step=170, RunningAvgSamplesPerSec=51.08043032722592, 
CurrSamplesPerSec=50.96938329495944, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:14:45,886] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=6, lr=[4.931032106219029e-06, 4.931032106219029e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 169|ppo_ep: 1|act_loss: 0.029571533203125|cri_loss: 0.01090240478515625|unsuper_loss: 0.0 average reward score: -4.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.47%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 170|ppo_ep: 1|act_loss: 0.008148193359375|cri_loss: 0.006717681884765625|unsuper_loss: 0.0 average reward score: -3.87890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.77%) |Training time=0.80s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 171|ppo_ep: 1|act_loss: 0.01207733154296875|cri_loss: 0.004581451416015625|unsuper_loss: 0.0 average reward score: -4.359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.56%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54 epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.040924072265625|cri_loss: 0.0110321044921875|unsuper_loss: 0.0 average reward score: -4.7421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 173|ppo_ep: 1|act_loss: 0.017120361328125|cri_loss: 0.006683349609375|unsuper_loss: 0.0 average reward score: -6.41796875 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 174|ppo_ep: 1|act_loss: 0.0014066696166992188|cri_loss: 0.00324249267578125|unsuper_loss: 0.0 average reward score: -5.234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.58%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 175|ppo_ep: 1|act_loss: -0.00894927978515625|cri_loss: 0.006313323974609375|unsuper_loss: 0.0 average reward score: -3.623046875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.54%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 176|ppo_ep: 1|act_loss: 0.014862060546875|cri_loss: 0.004730224609375|unsuper_loss: 0.0 average reward score: -3.515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.50%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 177|ppo_ep: 1|act_loss: 0.00820159912109375|cri_loss: 0.00408172607421875|unsuper_loss: 0.0 average reward score: -4.8515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.69%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.01119232177734375|cri_loss: 0.0016412734985351562|unsuper_loss: 0.0 average reward 
score: -3.703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54 [2023-07-01 08:15:10,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=7, lr=[9.47706395507748e-06, 9.47706395507748e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:15:11,178] [INFO] [timer.py:215:stop] epoch=0/micro_step=180/global_step=180, RunningAvgSamplesPerSec=51.0658335380177, CurrSamplesPerSec=50.65431599287461, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:15:11,342] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=6, lr=[4.907939389762475e-06, 4.907939389762475e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 179|ppo_ep: 1|act_loss: -0.0067901611328125|cri_loss: 0.001873016357421875|unsuper_loss: 0.0 average reward score: -5.546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.67%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.056732177734375|cri_loss: 0.020721435546875|unsuper_loss: 0.0 average reward score: -4.26953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 181|ppo_ep: 1|act_loss: -0.01043701171875|cri_loss: 0.0020351409912109375|unsuper_loss: 0.0 average reward score: -3.75390625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.49%) |Others=0.23 
(9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 182|ppo_ep: 1|act_loss: -0.0273895263671875|cri_loss: 0.004871368408203125|unsuper_loss: 0.0 average reward score: -4.28125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.77%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 183|ppo_ep: 1|act_loss: -0.0251617431640625|cri_loss: 0.0274505615234375|unsuper_loss: 0.0 average reward score: -3.509765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.028076171875|cri_loss: 0.004650115966796875|unsuper_loss: 0.0 average reward score: -7.34375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54 epoch: 0|step: 185|ppo_ep: 1|act_loss: 0.06512451171875|cri_loss: 0.04296875|unsuper_loss: 0.0 average reward score: -4.69140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.54 epoch: 0|step: 186|ppo_ep: 1|act_loss: 0.03607177734375|cri_loss: 0.01641845703125|unsuper_loss: 0.0 average reward score: -3.08984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 
(8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 187|ppo_ep: 1|act_loss: 0.027374267578125|cri_loss: 0.01361083984375|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 188|ppo_ep: 1|act_loss: -0.0076751708984375|cri_loss: 0.01186370849609375|unsuper_loss: 0.0 average reward score: -4.859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.59%) |Training time=0.82s (32.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54 [2023-07-01 08:15:36,423] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=7, lr=[9.426832524914468e-06, 9.426832524914468e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:15:36,600] [INFO] [timer.py:215:stop] epoch=0/micro_step=190/global_step=190, RunningAvgSamplesPerSec=51.04633948371713, CurrSamplesPerSec=51.249168265605164, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:15:36,765] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=6, lr=[4.881588452008457e-06, 4.881588452008457e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 189|ppo_ep: 1|act_loss: 0.0013113021850585938|cri_loss: 0.0013742446899414062|unsuper_loss: 0.0 average reward score: -5.7890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.69%) |Training time=0.80s (31.42%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54 epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.01316070556640625|cri_loss: 0.013458251953125|unsuper_loss: 0.0 average reward score: -3.8359375 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.43%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 191|ppo_ep: 1|act_loss: -0.0120086669921875|cri_loss: 0.0018033981323242188|unsuper_loss: 0.0 average reward score: -5.875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.33%) |Training time=0.81s (31.69%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54 epoch: 0|step: 192|ppo_ep: 1|act_loss: 0.00186920166015625|cri_loss: 0.00511932373046875|unsuper_loss: 0.0 average reward score: -4.94921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54 epoch: 0|step: 193|ppo_ep: 1|act_loss: -0.0286712646484375|cri_loss: 0.004795074462890625|unsuper_loss: 0.0 average reward score: -5.140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 194|ppo_ep: 1|act_loss: -0.01007843017578125|cri_loss: 0.005573272705078125|unsuper_loss: 0.0 average reward score: -3.66796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 195|ppo_ep: 1|act_loss: 0.0251922607421875|cri_loss: 0.00690460205078125|unsuper_loss: 0.0 average reward 
score: -4.0625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.003810882568359375|cri_loss: 0.0014600753784179688|unsuper_loss: 0.0 average reward score: -4.02734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 197|ppo_ep: 1|act_loss: -0.031768798828125|cri_loss: 0.008697509765625|unsuper_loss: 0.0 average reward score: -5.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54 epoch: 0|step: 198|ppo_ep: 1|act_loss: -0.00102996826171875|cri_loss: 0.007480621337890625|unsuper_loss: 0.0 average reward score: -6.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 [2023-07-01 08:16:01,904] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=7, lr=[9.370374281566792e-06, 9.370374281566792e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:16:02,081] [INFO] [timer.py:215:stop] epoch=0/micro_step=200/global_step=200, RunningAvgSamplesPerSec=51.03230639238851, CurrSamplesPerSec=51.198730500227924, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:16:02,246] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=6, lr=[4.852014948832268e-06, 
4.852014948832268e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.0186004638671875|cri_loss: 0.003108978271484375|unsuper_loss: 0.0 average reward score: -3.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 200|ppo_ep: 1|act_loss: 0.003887176513671875|cri_loss: 0.004302978515625|unsuper_loss: 0.0 average reward score: -3.634765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.51%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 201|ppo_ep: 1|act_loss: -0.005275726318359375|cri_loss: 0.0028781890869140625|unsuper_loss: 0.0 average reward score: -3.716796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 202|ppo_ep: 1|act_loss: 0.0159454345703125|cri_loss: 0.003612518310546875|unsuper_loss: 0.0 average reward score: -4.38671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.63%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 203|ppo_ep: 1|act_loss: 0.00446319580078125|cri_loss: 0.001026153564453125|unsuper_loss: 0.0 average reward score: -4.640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training 
time=0.80s (31.47%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.54 epoch: 0|step: 204|ppo_ep: 1|act_loss: -0.0184326171875|cri_loss: 0.00092315673828125|unsuper_loss: 0.0 average reward score: -4.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.40%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54 epoch: 0|step: 205|ppo_ep: 1|act_loss: -0.0272369384765625|cri_loss: 0.0034389495849609375|unsuper_loss: 0.0 average reward score: -4.15625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.85%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54 epoch: 0|step: 206|ppo_ep: 1|act_loss: -0.0035800933837890625|cri_loss: 0.00862884521484375|unsuper_loss: 0.0 average reward score: -4.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.57%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 207|ppo_ep: 1|act_loss: -0.0309906005859375|cri_loss: 0.00408935546875|unsuper_loss: 0.0 average reward score: -3.22265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 208|ppo_ep: 1|act_loss: -0.0180206298828125|cri_loss: 0.007373809814453125|unsuper_loss: 0.0 average reward score: -4.63671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s 
(59.40%) |Training time=0.80s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54 [2023-07-01 08:16:27,348] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=7, lr=[9.30776561958644e-06, 9.30776561958644e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:16:27,531] [INFO] [timer.py:215:stop] epoch=0/micro_step=210/global_step=210, RunningAvgSamplesPerSec=51.01971803804604, CurrSamplesPerSec=50.701119734787326, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:16:27,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=6, lr=[4.819258896614014e-06, 4.819258896614014e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 209|ppo_ep: 1|act_loss: -0.00366973876953125|cri_loss: 0.0026416778564453125|unsuper_loss: 0.0 average reward score: -3.982421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54 epoch: 0|step: 210|ppo_ep: 1|act_loss: -0.038726806640625|cri_loss: 0.022308349609375|unsuper_loss: 0.0 average reward score: -4.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.36%) |Training time=0.81s (31.69%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.54 epoch: 0|step: 211|ppo_ep: 1|act_loss: 0.0160980224609375|cri_loss: 0.001773834228515625|unsuper_loss: 0.0 average reward score: -3.685546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.72%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54 epoch: 0|step: 212|ppo_ep: 1|act_loss: 0.0457763671875|cri_loss: 0.01435089111328125|unsuper_loss: 0.0 average reward 
score: -5.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 213|ppo_ep: 1|act_loss: 0.02874755859375|cri_loss: 0.0050201416015625|unsuper_loss: 0.0 average reward score: -4.97265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.70%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 214|ppo_ep: 1|act_loss: 0.01007843017578125|cri_loss: 0.001079559326171875|unsuper_loss: 0.0 average reward score: -6.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 215|ppo_ep: 1|act_loss: 0.01255035400390625|cri_loss: 0.01186370849609375|unsuper_loss: 0.0 average reward score: -4.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 216|ppo_ep: 1|act_loss: -0.0134124755859375|cri_loss: 0.001556396484375|unsuper_loss: 0.0 average reward score: -4.82421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54 epoch: 0|step: 217|ppo_ep: 1|act_loss: -0.036376953125|cri_loss: 0.0057830810546875|unsuper_loss: 0.0 average 
reward score: -4.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.73%) |Training time=0.80s (31.36%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 epoch: 0|step: 218|ppo_ep: 1|act_loss: -0.036834716796875|cri_loss: 0.00553131103515625|unsuper_loss: 0.0 average reward score: -5.6796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.36%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54 [2023-07-01 08:16:52,825] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=7, lr=[9.239091255755212e-06, 9.239091255755212e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:16:53,003] [INFO] [timer.py:215:stop] epoch=0/micro_step=220/global_step=220, RunningAvgSamplesPerSec=51.0095211309395, CurrSamplesPerSec=50.965996272607, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:16:53,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=6, lr=[4.783364618091804e-06, 4.783364618091804e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 219|ppo_ep: 1|act_loss: -0.02386474609375|cri_loss: 0.0040283203125|unsuper_loss: 0.0 average reward score: -4.2421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.51%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54 epoch: 0|step: 220|ppo_ep: 1|act_loss: -0.0226593017578125|cri_loss: 0.0017938613891601562|unsuper_loss: 0.0 average reward score: -3.9765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.51%) |Others=0.23 
(8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54 epoch: 0|step: 221|ppo_ep: 1|act_loss: 0.0114593505859375|cri_loss: 0.010986328125|unsuper_loss: 0.0 average reward score: -5.4453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.55%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54 epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.00458526611328125|cri_loss: 0.0015325546264648438|unsuper_loss: 0.0 average reward score: -4.70703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.0121612548828125|cri_loss: 0.007373809814453125|unsuper_loss: 0.0 average reward score: -5.44921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 epoch: 0|step: 224|ppo_ep: 1|act_loss: 0.02984619140625|cri_loss: 0.005184173583984375|unsuper_loss: 0.0 average reward score: -6.2109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.41%) |Training time=0.81s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54 epoch: 0|step: 225|ppo_ep: 1|act_loss: 0.0408935546875|cri_loss: 0.00785064697265625|unsuper_loss: 0.0 average reward score: -6.0625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.68%) 
|Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54 epoch: 0|step: 226|ppo_ep: 1|act_loss: -0.003871917724609375|cri_loss: 0.00443267822265625|unsuper_loss: 0.0 average reward score: -5.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.71%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.005840301513671875|cri_loss: 0.0009756088256835938|unsuper_loss: 0.0 average reward score: -3.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.50%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.02532958984375|cri_loss: 0.0066375732421875|unsuper_loss: 0.0 average reward score: -3.87890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55 [2023-07-01 08:17:18,305] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=7, lr=[9.16444411445309e-06, 9.16444411445309e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:17:18,485] [INFO] [timer.py:215:stop] epoch=0/micro_step=230/global_step=230, RunningAvgSamplesPerSec=50.99276075623739, CurrSamplesPerSec=50.51208347618738, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:17:18,651] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=6, lr=[4.74438068238795e-06, 4.74438068238795e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 229|ppo_ep: 1|act_loss: -0.01898193359375|cri_loss: 0.0040130615234375|unsuper_loss: 0.0 average reward score: -4.7421875 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.67%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 epoch: 0|step: 230|ppo_ep: 1|act_loss: -0.0140533447265625|cri_loss: 0.0019969940185546875|unsuper_loss: 0.0 average reward score: -3.306640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.60%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 231|ppo_ep: 1|act_loss: 0.0028781890869140625|cri_loss: 0.0003573894500732422|unsuper_loss: 0.0 average reward score: -4.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.45%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.55 epoch: 0|step: 232|ppo_ep: 1|act_loss: 0.025909423828125|cri_loss: 0.007793426513671875|unsuper_loss: 0.0 average reward score: -4.59375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.53%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 233|ppo_ep: 1|act_loss: -0.019439697265625|cri_loss: 0.0020503997802734375|unsuper_loss: 0.0 average reward score: -4.5703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.52%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 234|ppo_ep: 1|act_loss: -0.0200347900390625|cri_loss: 0.0025844573974609375|unsuper_loss: 0.0 average 
reward score: -3.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 235|ppo_ep: 1|act_loss: -0.037872314453125|cri_loss: 0.0152587890625|unsuper_loss: 0.0 average reward score: -5.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.43%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.0111846923828125|cri_loss: 0.00437164306640625|unsuper_loss: 0.0 average reward score: -4.6875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.70%) |Training time=0.80s (31.42%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.007404327392578125|unsuper_loss: 0.0 average reward score: -3.662109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.00798797607421875|cri_loss: 0.0012331008911132812|unsuper_loss: 0.0 average reward score: -3.802734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.44%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 [2023-07-01 08:17:43,740] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=7, 
lr=[9.083925201920767e-06, 9.083925201920767e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:17:43,922] [INFO] [timer.py:215:stop] epoch=0/micro_step=240/global_step=240, RunningAvgSamplesPerSec=50.9921219056515, CurrSamplesPerSec=50.55043660844593, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:17:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=6, lr=[4.702359839289306e-06, 4.702359839289306e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 239|ppo_ep: 1|act_loss: 0.053131103515625|cri_loss: 0.01371002197265625|unsuper_loss: 0.0 average reward score: -4.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55 epoch: 0|step: 240|ppo_ep: 1|act_loss: 0.0006341934204101562|cri_loss: 0.0028324127197265625|unsuper_loss: 0.0 average reward score: -4.39453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.56%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 241|ppo_ep: 1|act_loss: -0.0220184326171875|cri_loss: 0.003536224365234375|unsuper_loss: 0.0 average reward score: -4.1484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.55%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 242|ppo_ep: 1|act_loss: 0.0154876708984375|cri_loss: 0.003978729248046875|unsuper_loss: 0.0 average reward score: -3.69921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training 
time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.55 epoch: 0|step: 243|ppo_ep: 1|act_loss: -0.0217742919921875|cri_loss: 0.0021533966064453125|unsuper_loss: 0.0 average reward score: -3.67578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 epoch: 0|step: 244|ppo_ep: 1|act_loss: 0.0015211105346679688|cri_loss: 0.006099700927734375|unsuper_loss: 0.0 average reward score: -4.1328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.69%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 245|ppo_ep: 1|act_loss: 0.0012369155883789062|cri_loss: 0.0012674331665039062|unsuper_loss: 0.0 average reward score: -3.587890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.0059661865234375|cri_loss: 0.0011720657348632812|unsuper_loss: 0.0 average reward score: -4.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 247|ppo_ep: 1|act_loss: -0.0251312255859375|cri_loss: 0.00565338134765625|unsuper_loss: 0.0 average reward score: -4.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate 
time=1.51s (59.52%) |Training time=0.80s (31.53%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.01308441162109375|cri_loss: 0.01374053955078125|unsuper_loss: 0.0 average reward score: -5.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.54%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 [2023-07-01 08:18:09,191] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=7, lr=[8.9976434695865e-06, 8.9976434695865e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:18:09,368] [INFO] [timer.py:215:stop] epoch=0/micro_step=250/global_step=250, RunningAvgSamplesPerSec=50.9825013275233, CurrSamplesPerSec=51.27168224923838, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:18:09,533] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=6, lr=[4.657358947870691e-06, 4.657358947870691e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 249|ppo_ep: 1|act_loss: 0.0078582763671875|cri_loss: 0.0013494491577148438|unsuper_loss: 0.0 average reward score: -4.93359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.55 epoch: 0|step: 250|ppo_ep: 1|act_loss: -0.00982666015625|cri_loss: 0.002044677734375|unsuper_loss: 0.0 average reward score: -5.76171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 251|ppo_ep: 1|act_loss: -0.002544403076171875|cri_loss: 0.00104522705078125|unsuper_loss: 0.0 average 
reward score: -5.546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 252|ppo_ep: 1|act_loss: 0.0118865966796875|cri_loss: 0.0028934478759765625|unsuper_loss: 0.0 average reward score: -3.361328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 [2023-07-01 08:18:19,349] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 253|ppo_ep: 1|act_loss: 0.0092315673828125|cri_loss: 0.0016069412231445312|unsuper_loss: 0.0 average reward score: -5.7734375 ------------------------------------------------------------------------------------- |E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.20%) |Training time=0.62s (26.19%) |Others=0.23 (9.61%)|CurSamplesPerSec=13.61 |AvgSamplesPerSec=12.55 epoch: 0|step: 254|ppo_ep: 1|act_loss: 0.03448486328125|cri_loss: 0.005481719970703125|unsuper_loss: 0.0 average reward score: -3.416015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.43%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 255|ppo_ep: 1|act_loss: 0.01197052001953125|cri_loss: 0.0013523101806640625|unsuper_loss: 0.0 average reward score: -4.12890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) 
|Training time=0.80s (31.45%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 256|ppo_ep: 1|act_loss: -0.0228424072265625|cri_loss: 0.00531768798828125|unsuper_loss: 0.0 average reward score: -4.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.57%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55 epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.0013647079467773438|cri_loss: 0.0010805130004882812|unsuper_loss: 0.0 average reward score: -5.484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.66%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 258|ppo_ep: 1|act_loss: 0.003753662109375|cri_loss: 0.0027256011962890625|unsuper_loss: 0.0 average reward score: -5.03515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 [2023-07-01 08:18:34,445] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=8, lr=[8.915159034156106e-06, 8.915159034156106e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:18:34,627] [INFO] [timer.py:215:stop] epoch=0/micro_step=260/global_step=260, RunningAvgSamplesPerSec=51.033755874636995, CurrSamplesPerSec=50.62441532185822, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:18:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=6, lr=[4.609438899557964e-06, 4.609438899557964e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 259|ppo_ep: 1|act_loss: -0.06292724609375|cri_loss: 0.046142578125|unsuper_loss: 0.0 average reward score: 
-3.861328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 260|ppo_ep: 1|act_loss: -0.01197052001953125|cri_loss: 0.003261566162109375|unsuper_loss: 0.0 average reward score: -4.21484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.56%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.55 epoch: 0|step: 261|ppo_ep: 1|act_loss: 0.002353668212890625|cri_loss: 0.0032176971435546875|unsuper_loss: 0.0 average reward score: -3.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.00551605224609375|cri_loss: 0.0008602142333984375|unsuper_loss: 0.0 average reward score: -5.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.01611328125|cri_loss: 0.002994537353515625|unsuper_loss: 0.0 average reward score: -4.23046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.43%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.01163482666015625|cri_loss: 0.0012903213500976562|unsuper_loss: 
0.0 average reward score: -5.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.50%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.01558685302734375|cri_loss: 0.0017652511596679688|unsuper_loss: 0.0 average reward score: -4.65234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.54%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 266|ppo_ep: 1|act_loss: 0.05804443359375|cri_loss: 0.04339599609375|unsuper_loss: 0.0 average reward score: -4.12109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.55%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.0238189697265625|cri_loss: 0.0033740997314453125|unsuper_loss: 0.0 average reward score: -3.916015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.42%) |Training time=0.83s (32.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 268|ppo_ep: 1|act_loss: -0.029754638671875|cri_loss: 0.010528564453125|unsuper_loss: 0.0 average reward score: -3.60546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.41%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 [2023-07-01 08:18:59,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=8, 
lr=[8.818255905938371e-06, 8.818255905938371e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:19:00,067] [INFO] [timer.py:215:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=51.022117875042674, CurrSamplesPerSec=50.931628955549996, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:19:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=6, lr=[4.558664535734864e-06, 4.558664535734864e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 269|ppo_ep: 1|act_loss: 0.025909423828125|cri_loss: 0.00881195068359375|unsuper_loss: 0.0 average reward score: -3.75390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.61%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 270|ppo_ep: 1|act_loss: -0.041473388671875|cri_loss: 0.005794525146484375|unsuper_loss: 0.0 average reward score: -4.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.56%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 271|ppo_ep: 1|act_loss: -0.0340576171875|cri_loss: 0.003322601318359375|unsuper_loss: 0.0 average reward score: -3.287109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 272|ppo_ep: 1|act_loss: -0.01102447509765625|cri_loss: 0.0026416778564453125|unsuper_loss: 0.0 average reward score: -5.94921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) 
|Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 273|ppo_ep: 1|act_loss: -0.007747650146484375|cri_loss: 0.004848480224609375|unsuper_loss: 0.0 average reward score: -5.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.76%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 274|ppo_ep: 1|act_loss: 0.04083251953125|cri_loss: 0.007534027099609375|unsuper_loss: 0.0 average reward score: -6.27734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.0180511474609375|cri_loss: 0.00904083251953125|unsuper_loss: 0.0 average reward score: -3.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.67%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.0172882080078125|cri_loss: 0.0026454925537109375|unsuper_loss: 0.0 average reward score: -4.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.59%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.51 |AvgSamplesPerSec=12.55 epoch: 0|step: 277|ppo_ep: 1|act_loss: -0.0078582763671875|cri_loss: 0.002429962158203125|unsuper_loss: 0.0 average reward score: -3.79296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) 
|Generate time=1.51s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 278|ppo_ep: 1|act_loss: 0.00759124755859375|cri_loss: 0.0021152496337890625|unsuper_loss: 0.0 average reward score: -5.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 [2023-07-01 08:19:25,346] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=8, lr=[8.715949439291823e-06, 8.715949439291823e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:19:25,525] [INFO] [timer.py:215:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=51.010485936504146, CurrSamplesPerSec=50.63415543236121, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:19:25,691] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=6, lr=[4.5051045600050906e-06, 4.5051045600050906e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 279|ppo_ep: 1|act_loss: -0.0251617431640625|cri_loss: 0.005886077880859375|unsuper_loss: 0.0 average reward score: -4.421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.71%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55 epoch: 0|step: 280|ppo_ep: 1|act_loss: -0.01441192626953125|cri_loss: 0.006072998046875|unsuper_loss: 0.0 average reward score: -4.984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55 epoch: 0|step: 281|ppo_ep: 1|act_loss: -0.037933349609375|cri_loss: 
0.0198516845703125|unsuper_loss: 0.0 average reward score: -4.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.54%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 282|ppo_ep: 1|act_loss: -0.0017490386962890625|cri_loss: 0.001552581787109375|unsuper_loss: 0.0 average reward score: -3.99609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.45%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55 epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.0034580230712890625|cri_loss: 0.0011396408081054688|unsuper_loss: 0.0 average reward score: -3.802734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.43%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.00234222412109375|cri_loss: 0.004055023193359375|unsuper_loss: 0.0 average reward score: -2.943359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.52%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 epoch: 0|step: 285|ppo_ep: 1|act_loss: 0.005573272705078125|cri_loss: 0.002197265625|unsuper_loss: 0.0 average reward score: -6.40625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.51%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 epoch: 0|step: 286|ppo_ep: 1|act_loss: 
0.02398681640625|cri_loss: 0.006076812744140625|unsuper_loss: 0.0 average reward score: -3.583984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 287|ppo_ep: 1|act_loss: 0.01427459716796875|cri_loss: 0.001983642578125|unsuper_loss: 0.0 average reward score: -4.96875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.47%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 288|ppo_ep: 1|act_loss: -0.00508880615234375|cri_loss: 0.004119873046875|unsuper_loss: 0.0 average reward score: -3.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.81%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55 [2023-07-01 08:19:50,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=8, lr=[8.608378066732629e-06, 8.608378066732629e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:19:50,970] [INFO] [timer.py:215:stop] epoch=0/micro_step=290/global_step=290, RunningAvgSamplesPerSec=51.00704543891052, CurrSamplesPerSec=50.64934602292882, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:19:51,136] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=6, lr=[4.448831445228368e-06, 4.448831445228368e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 289|ppo_ep: 1|act_loss: -0.01418304443359375|cri_loss: 0.0033626556396484375|unsuper_loss: 0.0 average reward score: -4.8359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s 
(0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.70%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 290|ppo_ep: 1|act_loss: -0.0270233154296875|cri_loss: 0.00750732421875|unsuper_loss: 0.0 average reward score: -4.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.71%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55 epoch: 0|step: 291|ppo_ep: 1|act_loss: -0.026336669921875|cri_loss: 0.00798797607421875|unsuper_loss: 0.0 average reward score: -4.0546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.50%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 292|ppo_ep: 1|act_loss: -0.0012111663818359375|cri_loss: 0.00342559814453125|unsuper_loss: 0.0 average reward score: -5.90234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.57%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55 epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.0221099853515625|cri_loss: 0.005115509033203125|unsuper_loss: 0.0 average reward score: -5.4375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 294|ppo_ep: 1|act_loss: 0.0111846923828125|cri_loss: 0.0013399124145507812|unsuper_loss: 0.0 average reward score: -3.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s 
|Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 295|ppo_ep: 1|act_loss: -0.004726409912109375|cri_loss: 0.001857757568359375|unsuper_loss: 0.0 average reward score: -4.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.45%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 296|ppo_ep: 1|act_loss: 0.0183868408203125|cri_loss: 0.00313568115234375|unsuper_loss: 0.0 average reward score: -4.5 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.016632080078125|cri_loss: 0.003253936767578125|unsuper_loss: 0.0 average reward score: -4.171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.51%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 298|ppo_ep: 1|act_loss: 0.00775909423828125|cri_loss: 0.0005536079406738281|unsuper_loss: 0.0 average reward score: -3.337890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 [2023-07-01 08:20:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=8, lr=[8.495687344805339e-06, 8.495687344805339e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:20:16,404] [INFO] [timer.py:215:stop] 
epoch=0/micro_step=300/global_step=300, RunningAvgSamplesPerSec=51.00193515656933, CurrSamplesPerSec=50.82687970238014, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:20:16,570] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=6, lr=[4.389921335456253e-06, 4.389921335456253e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 299|ppo_ep: 1|act_loss: 0.01059722900390625|cri_loss: 0.004077911376953125|unsuper_loss: 0.0 average reward score: -4.43359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.59%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 [2023-07-01 08:20:19,107] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.0200347900390625|cri_loss: 0.0028228759765625|unsuper_loss: 0.0 average reward score: -3.787109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.52s (60.84%) |Training time=0.80s (31.95%) |Others=0.18 (7.21%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.56 epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.0256195068359375|cri_loss: 0.004558563232421875|unsuper_loss: 0.0 average reward score: -2.978515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.0184783935546875|cri_loss: 0.0016565322875976562|unsuper_loss: 0.0 average reward score: -3.796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather 
latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 303|ppo_ep: 1|act_loss: -0.0037708282470703125|cri_loss: 0.003780364990234375|unsuper_loss: 0.0 average reward score: -6.62890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.46%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 epoch: 0|step: 304|ppo_ep: 1|act_loss: -0.03582763671875|cri_loss: 0.00493621826171875|unsuper_loss: 0.0 average reward score: -3.51953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.62%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 305|ppo_ep: 1|act_loss: 0.00661468505859375|cri_loss: 0.0184326171875|unsuper_loss: 0.0 average reward score: -3.390625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.60%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 306|ppo_ep: 1|act_loss: 0.0426025390625|cri_loss: 0.006927490234375|unsuper_loss: 0.0 average reward score: -5.01953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 307|ppo_ep: 1|act_loss: 0.003116607666015625|cri_loss: 0.0016851425170898438|unsuper_loss: 0.0 average reward score: -4.0859375 ------------------------------------------------------------------------------------- |E2E 
latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.70%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 epoch: 0|step: 308|ppo_ep: 1|act_loss: 0.0036067962646484375|cri_loss: 0.01727294921875|unsuper_loss: 0.0 average reward score: -5.296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 [2023-07-01 08:20:41,657] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=8, lr=[8.37802975712801e-06, 8.37802975712801e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:20:41,840] [INFO] [timer.py:215:stop] epoch=0/micro_step=310/global_step=310, RunningAvgSamplesPerSec=50.99398029145669, CurrSamplesPerSec=50.74482505177784, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:20:42,004] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=7, lr=[4.334713416080498e-06, 4.334713416080498e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.002155303955078125|cri_loss: 0.0010766983032226562|unsuper_loss: 0.0 average reward score: -4.15625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.63%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 310|ppo_ep: 1|act_loss: -0.01367950439453125|cri_loss: 0.0010881423950195312|unsuper_loss: 0.0 average reward score: -3.21875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 311|ppo_ep: 1|act_loss: 
-0.005771636962890625|cri_loss: 0.0018682479858398438|unsuper_loss: 0.0 average reward score: -4.875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 312|ppo_ep: 1|act_loss: -0.01544952392578125|cri_loss: 0.0027294158935546875|unsuper_loss: 0.0 average reward score: -4.171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.46%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 313|ppo_ep: 1|act_loss: 0.0285797119140625|cri_loss: 0.01299285888671875|unsuper_loss: 0.0 average reward score: -5.2265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.71%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 314|ppo_ep: 1|act_loss: 0.00305938720703125|cri_loss: 0.002655029296875|unsuper_loss: 0.0 average reward score: -6.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.0283660888671875|cri_loss: 0.003993988037109375|unsuper_loss: 0.0 average reward score: -3.17578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.47%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 
316|ppo_ep: 1|act_loss: -0.013427734375|cri_loss: 0.0017595291137695312|unsuper_loss: 0.0 average reward score: -4.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 317|ppo_ep: 1|act_loss: -0.0115814208984375|cri_loss: 0.0017938613891601562|unsuper_loss: 0.0 average reward score: -3.041015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.61%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 318|ppo_ep: 1|act_loss: 0.007427215576171875|cri_loss: 0.0013904571533203125|unsuper_loss: 0.0 average reward score: -4.140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.62%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 [2023-07-01 08:21:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=8, lr=[8.25556450806418e-06, 8.25556450806418e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:21:07,252] [INFO] [timer.py:215:stop] epoch=0/micro_step=320/global_step=320, RunningAvgSamplesPerSec=50.99215107673227, CurrSamplesPerSec=51.20947440913713, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:21:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=7, lr=[4.271015485202956e-06, 4.271015485202956e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 319|ppo_ep: 1|act_loss: 0.0148773193359375|cri_loss: 0.001949310302734375|unsuper_loss: 0.0 average reward score: -4.453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s 
|Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 320|ppo_ep: 1|act_loss: -0.01023101806640625|cri_loss: 0.002716064453125|unsuper_loss: 0.0 average reward score: -4.55859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.50%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 321|ppo_ep: 1|act_loss: 0.02069091796875|cri_loss: 0.004039764404296875|unsuper_loss: 0.0 average reward score: -4.16015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 epoch: 0|step: 322|ppo_ep: 1|act_loss: -0.01363372802734375|cri_loss: 0.0022792816162109375|unsuper_loss: 0.0 average reward score: -4.55078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 323|ppo_ep: 1|act_loss: 0.001903533935546875|cri_loss: 0.0011644363403320312|unsuper_loss: 0.0 average reward score: -4.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.48%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 324|ppo_ep: 1|act_loss: -0.005157470703125|cri_loss: 0.01177215576171875|unsuper_loss: 0.0 average reward score: -4.0859375 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.51%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 325|ppo_ep: 1|act_loss: 0.01084136962890625|cri_loss: 0.0009832382202148438|unsuper_loss: 0.0 average reward score: -2.939453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.60%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 326|ppo_ep: 1|act_loss: 0.0060577392578125|cri_loss: 0.0016355514526367188|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 327|ppo_ep: 1|act_loss: 0.01934814453125|cri_loss: 0.0032520294189453125|unsuper_loss: 0.0 average reward score: -5.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.71%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.015380859375|cri_loss: 0.0012655258178710938|unsuper_loss: 0.0 average reward score: -4.75 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.65%) |Training time=0.80s (31.41%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 [2023-07-01 08:21:32,548] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=8, lr=[8.12845730730089e-06, 
8.12845730730089e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:21:32,725] [INFO] [timer.py:215:stop] epoch=0/micro_step=330/global_step=330, RunningAvgSamplesPerSec=50.98691248485327, CurrSamplesPerSec=51.31480764267895, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:21:32,891] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=7, lr=[4.204921164949269e-06, 4.204921164949269e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.0101776123046875|cri_loss: 0.0007042884826660156|unsuper_loss: 0.0 average reward score: -3.66796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.42%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 330|ppo_ep: 1|act_loss: -0.0103912353515625|cri_loss: 0.0004925727844238281|unsuper_loss: 0.0 average reward score: -4.38671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 331|ppo_ep: 1|act_loss: 0.011505126953125|cri_loss: 0.005535125732421875|unsuper_loss: 0.0 average reward score: -4.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 332|ppo_ep: 1|act_loss: -0.0077362060546875|cri_loss: 0.0012226104736328125|unsuper_loss: 0.0 average reward score: -5.90625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.46%) 
|Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.0028171539306640625|cri_loss: 0.0014162063598632812|unsuper_loss: 0.0 average reward score: -4.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.56%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 334|ppo_ep: 1|act_loss: -0.0133514404296875|cri_loss: 0.0021381378173828125|unsuper_loss: 0.0 average reward score: -3.041015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.43%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 335|ppo_ep: 1|act_loss: -0.01678466796875|cri_loss: 0.00934600830078125|unsuper_loss: 0.0 average reward score: -4.55078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.62%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 336|ppo_ep: 1|act_loss: 0.0301971435546875|cri_loss: 0.005153656005859375|unsuper_loss: 0.0 average reward score: -4.515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.72%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 337|ppo_ep: 1|act_loss: 0.0266571044921875|cri_loss: 0.0028934478759765625|unsuper_loss: 0.0 average reward score: -3.13671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) 
|Training time=0.81s (31.85%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 338|ppo_ep: 1|act_loss: 0.006450653076171875|cri_loss: 0.0003998279571533203|unsuper_loss: 0.0 average reward score: -3.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 [2023-07-01 08:21:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=8, lr=[7.996880145624267e-06, 7.996880145624267e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:21:58,175] [INFO] [timer.py:215:stop] epoch=0/micro_step=340/global_step=340, RunningAvgSamplesPerSec=50.97940516109481, CurrSamplesPerSec=50.43255724297354, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:21:58,341] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=7, lr=[4.136519888601191e-06, 4.136519888601191e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 339|ppo_ep: 1|act_loss: 0.0111236572265625|cri_loss: 0.00136566162109375|unsuper_loss: 0.0 average reward score: -4.484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.73%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 340|ppo_ep: 1|act_loss: -0.007747650146484375|cri_loss: 0.0022678375244140625|unsuper_loss: 0.0 average reward score: -4.28125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.47%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 epoch: 0|step: 341|ppo_ep: 1|act_loss: -0.0238037109375|cri_loss: 0.006744384765625|unsuper_loss: 0.0 average reward score: 
-3.775390625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.63%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 342|ppo_ep: 1|act_loss: -0.08868408203125|cri_loss: 0.07049560546875|unsuper_loss: 0.0 average reward score: -3.43359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 343|ppo_ep: 1|act_loss: -0.0002334117889404297|cri_loss: 0.00783538818359375|unsuper_loss: 0.0 average reward score: -3.173828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 344|ppo_ep: 1|act_loss: -0.0144195556640625|cri_loss: 0.0038890838623046875|unsuper_loss: 0.0 average reward score: -5.69140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 345|ppo_ep: 1|act_loss: -0.027252197265625|cri_loss: 0.002620697021484375|unsuper_loss: 0.0 average reward score: -3.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.12%) |Training time=0.83s (32.85%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.0034809112548828125|cri_loss: 0.0004315376281738281|unsuper_loss: 
0.0 average reward score: -5.625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.85%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 347|ppo_ep: 1|act_loss: 0.03582763671875|cri_loss: 0.005672454833984375|unsuper_loss: 0.0 average reward score: -2.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.39%) |Training time=0.80s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.032073974609375|cri_loss: 0.007068634033203125|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.80%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 [2023-07-01 08:22:23,415] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=8, lr=[7.861011062196035e-06, 7.861011062196035e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:22:23,593] [INFO] [timer.py:215:stop] epoch=0/micro_step=350/global_step=350, RunningAvgSamplesPerSec=50.96653744295602, CurrSamplesPerSec=50.626515813038566, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:22:23,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=7, lr=[4.0659042110196635e-06, 4.0659042110196635e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.0369873046875|cri_loss: 0.00560760498046875|unsuper_loss: 0.0 average reward score: -4.5859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.75%) 
|Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 350|ppo_ep: 1|act_loss: 0.031951904296875|cri_loss: 0.0038471221923828125|unsuper_loss: 0.0 average reward score: -5.29296875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.37%) |Training time=0.80s (31.66%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 epoch: 0|step: 351|ppo_ep: 1|act_loss: 0.03955078125|cri_loss: 0.006744384765625|unsuper_loss: 0.0 average reward score: -4.9375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.01468658447265625|cri_loss: 0.0015878677368164062|unsuper_loss: 0.0 average reward score: -3.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.01084136962890625|cri_loss: 0.001827239990234375|unsuper_loss: 0.0 average reward score: -5.45703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.93%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.0125274658203125|cri_loss: 0.0015544891357421875|unsuper_loss: 0.0 average reward score: -4.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.73%) |Training 
time=0.82s (32.30%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 355|ppo_ep: 1|act_loss: -0.02801513671875|cri_loss: 0.00262451171875|unsuper_loss: 0.0 average reward score: -5.140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.85%) |Training time=0.82s (32.22%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 356|ppo_ep: 1|act_loss: -0.0174407958984375|cri_loss: 0.0144805908203125|unsuper_loss: 0.0 average reward score: -4.74609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.05%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 357|ppo_ep: 1|act_loss: -0.053009033203125|cri_loss: 0.01143646240234375|unsuper_loss: 0.0 average reward score: -4.86328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.03%) |Training time=0.81s (31.99%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 358|ppo_ep: 1|act_loss: -0.052337646484375|cri_loss: 0.0169219970703125|unsuper_loss: 0.0 average reward score: -3.61328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.96%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 [2023-07-01 08:22:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=8, lr=[7.721033903645878e-06, 7.721033903645878e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:22:49,017] [INFO] [timer.py:215:stop] epoch=0/micro_step=360/global_step=360, RunningAvgSamplesPerSec=50.94331245280376, 
CurrSamplesPerSec=50.46872104636192, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:22:49,183] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=7, lr=[3.993169683407347e-06, 3.993169683407347e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 359|ppo_ep: 1|act_loss: -0.08758544921875|cri_loss: 0.039337158203125|unsuper_loss: 0.0 average reward score: -4.328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.85%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 360|ppo_ep: 1|act_loss: -0.01544952392578125|cri_loss: 0.002696990966796875|unsuper_loss: 0.0 average reward score: -6.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.56 epoch: 0|step: 361|ppo_ep: 1|act_loss: -0.0236053466796875|cri_loss: 0.01218414306640625|unsuper_loss: 0.0 average reward score: -4.5078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.66%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.01107025146484375|cri_loss: 0.0016889572143554688|unsuper_loss: 0.0 average reward score: -5.65234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.02008056640625|cri_loss: 0.0062103271484375|unsuper_loss: 0.0 average reward score: 
-5.75390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 364|ppo_ep: 1|act_loss: -0.0191650390625|cri_loss: 0.0195770263671875|unsuper_loss: 0.0 average reward score: -3.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.83%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 365|ppo_ep: 1|act_loss: 0.028167724609375|cri_loss: 0.00469970703125|unsuper_loss: 0.0 average reward score: -4.2265625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.49%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.56 epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.0152435302734375|cri_loss: 0.0066375732421875|unsuper_loss: 0.0 average reward score: -3.33203125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.58%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 epoch: 0|step: 367|ppo_ep: 1|act_loss: 0.0253143310546875|cri_loss: 0.00949859619140625|unsuper_loss: 0.0 average reward score: -4.7890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.82%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 368|ppo_ep: 1|act_loss: -0.0552978515625|cri_loss: 0.061614990234375|unsuper_loss: 0.0 average reward 
score: -3.328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.72%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 [2023-07-01 08:23:14,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=8, lr=[7.5771380753056264e-06, 7.5771380753056264e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:23:14,388] [INFO] [timer.py:215:stop] epoch=0/micro_step=370/global_step=370, RunningAvgSamplesPerSec=50.93774701231196, CurrSamplesPerSec=50.07629021757177, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:23:14,554] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=7, lr=[3.918414724016767e-06, 3.918414724016767e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 369|ppo_ep: 1|act_loss: 0.0063323974609375|cri_loss: 0.0034465789794921875|unsuper_loss: 0.0 average reward score: -4.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.00%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 370|ppo_ep: 1|act_loss: -0.0552978515625|cri_loss: 0.0390625|unsuper_loss: 0.0 average reward score: -4.578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.81s (32.04%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 371|ppo_ep: 1|act_loss: 0.01149749755859375|cri_loss: 0.0013828277587890625|unsuper_loss: 0.0 average reward score: -4.96484375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.80s (31.66%) |Others=0.23 
(9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 372|ppo_ep: 1|act_loss: -0.0081939697265625|cri_loss: 0.0030345916748046875|unsuper_loss: 0.0 average reward score: -4.76171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.77%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.0308380126953125|cri_loss: 0.01311492919921875|unsuper_loss: 0.0 average reward score: -3.8203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.02%) |Training time=0.82s (32.05%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 374|ppo_ep: 1|act_loss: -0.01399993896484375|cri_loss: 0.005680084228515625|unsuper_loss: 0.0 average reward score: -4.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.92%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 375|ppo_ep: 1|act_loss: -0.01334381103515625|cri_loss: 0.002254486083984375|unsuper_loss: 0.0 average reward score: -4.1875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.12%) |Training time=0.81s (32.01%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 376|ppo_ep: 1|act_loss: -0.0399169921875|cri_loss: 0.006500244140625|unsuper_loss: 0.0 average reward score: -4.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s 
(31.85%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 377|ppo_ep: 1|act_loss: -0.066162109375|cri_loss: 0.05963134765625|unsuper_loss: 0.0 average reward score: -3.533203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 378|ppo_ep: 1|act_loss: 0.0028820037841796875|cri_loss: 0.0013561248779296875|unsuper_loss: 0.0 average reward score: -5.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.88%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 [2023-07-01 08:23:39,613] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=8, lr=[7.429518284921874e-06, 7.429518284921874e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:23:39,791] [INFO] [timer.py:215:stop] epoch=0/micro_step=380/global_step=380, RunningAvgSamplesPerSec=50.92096092837207, CurrSamplesPerSec=50.579792461817675, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:23:39,955] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=7, lr=[3.841740484979002e-06, 3.841740484979002e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 379|ppo_ep: 1|act_loss: 0.0008597373962402344|cri_loss: 0.0011796951293945312|unsuper_loss: 0.0 average reward score: -4.34765625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.85%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 epoch: 0|step: 380|ppo_ep: 1|act_loss: 0.01435089111328125|cri_loss: 0.003143310546875|unsuper_loss: 0.0 average reward score: -4.78515625 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.48%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 381|ppo_ep: 1|act_loss: 0.0233306884765625|cri_loss: 0.0116119384765625|unsuper_loss: 0.0 average reward score: -6.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.93%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 382|ppo_ep: 1|act_loss: 0.047271728515625|cri_loss: 0.0091552734375|unsuper_loss: 0.0 average reward score: -3.08203125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.69%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 epoch: 0|step: 383|ppo_ep: 1|act_loss: 0.009613037109375|cri_loss: 0.0009813308715820312|unsuper_loss: 0.0 average reward score: -5.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.56 epoch: 0|step: 384|ppo_ep: 1|act_loss: -0.0247039794921875|cri_loss: 0.005100250244140625|unsuper_loss: 0.0 average reward score: -3.05859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 385|ppo_ep: 1|act_loss: -0.032440185546875|cri_loss: 0.0112762451171875|unsuper_loss: 0.0 average reward score: 
-6.33203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.62%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 386|ppo_ep: 1|act_loss: -0.008636474609375|cri_loss: 0.0224456787109375|unsuper_loss: 0.0 average reward score: -5.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 387|ppo_ep: 1|act_loss: -0.01513671875|cri_loss: 0.005428314208984375|unsuper_loss: 0.0 average reward score: -5.59375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.86%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 388|ppo_ep: 1|act_loss: -0.0235748291015625|cri_loss: 0.0228729248046875|unsuper_loss: 0.0 average reward score: -4.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.87%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 [2023-07-01 08:24:05,006] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=8, lr=[7.278374279193815e-06, 7.278374279193815e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:24:05,189] [INFO] [timer.py:215:stop] epoch=0/micro_step=390/global_step=390, RunningAvgSamplesPerSec=50.91174566441504, CurrSamplesPerSec=50.332623194574396, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:24:05,354] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=7, lr=[3.763250715433111e-06, 
3.763250715433111e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 389|ppo_ep: 1|act_loss: 0.004604339599609375|cri_loss: 0.01288604736328125|unsuper_loss: 0.0 average reward score: -3.111328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 390|ppo_ep: 1|act_loss: 0.01189422607421875|cri_loss: 0.0021514892578125|unsuper_loss: 0.0 average reward score: -4.796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.72%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 391|ppo_ep: 1|act_loss: 0.009765625|cri_loss: 0.00048041343688964844|unsuper_loss: 0.0 average reward score: -4.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 392|ppo_ep: 1|act_loss: 0.021026611328125|cri_loss: 0.0011301040649414062|unsuper_loss: 0.0 average reward score: -3.669921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.97%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 393|ppo_ep: 1|act_loss: 0.0030460357666015625|cri_loss: 0.0036773681640625|unsuper_loss: 0.0 average reward score: -3.9609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s 
(31.94%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 394|ppo_ep: 1|act_loss: -0.05999755859375|cri_loss: 0.07647705078125|unsuper_loss: 0.0 average reward score: -3.859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.95%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56 epoch: 0|step: 395|ppo_ep: 1|act_loss: -0.00983428955078125|cri_loss: 0.0021266937255859375|unsuper_loss: 0.0 average reward score: -5.46875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.91%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 396|ppo_ep: 1|act_loss: -0.056915283203125|cri_loss: 0.0280914306640625|unsuper_loss: 0.0 average reward score: -3.609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 397|ppo_ep: 1|act_loss: 0.005809783935546875|cri_loss: 0.003498077392578125|unsuper_loss: 0.0 average reward score: -3.99609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56 epoch: 0|step: 398|ppo_ep: 1|act_loss: -0.027435302734375|cri_loss: 0.005130767822265625|unsuper_loss: 0.0 average reward score: -3.90625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training 
time=0.80s (31.48%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56 [2023-07-01 08:24:30,402] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=8, lr=[7.1239105734927765e-06, 7.1239105734927765e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:24:30,579] [INFO] [timer.py:215:stop] epoch=0/micro_step=400/global_step=400, RunningAvgSamplesPerSec=50.90512310167677, CurrSamplesPerSec=51.14447434464079, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:24:30,746] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=7, lr=[3.6830516211415224e-06, 3.6830516211415224e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 399|ppo_ep: 1|act_loss: -0.00937652587890625|cri_loss: 0.0011844635009765625|unsuper_loss: 0.0 average reward score: -5.375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.44%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 400|ppo_ep: 1|act_loss: -0.01446533203125|cri_loss: 0.0028743743896484375|unsuper_loss: 0.0 average reward score: -3.265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56 epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.0172882080078125|cri_loss: 0.01392364501953125|unsuper_loss: 0.0 average reward score: -4.96875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56 epoch: 0|step: 402|ppo_ep: 1|act_loss: 0.0172271728515625|cri_loss: 0.010528564453125|unsuper_loss: 0.0 average reward score: -4.82421875 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.62%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 403|ppo_ep: 1|act_loss: -0.031341552734375|cri_loss: 0.017852783203125|unsuper_loss: 0.0 average reward score: -2.9453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 404|ppo_ep: 1|act_loss: 0.0006022453308105469|cri_loss: 0.0010471343994140625|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 405|ppo_ep: 1|act_loss: 0.0167999267578125|cri_loss: 0.0020847320556640625|unsuper_loss: 0.0 average reward score: -4.83203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 406|ppo_ep: 1|act_loss: 0.0004553794860839844|cri_loss: 0.0025634765625|unsuper_loss: 0.0 average reward score: -5.453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.64%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 407|ppo_ep: 1|act_loss: 0.004215240478515625|cri_loss: 0.0004992485046386719|unsuper_loss: 0.0 average reward 
score: -3.6875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.47%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 408|ppo_ep: 1|act_loss: 0.0067291259765625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0 average reward score: -5.34765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.41%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 [2023-07-01 08:24:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=8, lr=[6.966336175129223e-06, 6.966336175129223e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:24:56,050] [INFO] [timer.py:215:stop] epoch=0/micro_step=410/global_step=410, RunningAvgSamplesPerSec=50.90245252307474, CurrSamplesPerSec=50.837121499065205, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:24:56,217] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=7, lr=[3.6012517207813124e-06, 3.6012517207813124e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 409|ppo_ep: 1|act_loss: 0.013397216796875|cri_loss: 0.0010662078857421875|unsuper_loss: 0.0 average reward score: -4.85546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 410|ppo_ep: 1|act_loss: 0.00982666015625|cri_loss: 0.0018157958984375|unsuper_loss: 0.0 average reward score: -4.09375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.51%) |Others=0.23 
(8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 epoch: 0|step: 411|ppo_ep: 1|act_loss: 0.01397705078125|cri_loss: 0.002384185791015625|unsuper_loss: 0.0 average reward score: -5.1171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.47%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 412|ppo_ep: 1|act_loss: -0.0032558441162109375|cri_loss: 0.0014801025390625|unsuper_loss: 0.0 average reward score: -5.265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.49%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 413|ppo_ep: 1|act_loss: -0.0190277099609375|cri_loss: 0.001567840576171875|unsuper_loss: 0.0 average reward score: -4.859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.60%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56 epoch: 0|step: 414|ppo_ep: 1|act_loss: -0.006626129150390625|cri_loss: 0.0016469955444335938|unsuper_loss: 0.0 average reward score: -4.3515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.48%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 415|ppo_ep: 1|act_loss: 0.008636474609375|cri_loss: 0.002010345458984375|unsuper_loss: 0.0 average reward score: -4.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s 
(31.61%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 416|ppo_ep: 1|act_loss: 0.0136260986328125|cri_loss: 0.00044918060302734375|unsuper_loss: 0.0 average reward score: -4.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 epoch: 0|step: 417|ppo_ep: 1|act_loss: 0.0255889892578125|cri_loss: 0.0036296844482421875|unsuper_loss: 0.0 average reward score: -3.576171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 418|ppo_ep: 1|act_loss: 0.02801513671875|cri_loss: 0.0038089752197265625|unsuper_loss: 0.0 average reward score: -3.234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.71%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56 [2023-07-01 08:25:21,325] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=8, lr=[6.805864300541598e-06, 6.805864300541598e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:25:21,507] [INFO] [timer.py:215:stop] epoch=0/micro_step=420/global_step=420, RunningAvgSamplesPerSec=50.90029937105875, CurrSamplesPerSec=50.523415940629654, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:25:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=7, lr=[3.5179616991058513e-06, 3.5179616991058513e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 419|ppo_ep: 1|act_loss: 0.029449462890625|cri_loss: 0.00270843505859375|unsuper_loss: 0.0 average reward score: -4.0625 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56 epoch: 0|step: 420|ppo_ep: 1|act_loss: 0.0204925537109375|cri_loss: 0.00555419921875|unsuper_loss: 0.0 average reward score: -4.6328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 421|ppo_ep: 1|act_loss: 0.0174560546875|cri_loss: 0.0080413818359375|unsuper_loss: 0.0 average reward score: -4.3359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 422|ppo_ep: 1|act_loss: 0.055419921875|cri_loss: 0.02764892578125|unsuper_loss: 0.0 average reward score: -4.83203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.56%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56 epoch: 0|step: 423|ppo_ep: 1|act_loss: -0.01543426513671875|cri_loss: 0.0007390975952148438|unsuper_loss: 0.0 average reward score: -4.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56 epoch: 0|step: 424|ppo_ep: 1|act_loss: 0.0019207000732421875|cri_loss: 0.023956298828125|unsuper_loss: 0.0 average reward score: 
-4.03125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.60%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56 [2023-07-01 08:25:36,587] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 425|ppo_ep: 1|act_loss: -0.0114898681640625|cri_loss: 0.0016374588012695312|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.52s (64.44%) |Training time=0.61s (26.02%) |Others=0.22 (9.54%)|CurSamplesPerSec=13.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 426|ppo_ep: 1|act_loss: -0.022979736328125|cri_loss: 0.00567626953125|unsuper_loss: 0.0 average reward score: -4.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.72%) |Training time=0.80s (31.38%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 427|ppo_ep: 1|act_loss: -0.0080718994140625|cri_loss: 0.0008549690246582031|unsuper_loss: 0.0 average reward score: -4.078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.49%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 428|ppo_ep: 1|act_loss: -0.01349639892578125|cri_loss: 0.0019235610961914062|unsuper_loss: 0.0 average reward score: -6.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s 
(31.42%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 [2023-07-01 08:25:46,595] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=9, lr=[6.659141658731728e-06, 6.659141658731728e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:25:46,773] [INFO] [timer.py:215:stop] epoch=0/micro_step=430/global_step=430, RunningAvgSamplesPerSec=50.936617309465866, CurrSamplesPerSec=50.9904122573574, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:25:46,938] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=7, lr=[3.43329425717549e-06, 3.43329425717549e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 429|ppo_ep: 1|act_loss: -0.012054443359375|cri_loss: 0.0003750324249267578|unsuper_loss: 0.0 average reward score: -3.53125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 430|ppo_ep: 1|act_loss: -0.00502777099609375|cri_loss: 0.004634857177734375|unsuper_loss: 0.0 average reward score: -5.0703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.87%) |Training time=0.79s (31.26%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 431|ppo_ep: 1|act_loss: 0.0088653564453125|cri_loss: 0.0036773681640625|unsuper_loss: 0.0 average reward score: -4.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.38%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 432|ppo_ep: 1|act_loss: 0.0074310302734375|cri_loss: 0.0007243156433105469|unsuper_loss: 0.0 average reward score: -4.0546875 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 433|ppo_ep: 1|act_loss: 0.0251007080078125|cri_loss: 0.0047454833984375|unsuper_loss: 0.0 average reward score: -4.16015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.66%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 434|ppo_ep: 1|act_loss: 0.0069580078125|cri_loss: 0.0017385482788085938|unsuper_loss: 0.0 average reward score: -5.0078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.51%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 435|ppo_ep: 1|act_loss: -0.0178985595703125|cri_loss: 0.0014028549194335938|unsuper_loss: 0.0 average reward score: -4.87890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.55%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 436|ppo_ep: 1|act_loss: -0.015350341796875|cri_loss: 0.00339508056640625|unsuper_loss: 0.0 average reward score: -4.8984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.56%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.01360321044921875|cri_loss: 0.0025768280029296875|unsuper_loss: 0.0 average reward 
score: -4.46875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.59%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.01806640625|cri_loss: 0.0021915435791015625|unsuper_loss: 0.0 average reward score: -3.462890625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 [2023-07-01 08:26:12,029] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=9, lr=[6.493765795627752e-06, 6.493765795627752e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:26:12,209] [INFO] [timer.py:215:stop] epoch=0/micro_step=440/global_step=440, RunningAvgSamplesPerSec=50.93680599438725, CurrSamplesPerSec=51.115958488051035, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:26:12,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=7, lr=[3.3473639598599567e-06, 3.3473639598599567e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.025421142578125|cri_loss: 0.00484466552734375|unsuper_loss: 0.0 average reward score: -7.24609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.51%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 440|ppo_ep: 1|act_loss: 0.0246124267578125|cri_loss: 0.0032405853271484375|unsuper_loss: 0.0 average reward score: -3.83984375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.47%) |Others=0.23 
(8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57 epoch: 0|step: 441|ppo_ep: 1|act_loss: 0.0003368854522705078|cri_loss: 0.005702972412109375|unsuper_loss: 0.0 average reward score: -5.40625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.00249481201171875|cri_loss: 0.0018873214721679688|unsuper_loss: 0.0 average reward score: -4.359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.45%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 443|ppo_ep: 1|act_loss: 0.003620147705078125|cri_loss: 0.00045180320739746094|unsuper_loss: 0.0 average reward score: -4.2578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.50%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 444|ppo_ep: 1|act_loss: 0.008819580078125|cri_loss: 0.00811004638671875|unsuper_loss: 0.0 average reward score: -6.05859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.43%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 445|ppo_ep: 1|act_loss: -0.057220458984375|cri_loss: 0.039154052734375|unsuper_loss: 0.0 average reward score: -4.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s 
(31.40%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 446|ppo_ep: 1|act_loss: 0.0115966796875|cri_loss: 0.002429962158203125|unsuper_loss: 0.0 average reward score: -6.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 447|ppo_ep: 1|act_loss: 0.051116943359375|cri_loss: 0.021514892578125|unsuper_loss: 0.0 average reward score: -4.27734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.63%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 448|ppo_ep: 1|act_loss: 0.04486083984375|cri_loss: 0.0269927978515625|unsuper_loss: 0.0 average reward score: -4.484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 [2023-07-01 08:26:37,454] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=9, lr=[6.326131898837833e-06, 6.326131898837833e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:26:37,636] [INFO] [timer.py:215:stop] epoch=0/micro_step=450/global_step=450, RunningAvgSamplesPerSec=50.939151605231494, CurrSamplesPerSec=51.02418006191282, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:26:37,801] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=7, lr=[3.2602870808187955e-06, 3.2602870808187955e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 449|ppo_ep: 1|act_loss: -0.01153564453125|cri_loss: 0.0019741058349609375|unsuper_loss: 0.0 average reward score: -3.67578125 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.51%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 450|ppo_ep: 1|act_loss: -0.0297088623046875|cri_loss: 0.00568389892578125|unsuper_loss: 0.0 average reward score: -4.6328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.60%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 451|ppo_ep: 1|act_loss: 0.006378173828125|cri_loss: 0.0003757476806640625|unsuper_loss: 0.0 average reward score: -5.140625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.006011962890625|cri_loss: 0.0011262893676757812|unsuper_loss: 0.0 average reward score: -3.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 453|ppo_ep: 1|act_loss: 0.22802734375|cri_loss: 0.79931640625|unsuper_loss: 0.0 average reward score: -3.765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.49%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 454|ppo_ep: 1|act_loss: -0.008758544921875|cri_loss: 0.0018739700317382812|unsuper_loss: 0.0 average reward score: 
-3.94921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.48%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.015228271484375|cri_loss: 0.0011606216430664062|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 456|ppo_ep: 1|act_loss: -0.0037593841552734375|cri_loss: 0.002017974853515625|unsuper_loss: 0.0 average reward score: -3.4296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.52%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 457|ppo_ep: 1|act_loss: -0.00904083251953125|cri_loss: 0.0014896392822265625|unsuper_loss: 0.0 average reward score: -5.22265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 458|ppo_ep: 1|act_loss: 0.00726318359375|cri_loss: 0.0018491744995117188|unsuper_loss: 0.0 average reward score: -3.9140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.75%) |Training time=0.80s (31.36%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:27:02,920] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, 
lr=[6.1564667964686156e-06, 6.1564667964686156e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:03,098] [INFO] [timer.py:215:stop] epoch=0/micro_step=460/global_step=460, RunningAvgSamplesPerSec=50.93740045981754, CurrSamplesPerSec=51.238622863303334, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:27:03,264] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=7, lr=[3.1721814451696215e-06, 3.1721814451696215e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 459|ppo_ep: 1|act_loss: 0.01074981689453125|cri_loss: 0.0009255409240722656|unsuper_loss: 0.0 average reward score: -3.005859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.62%) |Training time=0.80s (31.44%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.0157012939453125|cri_loss: 0.0010919570922851562|unsuper_loss: 0.0 average reward score: -4.234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.44%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 461|ppo_ep: 1|act_loss: 0.023651123046875|cri_loss: 0.0022792816162109375|unsuper_loss: 0.0 average reward score: -4.36328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.52%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 462|ppo_ep: 1|act_loss: 0.0156097412109375|cri_loss: 0.000743865966796875|unsuper_loss: 0.0 average reward score: -4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) 
|Training time=0.80s (31.46%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 463|ppo_ep: 1|act_loss: -0.00859832763671875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0 average reward score: -4.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.63%) |Training time=0.80s (31.46%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 464|ppo_ep: 1|act_loss: -0.00014662742614746094|cri_loss: 0.0007381439208984375|unsuper_loss: 0.0 average reward score: -3.0 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.56%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 465|ppo_ep: 1|act_loss: -0.020477294921875|cri_loss: 0.00197601318359375|unsuper_loss: 0.0 average reward score: -4.125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.60%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 466|ppo_ep: 1|act_loss: -0.004871368408203125|cri_loss: 0.0019664764404296875|unsuper_loss: 0.0 average reward score: -4.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 467|ppo_ep: 1|act_loss: -0.0090179443359375|cri_loss: 0.0015735626220703125|unsuper_loss: 0.0 average reward score: -4.17578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) 
|Generate time=1.51s (59.52%) |Training time=0.80s (31.53%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.00807952880859375|cri_loss: 0.0014333724975585938|unsuper_loss: 0.0 average reward score: -2.8203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 [2023-07-01 08:27:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=9, lr=[5.9850000650835e-06, 5.9850000650835e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:28,562] [INFO] [timer.py:215:stop] epoch=0/micro_step=470/global_step=470, RunningAvgSamplesPerSec=50.93503363221945, CurrSamplesPerSec=50.447987364861795, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:27:28,728] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=7, lr=[3.0831662700570695e-06, 3.0831662700570695e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 469|ppo_ep: 1|act_loss: -0.00909423828125|cri_loss: 0.0036182403564453125|unsuper_loss: 0.0 average reward score: -3.822265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.34%) |Training time=0.81s (31.73%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57 epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.0087432861328125|cri_loss: 0.0015554428100585938|unsuper_loss: 0.0 average reward score: -4.84375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.57%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 471|ppo_ep: 1|act_loss: 0.0158233642578125|cri_loss: 
0.0010433197021484375|unsuper_loss: 0.0 average reward score: -3.849609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 472|ppo_ep: 1|act_loss: 0.01094818115234375|cri_loss: 0.0017919540405273438|unsuper_loss: 0.0 average reward score: -4.44140625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.43%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 473|ppo_ep: 1|act_loss: 0.004558563232421875|cri_loss: 0.002330780029296875|unsuper_loss: 0.0 average reward score: -5.546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.48%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 474|ppo_ep: 1|act_loss: -0.01329803466796875|cri_loss: 0.00305938720703125|unsuper_loss: 0.0 average reward score: -4.5625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 475|ppo_ep: 1|act_loss: -0.0081024169921875|cri_loss: 0.0024166107177734375|unsuper_loss: 0.0 average reward score: -5.08984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 476|ppo_ep: 1|act_loss: 
-0.006214141845703125|cri_loss: 0.0010242462158203125|unsuper_loss: 0.0 average reward score: -2.517578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.01202392578125|cri_loss: 0.000881195068359375|unsuper_loss: 0.0 average reward score: -5.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.75%) |Training time=0.80s (31.38%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 478|ppo_ep: 1|act_loss: -0.0006918907165527344|cri_loss: 0.00208282470703125|unsuper_loss: 0.0 average reward score: -4.828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.59%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 [2023-07-01 08:27:53,860] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=9, lr=[5.81196371905892e-06, 5.81196371905892e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:54,038] [INFO] [timer.py:215:stop] epoch=0/micro_step=480/global_step=480, RunningAvgSamplesPerSec=50.9348513563475, CurrSamplesPerSec=50.923126479449614, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:27:54,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=7, lr=[2.993362003338167e-06, 2.993362003338167e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 479|ppo_ep: 1|act_loss: -0.0224151611328125|cri_loss: 0.003749847412109375|unsuper_loss: 0.0 average reward score: -4.49609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 480|ppo_ep: 1|act_loss: 0.00965118408203125|cri_loss: 0.00128173828125|unsuper_loss: 0.0 average reward score: -5.75 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 481|ppo_ep: 1|act_loss: 0.0204010009765625|cri_loss: 0.00751495361328125|unsuper_loss: 0.0 average reward score: -3.404296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 482|ppo_ep: 1|act_loss: -0.0557861328125|cri_loss: 0.09521484375|unsuper_loss: 0.0 average reward score: -4.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.62%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 483|ppo_ep: 1|act_loss: 0.01477813720703125|cri_loss: 0.002376556396484375|unsuper_loss: 0.0 average reward score: -5.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.38%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 484|ppo_ep: 1|act_loss: -0.055084228515625|cri_loss: 0.04400634765625|unsuper_loss: 0.0 average reward score: -5.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather 
latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.66%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 485|ppo_ep: 1|act_loss: -0.001514434814453125|cri_loss: 0.0013093948364257812|unsuper_loss: 0.0 average reward score: -4.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.004360198974609375|cri_loss: 0.0009250640869140625|unsuper_loss: 0.0 average reward score: -2.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 487|ppo_ep: 1|act_loss: -0.006107330322265625|cri_loss: 0.0053558349609375|unsuper_loss: 0.0 average reward score: -6.04296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.025787353515625|cri_loss: 0.003879547119140625|unsuper_loss: 0.0 average reward score: -4.171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 [2023-07-01 08:28:19,322] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=9, lr=[5.637591896641978e-06, 5.637591896641978e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:28:19,499] [INFO] [timer.py:215:stop] 
epoch=0/micro_step=490/global_step=490, RunningAvgSamplesPerSec=50.93059549496716, CurrSamplesPerSec=50.77863326665892, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:28:19,664] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=7, lr=[2.902890160602413e-06, 2.902890160602413e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 489|ppo_ep: 1|act_loss: -0.0225677490234375|cri_loss: 0.01079559326171875|unsuper_loss: 0.0 average reward score: -3.841796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.63%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 490|ppo_ep: 1|act_loss: -0.012939453125|cri_loss: 0.00162506103515625|unsuper_loss: 0.0 average reward score: -3.4921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 491|ppo_ep: 1|act_loss: 0.0004925727844238281|cri_loss: 0.0001983642578125|unsuper_loss: 0.0 average reward score: -4.359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 492|ppo_ep: 1|act_loss: -0.011993408203125|cri_loss: 0.003360748291015625|unsuper_loss: 0.0 average reward score: -2.62109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 493|ppo_ep: 1|act_loss: 
0.014007568359375|cri_loss: 0.00295257568359375|unsuper_loss: 0.0 average reward score: -3.912109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.02337646484375|cri_loss: 0.00600433349609375|unsuper_loss: 0.0 average reward score: -4.39453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.56%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 495|ppo_ep: 1|act_loss: 0.005413055419921875|cri_loss: 0.0016698837280273438|unsuper_loss: 0.0 average reward score: -4.7421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.48%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 496|ppo_ep: 1|act_loss: 0.018341064453125|cri_loss: 0.0029125213623046875|unsuper_loss: 0.0 average reward score: -4.15234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.56%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 497|ppo_ep: 1|act_loss: -0.00753021240234375|cri_loss: 0.002887725830078125|unsuper_loss: 0.0 average reward score: -5.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 
498|ppo_ep: 1|act_loss: -0.01403045654296875|cri_loss: 0.0005021095275878906|unsuper_loss: 0.0 average reward score: -5.21875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57 [2023-07-01 08:28:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=9, lr=[5.462120543134245e-06, 5.462120543134245e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:28:44,942] [INFO] [timer.py:215:stop] epoch=0/micro_step=500/global_step=500, RunningAvgSamplesPerSec=50.92924885306585, CurrSamplesPerSec=50.780919489957526, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:28:45,106] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=7, lr=[2.811873160747093e-06, 2.811873160747093e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 499|ppo_ep: 1|act_loss: -0.031036376953125|cri_loss: 0.00760650634765625|unsuper_loss: 0.0 average reward score: -4.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.61%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 500|ppo_ep: 1|act_loss: -0.00673675537109375|cri_loss: 0.0014362335205078125|unsuper_loss: 0.0 average reward score: -4.22265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 501|ppo_ep: 1|act_loss: -0.0005049705505371094|cri_loss: 0.0007486343383789062|unsuper_loss: 0.0 average reward score: -4.8125 ------------------------------------------------------------------------------------- |E2E 
latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.51%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 502|ppo_ep: 1|act_loss: -0.00399017333984375|cri_loss: 0.0021114349365234375|unsuper_loss: 0.0 average reward score: -4.7109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 503|ppo_ep: 1|act_loss: 0.01094818115234375|cri_loss: 0.00034165382385253906|unsuper_loss: 0.0 average reward score: -3.673828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.81s (31.67%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 504|ppo_ep: 1|act_loss: 0.0125732421875|cri_loss: 0.002452850341796875|unsuper_loss: 0.0 average reward score: -5.03125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.50%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:29:00,036] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 4096, reducing to 2048 epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.005146026611328125|cri_loss: 0.0011034011840820312|unsuper_loss: 0.0 average reward score: -4.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.21%) |Training time=0.62s (26.19%) |Others=0.23 (9.60%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.00792694091796875|cri_loss: 0.0005488395690917969|unsuper_loss: 0.0 average reward score: -5.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.0279541015625|cri_loss: 0.0025577545166015625|unsuper_loss: 0.0 average reward score: -4.765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 508|ppo_ep: 1|act_loss: 0.0224609375|cri_loss: 0.001071929931640625|unsuper_loss: 0.0 average reward score: -5.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.51%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:29:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=10, lr=[5.30345243877873e-06, 5.30345243877873e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:29:10,202] [INFO] [timer.py:215:stop] epoch=0/micro_step=510/global_step=510, RunningAvgSamplesPerSec=50.95858837805958, CurrSamplesPerSec=51.39470244980967, 
MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:29:10,367] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=7, lr=[2.720434160330307e-06, 2.720434160330307e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 509|ppo_ep: 1|act_loss: -0.0016717910766601562|cri_loss: 0.00014770030975341797|unsuper_loss: 0.0 average reward score: -3.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.69%) |Training time=0.80s (31.41%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 510|ppo_ep: 1|act_loss: -0.0111541748046875|cri_loss: 0.0006170272827148438|unsuper_loss: 0.0 average reward score: -3.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.40%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 511|ppo_ep: 1|act_loss: -0.011077880859375|cri_loss: 0.0015201568603515625|unsuper_loss: 0.0 average reward score: -5.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 512|ppo_ep: 1|act_loss: -0.0259552001953125|cri_loss: 0.005645751953125|unsuper_loss: 0.0 average reward score: -3.05859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.61%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 513|ppo_ep: 1|act_loss: 0.0014553070068359375|cri_loss: 0.0010042190551757812|unsuper_loss: 0.0 average reward score: -3.88671875 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 514|ppo_ep: 1|act_loss: 0.0207061767578125|cri_loss: 0.002655029296875|unsuper_loss: 0.0 average reward score: -3.0 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.01091766357421875|cri_loss: 0.0009918212890625|unsuper_loss: 0.0 average reward score: -5.46484375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.56%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.019775390625|cri_loss: 0.00229644775390625|unsuper_loss: 0.0 average reward score: -4.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.64%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57 epoch: 0|step: 517|ppo_ep: 1|act_loss: 0.0154876708984375|cri_loss: 0.0035495758056640625|unsuper_loss: 0.0 average reward score: -3.611328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.66%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 518|ppo_ep: 1|act_loss: 0.0266876220703125|cri_loss: 0.00441741943359375|unsuper_loss: 0.0 average reward score: 
-4.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 [2023-07-01 08:29:35,481] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=10, lr=[5.126547075166989e-06, 5.126547075166989e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:29:35,661] [INFO] [timer.py:215:stop] epoch=0/micro_step=520/global_step=520, RunningAvgSamplesPerSec=50.956135782198245, CurrSamplesPerSec=50.88289419461918, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:29:35,828] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=7, lr=[2.6286968869258666e-06, 2.6286968869258666e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 519|ppo_ep: 1|act_loss: 0.018218994140625|cri_loss: 0.0027561187744140625|unsuper_loss: 0.0 average reward score: -3.3984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.51%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 520|ppo_ep: 1|act_loss: 0.01316070556640625|cri_loss: 0.0021457672119140625|unsuper_loss: 0.0 average reward score: -4.69921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.71%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 521|ppo_ep: 1|act_loss: 0.002925872802734375|cri_loss: 0.0039520263671875|unsuper_loss: 0.0 average reward score: -3.34765625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.50%) |Others=0.23 
(8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 522|ppo_ep: 1|act_loss: 0.0268402099609375|cri_loss: 0.007488250732421875|unsuper_loss: 0.0 average reward score: -4.5703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.49%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 523|ppo_ep: 1|act_loss: 0.0007557868957519531|cri_loss: 0.0024166107177734375|unsuper_loss: 0.0 average reward score: -4.74609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.52%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 524|ppo_ep: 1|act_loss: -0.0075225830078125|cri_loss: 0.0025806427001953125|unsuper_loss: 0.0 average reward score: -4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.47%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 525|ppo_ep: 1|act_loss: -0.0187225341796875|cri_loss: 0.005298614501953125|unsuper_loss: 0.0 average reward score: -4.40625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.58%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 526|ppo_ep: 1|act_loss: -0.0318603515625|cri_loss: 0.00554656982421875|unsuper_loss: 0.0 average reward score: -2.92578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s 
(31.60%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 527|ppo_ep: 1|act_loss: -0.02935791015625|cri_loss: 0.004497528076171875|unsuper_loss: 0.0 average reward score: -4.2265625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.61%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57 epoch: 0|step: 528|ppo_ep: 1|act_loss: -0.0176849365234375|cri_loss: 0.002193450927734375|unsuper_loss: 0.0 average reward score: -4.234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:30:00,913] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=10, lr=[4.949233683385321e-06, 4.949233683385321e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:01,095] [INFO] [timer.py:215:stop] epoch=0/micro_step=530/global_step=530, RunningAvgSamplesPerSec=50.95421906134459, CurrSamplesPerSec=50.46234547477968, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:01,261] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=7, lr=[2.5367854717055305e-06, 2.5367854717055305e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 529|ppo_ep: 1|act_loss: -0.005146026611328125|cri_loss: 0.001399993896484375|unsuper_loss: 0.0 average reward score: -2.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.73%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 530|ppo_ep: 1|act_loss: -0.00983428955078125|cri_loss: 0.0006809234619140625|unsuper_loss: 0.0 average reward score: -5.578125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.0021076202392578125|cri_loss: 0.00106048583984375|unsuper_loss: 0.0 average reward score: -3.740234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 532|ppo_ep: 1|act_loss: -0.01316070556640625|cri_loss: 0.006561279296875|unsuper_loss: 0.0 average reward score: -5.375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 533|ppo_ep: 1|act_loss: -0.01398468017578125|cri_loss: 0.0008006095886230469|unsuper_loss: 0.0 average reward score: -5.0859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.62%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 534|ppo_ep: 1|act_loss: 0.0012187957763671875|cri_loss: 0.0017404556274414062|unsuper_loss: 0.0 average reward score: -5.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 535|ppo_ep: 1|act_loss: -0.003604888916015625|cri_loss: 0.0013761520385742188|unsuper_loss: 0.0 
average reward score: -4.6171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.65%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 536|ppo_ep: 1|act_loss: 0.01251983642578125|cri_loss: 0.001312255859375|unsuper_loss: 0.0 average reward score: -5.01171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 537|ppo_ep: 1|act_loss: 0.0182342529296875|cri_loss: 0.0034637451171875|unsuper_loss: 0.0 average reward score: -4.8203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.72%) |Training time=0.80s (31.40%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.0146484375|cri_loss: 0.0020542144775390625|unsuper_loss: 0.0 average reward score: -3.134765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.46%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 [2023-07-01 08:30:26,353] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=10, lr=[4.771752189019846e-06, 4.771752189019846e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:26,530] [INFO] [timer.py:215:stop] epoch=0/micro_step=540/global_step=540, RunningAvgSamplesPerSec=50.9513624248314, CurrSamplesPerSec=50.93793034873452, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:26,696] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=7, 
lr=[2.4448242814751353e-06, 2.4448242814751353e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 539|ppo_ep: 1|act_loss: 0.0062103271484375|cri_loss: 0.0006608963012695312|unsuper_loss: 0.0 average reward score: -3.763671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 540|ppo_ep: 1|act_loss: 0.02484130859375|cri_loss: 0.0029354095458984375|unsuper_loss: 0.0 average reward score: -6.2265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.81s (31.64%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 541|ppo_ep: 1|act_loss: -0.00518035888671875|cri_loss: 0.0001233816146850586|unsuper_loss: 0.0 average reward score: -5.3984375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.65%) |Training time=0.80s (31.46%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.57 epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.0022792816162109375|cri_loss: 0.0005393028259277344|unsuper_loss: 0.0 average reward score: -6.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.62%) |Training time=0.80s (31.48%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 543|ppo_ep: 1|act_loss: -0.008697509765625|cri_loss: 0.0011472702026367188|unsuper_loss: 0.0 average reward score: -3.859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate 
time=1.51s (59.47%) |Training time=0.80s (31.61%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 544|ppo_ep: 1|act_loss: -0.01306915283203125|cri_loss: 0.0025920867919921875|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 545|ppo_ep: 1|act_loss: -0.0258636474609375|cri_loss: 0.00591278076171875|unsuper_loss: 0.0 average reward score: -5.7421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.81s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 546|ppo_ep: 1|act_loss: -0.007015228271484375|cri_loss: 0.0006990432739257812|unsuper_loss: 0.0 average reward score: -4.92578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 547|ppo_ep: 1|act_loss: -0.007061004638671875|cri_loss: 0.0008282661437988281|unsuper_loss: 0.0 average reward score: -4.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.50%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 548|ppo_ep: 1|act_loss: -0.0014696121215820312|cri_loss: 0.0002682209014892578|unsuper_loss: 0.0 average reward score: -3.357421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s 
|Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 [2023-07-01 08:30:51,790] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=10, lr=[4.594342745118979e-06, 4.594342745118979e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:51,972] [INFO] [timer.py:215:stop] epoch=0/micro_step=550/global_step=550, RunningAvgSamplesPerSec=50.94977825227625, CurrSamplesPerSec=50.522579141899634, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:52,138] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=7, lr=[2.352937750391878e-06, 2.352937750391878e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 549|ppo_ep: 1|act_loss: 0.00908660888671875|cri_loss: 0.00087738037109375|unsuper_loss: 0.0 average reward score: -5.015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 550|ppo_ep: 1|act_loss: 0.0301055908203125|cri_loss: 0.005985260009765625|unsuper_loss: 0.0 average reward score: -4.109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 551|ppo_ep: 1|act_loss: 0.0014286041259765625|cri_loss: 0.001789093017578125|unsuper_loss: 0.0 average reward score: -4.1796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 552|ppo_ep: 1|act_loss: 0.01788330078125|cri_loss: 
0.0032596588134765625|unsuper_loss: 0.0 average reward score: -4.9140625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.61%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 553|ppo_ep: 1|act_loss: 0.004871368408203125|cri_loss: 0.006023406982421875|unsuper_loss: 0.0 average reward score: -3.810546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 554|ppo_ep: 1|act_loss: -0.00543975830078125|cri_loss: 0.0013408660888671875|unsuper_loss: 0.0 average reward score: -4.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.53%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 555|ppo_ep: 1|act_loss: -0.00647735595703125|cri_loss: 0.001308441162109375|unsuper_loss: 0.0 average reward score: -3.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.67%) |Training time=0.80s (31.44%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57 epoch: 0|step: 556|ppo_ep: 1|act_loss: 0.0111541748046875|cri_loss: 0.004581451416015625|unsuper_loss: 0.0 average reward score: -3.681640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 557|ppo_ep: 1|act_loss: 
-0.01230621337890625|cri_loss: 0.0010156631469726562|unsuper_loss: 0.0 average reward score: -3.62890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.66%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 558|ppo_ep: 1|act_loss: -0.0097808837890625|cri_loss: 0.0012044906616210938|unsuper_loss: 0.0 average reward score: -4.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.58%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:31:17,211] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=10, lr=[4.417245407238497e-06, 4.417245407238497e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:31:17,388] [INFO] [timer.py:215:stop] epoch=0/micro_step=560/global_step=560, RunningAvgSamplesPerSec=50.94992960737238, CurrSamplesPerSec=51.052651838755786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:31:17,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=7, lr=[2.261250211590471e-06, 2.261250211590471e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 559|ppo_ep: 1|act_loss: -0.0055694580078125|cri_loss: 0.0004534721374511719|unsuper_loss: 0.0 average reward score: -3.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 560|ppo_ep: 1|act_loss: 0.0032482147216796875|cri_loss: 0.0005564689636230469|unsuper_loss: 0.0 average reward score: -3.51953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.51s (59.67%) |Training time=0.80s (31.39%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 561|ppo_ep: 1|act_loss: 0.004161834716796875|cri_loss: 0.0013608932495117188|unsuper_loss: 0.0 average reward score: -4.4921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.57%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.0016088485717773438|cri_loss: 0.0002428293228149414|unsuper_loss: 0.0 average reward score: -3.6328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.59%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.0179595947265625|cri_loss: 0.0032367706298828125|unsuper_loss: 0.0 average reward score: -5.16015625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.66%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57 epoch: 0|step: 564|ppo_ep: 1|act_loss: 0.003131866455078125|cri_loss: 0.00016367435455322266|unsuper_loss: 0.0 average reward score: -5.4296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.49%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 565|ppo_ep: 1|act_loss: 0.0007748603820800781|cri_loss: 0.0004220008850097656|unsuper_loss: 0.0 average reward score: -4.3515625 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.52%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 566|ppo_ep: 1|act_loss: -0.0001862049102783203|cri_loss: 0.0009765625|unsuper_loss: 0.0 average reward score: -3.986328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57 epoch: 0|step: 567|ppo_ep: 1|act_loss: -0.0013790130615234375|cri_loss: 0.0017213821411132812|unsuper_loss: 0.0 average reward score: -3.708984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57 epoch: 0|step: 568|ppo_ep: 1|act_loss: -0.00388336181640625|cri_loss: 0.0007061958312988281|unsuper_loss: 0.0 average reward score: -4.8671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.62%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 [2023-07-01 08:31:42,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=10, lr=[4.2406998086185315e-06, 4.2406998086185315e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:31:42,865] [INFO] [timer.py:215:stop] epoch=0/micro_step=570/global_step=570, RunningAvgSamplesPerSec=50.94778923762274, CurrSamplesPerSec=51.40765520514193, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:31:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=7, lr=[2.1698857289459872e-06, 
2.1698857289459872e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 569|ppo_ep: 1|act_loss: -0.01505279541015625|cri_loss: 0.0009059906005859375|unsuper_loss: 0.0 average reward score: -3.9609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.78%) |Training time=0.80s (31.36%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 570|ppo_ep: 1|act_loss: -0.0038356781005859375|cri_loss: 0.0009098052978515625|unsuper_loss: 0.0 average reward score: -4.7109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.74%) |Training time=0.80s (31.37%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 571|ppo_ep: 1|act_loss: -0.0022602081298828125|cri_loss: 0.0014858245849609375|unsuper_loss: 0.0 average reward score: -4.484375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.47%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 572|ppo_ep: 1|act_loss: -0.00765228271484375|cri_loss: 0.0003981590270996094|unsuper_loss: 0.0 average reward score: -3.5 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.83%) |Training time=0.82s (32.15%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 573|ppo_ep: 1|act_loss: -0.0052032470703125|cri_loss: 0.0011606216430664062|unsuper_loss: 0.0 average reward score: -5.69140625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training 
time=0.80s (31.70%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.57 epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.014617919921875|cri_loss: 0.0018224716186523438|unsuper_loss: 0.0 average reward score: -6.078125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.46%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.57 epoch: 0|step: 575|ppo_ep: 1|act_loss: 0.010711669921875|cri_loss: 0.00122833251953125|unsuper_loss: 0.0 average reward score: -6.5078125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.80s (31.75%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57 epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.0308074951171875|cri_loss: 0.02569580078125|unsuper_loss: 0.0 average reward score: -3.326171875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.95%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57 epoch: 0|step: 577|ppo_ep: 1|act_loss: 0.0074462890625|cri_loss: 0.0010700225830078125|unsuper_loss: 0.0 average reward score: -4.09375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.00%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:32:05,492] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 2048, reducing to 1024 epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.0223846435546875|cri_loss: 0.002262115478515625|unsuper_loss: 0.0 average reward score: -3.275390625 ------------------------------------------------------------------------------------- |E2E latency=2.34s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.10%) |Training time=0.61s (26.24%) |Others=0.23 (9.66%)|CurSamplesPerSec=13.66 |AvgSamplesPerSec=12.57 [2023-07-01 08:32:07,846] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=11, lr=[4.082477967402902e-06, 4.082477967402902e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:08,027] [INFO] [timer.py:215:stop] epoch=0/micro_step=580/global_step=580, RunningAvgSamplesPerSec=50.97001604666447, CurrSamplesPerSec=50.35826863975969, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:08,192] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=7, lr=[2.0789679292010483e-06, 2.0789679292010483e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 579|ppo_ep: 1|act_loss: 0.01136016845703125|cri_loss: 0.0022869110107421875|unsuper_loss: 0.0 average reward score: -5.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 580|ppo_ep: 1|act_loss: 0.018829345703125|cri_loss: 0.0036773681640625|unsuper_loss: 0.0 average reward score: -4.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57 epoch: 0|step: 581|ppo_ep: 1|act_loss: -0.001506805419921875|cri_loss: 0.002010345458984375|unsuper_loss: 0.0 average reward score: -5.78515625 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.83%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 582|ppo_ep: 1|act_loss: -0.0027065277099609375|cri_loss: 0.0007162094116210938|unsuper_loss: 0.0 average reward score: -3.376953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.73%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 583|ppo_ep: 1|act_loss: -0.00507354736328125|cri_loss: 0.0033092498779296875|unsuper_loss: 0.0 average reward score: -3.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.81%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57 epoch: 0|step: 584|ppo_ep: 1|act_loss: -0.0029296875|cri_loss: 0.0007195472717285156|unsuper_loss: 0.0 average reward score: -5.140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.75%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.020263671875|cri_loss: 0.0017242431640625|unsuper_loss: 0.0 average reward score: -4.1875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.99%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57 epoch: 0|step: 586|ppo_ep: 1|act_loss: 0.00838470458984375|cri_loss: 0.002552032470703125|unsuper_loss: 0.0 average reward 
score: -3.76171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.84%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 587|ppo_ep: 1|act_loss: 0.0025005340576171875|cri_loss: 0.001628875732421875|unsuper_loss: 0.0 average reward score: -7.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.81%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.57 epoch: 0|step: 588|ppo_ep: 1|act_loss: 0.00838470458984375|cri_loss: 0.0011529922485351562|unsuper_loss: 0.0 average reward score: -5.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.82%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.57 [2023-07-01 08:32:33,230] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=11, lr=[3.907637928621924e-06, 3.907637928621924e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:33,408] [INFO] [timer.py:215:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=50.96221074638222, CurrSamplesPerSec=50.53928232401794, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:33,573] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=7, lr=[1.988619834684499e-06, 1.988619834684499e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 589|ppo_ep: 1|act_loss: 0.00032448768615722656|cri_loss: 0.0007734298706054688|unsuper_loss: 0.0 average reward score: -3.94140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.84%) 
|Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.009735107421875|cri_loss: 0.0010528564453125|unsuper_loss: 0.0 average reward score: -3.126953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 591|ppo_ep: 1|act_loss: 0.00969696044921875|cri_loss: 0.0013589859008789062|unsuper_loss: 0.0 average reward score: -6.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 592|ppo_ep: 1|act_loss: -0.004749298095703125|cri_loss: 0.0004584789276123047|unsuper_loss: 0.0 average reward score: -5.48828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 593|ppo_ep: 1|act_loss: -0.006870269775390625|cri_loss: 0.0007867813110351562|unsuper_loss: 0.0 average reward score: -4.80078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.78%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57 epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.0190582275390625|cri_loss: 0.0018129348754882812|unsuper_loss: 0.0 average reward score: -3.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s 
(59.13%) |Training time=0.81s (31.89%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 595|ppo_ep: 1|act_loss: -0.006244659423828125|cri_loss: 0.001369476318359375|unsuper_loss: 0.0 average reward score: -4.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.79%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 596|ppo_ep: 1|act_loss: -0.0032024383544921875|cri_loss: 0.0005779266357421875|unsuper_loss: 0.0 average reward score: -5.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 epoch: 0|step: 597|ppo_ep: 1|act_loss: -0.0027294158935546875|cri_loss: 0.0002753734588623047|unsuper_loss: 0.0 average reward score: -4.9609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 epoch: 0|step: 598|ppo_ep: 1|act_loss: 0.0278472900390625|cri_loss: 0.0089569091796875|unsuper_loss: 0.0 average reward score: -4.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.45s (57.30%) |Training time=0.85s (33.63%) |Others=0.23 (9.07%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57 [2023-07-01 08:32:58,614] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=11, lr=[3.734039187130717e-06, 3.734039187130717e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:58,796] [INFO] [timer.py:215:stop] epoch=0/micro_step=600/global_step=600, 
RunningAvgSamplesPerSec=50.948322092265094, CurrSamplesPerSec=49.88978045077008, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:58,961] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=7, lr=[1.8989636968479282e-06, 1.8989636968479282e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 599|ppo_ep: 1|act_loss: 0.0027713775634765625|cri_loss: 0.0027790069580078125|unsuper_loss: 0.0 average reward score: -3.732421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.02%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57 epoch: 0|step: 600|ppo_ep: 1|act_loss: -0.00141143798828125|cri_loss: 0.000244140625|unsuper_loss: 0.0 average reward score: -3.482421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.84%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57 [2023-07-01 08:33:04,034] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 601|ppo_ep: 1|act_loss: -0.0098876953125|cri_loss: 0.0004558563232421875|unsuper_loss: 0.0 average reward score: -4.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.35%) |Training time=0.81s (32.53%) |Others=0.18 (7.12%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.57 [2023-07-01 08:33:06,517] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 epoch: 0|step: 602|ppo_ep: 1|act_loss: -0.005390167236328125|cri_loss: 0.0046844482421875|unsuper_loss: 0.0 average reward score: -4.44140625 ------------------------------------------------------------------------------------- |E2E latency=2.48s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.43%) |Training time=0.80s (32.42%) |Others=0.18 (7.15%)|CurSamplesPerSec=12.89 |AvgSamplesPerSec=12.57 epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.0072479248046875|cri_loss: 0.0007405281066894531|unsuper_loss: 0.0 average reward score: -6.33203125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 604|ppo_ep: 1|act_loss: -0.039642333984375|cri_loss: 0.0145111083984375|unsuper_loss: 0.0 average reward score: -3.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 605|ppo_ep: 1|act_loss: 0.00989532470703125|cri_loss: 0.0006589889526367188|unsuper_loss: 0.0 average reward score: -4.94921875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.33%) |Training time=0.80s (31.78%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 [2023-07-01 08:33:16,592] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 32768, reducing to 16384 epoch: 0|step: 606|ppo_ep: 1|act_loss: -0.01064300537109375|cri_loss: 0.002391815185546875|unsuper_loss: 0.0 average reward score: -2.67578125 ------------------------------------------------------------------------------------- |E2E latency=2.48s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.57%) |Training time=0.80s (32.23%) |Others=0.18 (7.21%)|CurSamplesPerSec=12.91 |AvgSamplesPerSec=12.58 epoch: 0|step: 607|ppo_ep: 1|act_loss: -0.01898193359375|cri_loss: 0.00228118896484375|unsuper_loss: 0.0 average reward score: -3.490234375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.55%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 608|ppo_ep: 1|act_loss: 0.01812744140625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0 average reward score: -4.953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.69%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 [2023-07-01 08:33:23,816] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=11, lr=[3.5619166421626894e-06, 3.5619166421626894e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:33:23,994] [INFO] [timer.py:215:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=50.94524731646184, CurrSamplesPerSec=50.586941322752935, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:33:24,160] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=10, lr=[1.8366811213437092e-06, 1.8366811213437092e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 609|ppo_ep: 1|act_loss: 0.010986328125|cri_loss: 0.00179290771484375|unsuper_loss: 0.0 average reward score: -5.8125 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.79%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 610|ppo_ep: 1|act_loss: 0.0160675048828125|cri_loss: 0.0012903213500976562|unsuper_loss: 0.0 average reward score: -3.703125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.54%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 611|ppo_ep: 1|act_loss: -0.011260986328125|cri_loss: 0.00144195556640625|unsuper_loss: 0.0 average reward score: -5.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.83%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 612|ppo_ep: 1|act_loss: 0.018280029296875|cri_loss: 0.00244903564453125|unsuper_loss: 0.0 average reward score: -4.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.93%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.02001953125|cri_loss: 0.002246856689453125|unsuper_loss: 0.0 average reward score: -4.7109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.00%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.023193359375|cri_loss: 0.0021038055419921875|unsuper_loss: 0.0 average reward score: 
-3.83984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.95%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 615|ppo_ep: 1|act_loss: 0.0146331787109375|cri_loss: 0.0013818740844726562|unsuper_loss: 0.0 average reward score: -4.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 616|ppo_ep: 1|act_loss: -0.01386260986328125|cri_loss: 0.0007419586181640625|unsuper_loss: 0.0 average reward score: -4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 617|ppo_ep: 1|act_loss: -0.0098724365234375|cri_loss: 0.0007734298706054688|unsuper_loss: 0.0 average reward score: -5.48828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.83%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 618|ppo_ep: 1|act_loss: -0.01043701171875|cri_loss: 0.0005331039428710938|unsuper_loss: 0.0 average reward score: -3.630859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.98%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 [2023-07-01 08:33:49,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=11, 
lr=[3.3915031954861193e-06, 3.3915031954861193e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:33:49,405] [INFO] [timer.py:215:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=50.93655557448013, CurrSamplesPerSec=50.39513957647391, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:33:49,569] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=10, lr=[1.7484791453998007e-06, 1.7484791453998007e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 619|ppo_ep: 1|act_loss: 0.0256500244140625|cri_loss: 0.01168060302734375|unsuper_loss: 0.0 average reward score: -4.5 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.84%) |Others=0.23 (8.85%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 620|ppo_ep: 1|act_loss: -0.0011425018310546875|cri_loss: 0.00040268898010253906|unsuper_loss: 0.0 average reward score: -3.52734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.40%) |Training time=0.80s (31.72%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 621|ppo_ep: 1|act_loss: 0.0155029296875|cri_loss: 0.0032711029052734375|unsuper_loss: 0.0 average reward score: -5.140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.85%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 622|ppo_ep: 1|act_loss: -0.042724609375|cri_loss: 0.027496337890625|unsuper_loss: 0.0 average reward score: -4.27734375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training 
time=0.80s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 623|ppo_ep: 1|act_loss: 0.0227813720703125|cri_loss: 0.002086639404296875|unsuper_loss: 0.0 average reward score: -3.197265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 624|ppo_ep: 1|act_loss: -0.0032520294189453125|cri_loss: 0.0003867149353027344|unsuper_loss: 0.0 average reward score: -4.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.76%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 625|ppo_ep: 1|act_loss: 0.00916290283203125|cri_loss: 0.0017423629760742188|unsuper_loss: 0.0 average reward score: -3.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 626|ppo_ep: 1|act_loss: -0.0003085136413574219|cri_loss: 0.005645751953125|unsuper_loss: 0.0 average reward score: -2.931640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.60%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 627|ppo_ep: 1|act_loss: -0.003177642822265625|cri_loss: 0.0014371871948242188|unsuper_loss: 0.0 average reward score: -5.79296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) 
|Generate time=1.50s (58.96%) |Training time=0.82s (32.09%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.0007586479187011719|cri_loss: 0.0005908012390136719|unsuper_loss: 0.0 average reward score: -3.56640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.95%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 [2023-07-01 08:34:14,601] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=11, lr=[3.223029436261057e-06, 3.223029436261057e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:34:14,783] [INFO] [timer.py:215:stop] epoch=0/micro_step=630/global_step=630, RunningAvgSamplesPerSec=50.926843009517505, CurrSamplesPerSec=48.30053792926741, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:34:14,947] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=10, lr=[1.6612940643430136e-06, 1.6612940643430136e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 629|ppo_ep: 1|act_loss: 0.06500244140625|cri_loss: 0.02996826171875|unsuper_loss: 0.0 average reward score: -3.935546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.14%) |Training time=0.84s (32.93%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 630|ppo_ep: 1|act_loss: 0.01120758056640625|cri_loss: 0.0019474029541015625|unsuper_loss: 0.0 average reward score: -4.61328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.84%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 631|ppo_ep: 1|act_loss: 0.01428985595703125|cri_loss: 
0.0036029815673828125|unsuper_loss: 0.0 average reward score: -5.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.93%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 632|ppo_ep: 1|act_loss: 0.003864288330078125|cri_loss: 0.00029730796813964844|unsuper_loss: 0.0 average reward score: -4.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.90%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 633|ppo_ep: 1|act_loss: -0.0056915283203125|cri_loss: 0.0002338886260986328|unsuper_loss: 0.0 average reward score: -5.984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.95%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 634|ppo_ep: 1|act_loss: 0.006618499755859375|cri_loss: 0.0003190040588378906|unsuper_loss: 0.0 average reward score: -5.23828125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.0169830322265625|cri_loss: 0.0018148422241210938|unsuper_loss: 0.0 average reward score: -4.13671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.67%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 636|ppo_ep: 1|act_loss: 
-0.0096435546875|cri_loss: 0.0003941059112548828|unsuper_loss: 0.0 average reward score: -4.30078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 637|ppo_ep: 1|act_loss: -0.04632568359375|cri_loss: 0.0241241455078125|unsuper_loss: 0.0 average reward score: -3.640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.64%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.024505615234375|cri_loss: 0.0047760009765625|unsuper_loss: 0.0 average reward score: -4.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.54%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 [2023-07-01 08:34:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=11, lr=[3.056723329025442e-06, 3.056723329025442e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:34:40,169] [INFO] [timer.py:215:stop] epoch=0/micro_step=640/global_step=640, RunningAvgSamplesPerSec=50.92260273955374, CurrSamplesPerSec=51.20208992905518, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:34:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=10, lr=[1.5752438497008405e-06, 1.5752438497008405e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 639|ppo_ep: 1|act_loss: -0.00301361083984375|cri_loss: 0.0005254745483398438|unsuper_loss: 0.0 average reward score: -4.51953125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s 
(0.00%) |Generate time=1.50s (59.45%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 640|ppo_ep: 1|act_loss: 0.0145721435546875|cri_loss: 0.0082244873046875|unsuper_loss: 0.0 average reward score: -7.609375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.57%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 641|ppo_ep: 1|act_loss: 0.021484375|cri_loss: 0.0023136138916015625|unsuper_loss: 0.0 average reward score: -4.48046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.75%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 642|ppo_ep: 1|act_loss: 0.0064849853515625|cri_loss: 0.0008287429809570312|unsuper_loss: 0.0 average reward score: -4.3046875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.03%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 643|ppo_ep: 1|act_loss: 0.006534576416015625|cri_loss: 0.000553131103515625|unsuper_loss: 0.0 average reward score: -3.30859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.98%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.0166015625|cri_loss: 0.00087738037109375|unsuper_loss: 0.0 average reward score: -3.4296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 645|ppo_ep: 1|act_loss: -0.0167694091796875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0 average reward score: -5.46875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.07%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 646|ppo_ep: 1|act_loss: -0.041290283203125|cri_loss: 0.012176513671875|unsuper_loss: 0.0 average reward score: -3.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.05%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 647|ppo_ep: 1|act_loss: -0.0020999908447265625|cri_loss: 0.00032782554626464844|unsuper_loss: 0.0 average reward score: -4.26953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.04%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 648|ppo_ep: 1|act_loss: 0.0006113052368164062|cri_loss: 0.00011986494064331055|unsuper_loss: 0.0 average reward score: -4.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.93%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 [2023-07-01 08:35:05,402] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=11, lr=[2.8928099052326388e-06, 2.8928099052326388e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:05,582] [INFO] [timer.py:215:stop] 
epoch=0/micro_step=650/global_step=650, RunningAvgSamplesPerSec=50.9126533891848, CurrSamplesPerSec=50.71060198984486, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:05,748] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=10, lr=[1.490444937394879e-06, 1.490444937394879e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 649|ppo_ep: 1|act_loss: -0.0006499290466308594|cri_loss: 0.0006866455078125|unsuper_loss: 0.0 average reward score: -5.53125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.71%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 650|ppo_ep: 1|act_loss: -0.0176544189453125|cri_loss: 0.005168914794921875|unsuper_loss: 0.0 average reward score: -4.953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.71%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 651|ppo_ep: 1|act_loss: 0.01107025146484375|cri_loss: 0.001972198486328125|unsuper_loss: 0.0 average reward score: -2.615234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.80s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 652|ppo_ep: 1|act_loss: -0.0716552734375|cri_loss: 0.04205322265625|unsuper_loss: 0.0 average reward score: -4.55078125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 653|ppo_ep: 1|act_loss: 
-0.00029969215393066406|cri_loss: 0.00037932395935058594|unsuper_loss: 0.0 average reward score: -4.4453125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.61%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 654|ppo_ep: 1|act_loss: 0.0179901123046875|cri_loss: 0.0026454925537109375|unsuper_loss: 0.0 average reward score: -4.171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 655|ppo_ep: 1|act_loss: 0.0028209686279296875|cri_loss: 0.00038552284240722656|unsuper_loss: 0.0 average reward score: -4.82421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.80%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 656|ppo_ep: 1|act_loss: -0.0028934478759765625|cri_loss: 0.0019588470458984375|unsuper_loss: 0.0 average reward score: -4.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.72%) |Training time=0.82s (32.31%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 657|ppo_ep: 1|act_loss: -0.008941650390625|cri_loss: 0.0005779266357421875|unsuper_loss: 0.0 average reward score: -3.626953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 
epoch: 0|step: 658|ppo_ep: 1|act_loss: -0.0026035308837890625|cri_loss: 0.0007138252258300781|unsuper_loss: 0.0 average reward score: -4.234375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 [2023-07-01 08:35:30,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=11, lr=[2.7315109587577825e-06, 2.7315109587577825e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:30,962] [INFO] [timer.py:215:stop] epoch=0/micro_step=660/global_step=660, RunningAvgSamplesPerSec=50.90763828588423, CurrSamplesPerSec=49.94958363477759, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=10, lr=[1.407012070189524e-06, 1.407012070189524e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 659|ppo_ep: 1|act_loss: -0.003997802734375|cri_loss: 0.0002703666687011719|unsuper_loss: 0.0 average reward score: -2.380859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.82s (32.00%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 660|ppo_ep: 1|act_loss: -0.0269317626953125|cri_loss: 0.008026123046875|unsuper_loss: 0.0 average reward score: -4.421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.89%) |Training time=0.82s (32.12%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 661|ppo_ep: 1|act_loss: -0.0170135498046875|cri_loss: 0.0014743804931640625|unsuper_loss: 0.0 average reward score: -3.12109375 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.80%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 662|ppo_ep: 1|act_loss: 0.01515960693359375|cri_loss: 0.0032825469970703125|unsuper_loss: 0.0 average reward score: -3.9140625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.62%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 663|ppo_ep: 1|act_loss: -0.00861358642578125|cri_loss: 0.002689361572265625|unsuper_loss: 0.0 average reward score: -3.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (31.99%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 664|ppo_ep: 1|act_loss: 0.0018520355224609375|cri_loss: 0.0006785392761230469|unsuper_loss: 0.0 average reward score: -3.875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.74%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58 epoch: 0|step: 665|ppo_ep: 1|act_loss: 0.01678466796875|cri_loss: 0.0006666183471679688|unsuper_loss: 0.0 average reward score: -5.4921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.17%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 666|ppo_ep: 1|act_loss: 0.01038360595703125|cri_loss: 0.0011816024780273438|unsuper_loss: 0.0 average 
reward score: -3.482421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 667|ppo_ep: 1|act_loss: -0.005947113037109375|cri_loss: 0.0005092620849609375|unsuper_loss: 0.0 average reward score: -2.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 668|ppo_ep: 1|act_loss: -0.020172119140625|cri_loss: 0.001651763916015625|unsuper_loss: 0.0 average reward score: -4.99609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.79%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 [2023-07-01 08:35:56,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=11, lr=[2.573044745784934e-06, 2.573044745784934e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:56,387] [INFO] [timer.py:215:stop] epoch=0/micro_step=670/global_step=670, RunningAvgSamplesPerSec=50.899503712786164, CurrSamplesPerSec=50.331698331698334, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:56,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=10, lr=[1.3250581424317012e-06, 1.3250581424317012e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 669|ppo_ep: 1|act_loss: -0.00519561767578125|cri_loss: 0.00554656982421875|unsuper_loss: 0.0 average reward score: -3.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s 
(31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 670|ppo_ep: 1|act_loss: -0.016693115234375|cri_loss: 0.00327301025390625|unsuper_loss: 0.0 average reward score: -4.60546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.70%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 671|ppo_ep: 1|act_loss: -0.0176544189453125|cri_loss: 0.0012159347534179688|unsuper_loss: 0.0 average reward score: -4.28515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (32.00%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 672|ppo_ep: 1|act_loss: -0.004608154296875|cri_loss: 0.0005364418029785156|unsuper_loss: 0.0 average reward score: -3.412109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.46s (57.29%) |Training time=0.86s (33.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 673|ppo_ep: 1|act_loss: 0.0150604248046875|cri_loss: 0.00801849365234375|unsuper_loss: 0.0 average reward score: -3.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.90%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 674|ppo_ep: 1|act_loss: -0.00829315185546875|cri_loss: 0.0011472702026367188|unsuper_loss: 0.0 average reward score: -4.3984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s 
(59.24%) |Training time=0.81s (31.78%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 675|ppo_ep: 1|act_loss: 0.004817962646484375|cri_loss: 0.0008697509765625|unsuper_loss: 0.0 average reward score: -5.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.75%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 676|ppo_ep: 1|act_loss: 0.005290985107421875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0 average reward score: -3.876953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.88%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 677|ppo_ep: 1|act_loss: 0.0023670196533203125|cri_loss: 0.0018939971923828125|unsuper_loss: 0.0 average reward score: -3.224609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.007465362548828125|cri_loss: 0.0005240440368652344|unsuper_loss: 0.0 average reward score: -4.48046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.10%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 [2023-07-01 08:36:21,625] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=11, lr=[2.4176256894811497e-06, 2.4176256894811497e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:36:21,807] [INFO] [timer.py:215:stop] epoch=0/micro_step=680/global_step=680, 
RunningAvgSamplesPerSec=50.88510715740166, CurrSamplesPerSec=50.29265861640763, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:36:21,974] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=10, lr=[1.24469404729171e-06, 1.24469404729171e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 679|ppo_ep: 1|act_loss: 0.0241851806640625|cri_loss: 0.0012350082397460938|unsuper_loss: 0.0 average reward score: -5.125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 680|ppo_ep: 1|act_loss: -0.00664520263671875|cri_loss: 0.0006580352783203125|unsuper_loss: 0.0 average reward score: -3.478515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.86%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 681|ppo_ep: 1|act_loss: 0.00940704345703125|cri_loss: 0.0007810592651367188|unsuper_loss: 0.0 average reward score: -3.75390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.61%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 682|ppo_ep: 1|act_loss: 0.00467681884765625|cri_loss: 0.0007023811340332031|unsuper_loss: 0.0 average reward score: -5.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.0011186599731445312|cri_loss: 
0.0002536773681640625|unsuper_loss: 0.0 average reward score: -4.2421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.00%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 684|ppo_ep: 1|act_loss: -0.0073089599609375|cri_loss: 0.0005397796630859375|unsuper_loss: 0.0 average reward score: -3.705078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.84%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 685|ppo_ep: 1|act_loss: -0.030120849609375|cri_loss: 0.0026912689208984375|unsuper_loss: 0.0 average reward score: -4.69140625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.41%) |Training time=0.80s (31.66%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 686|ppo_ep: 1|act_loss: -0.0035877227783203125|cri_loss: 0.0005083084106445312|unsuper_loss: 0.0 average reward score: -4.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.67%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 687|ppo_ep: 1|act_loss: -0.0120391845703125|cri_loss: 0.004276275634765625|unsuper_loss: 0.0 average reward score: -3.0703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (32.01%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 688|ppo_ep: 1|act_loss: 
0.00687408447265625|cri_loss: 0.00045228004455566406|unsuper_loss: 0.0 average reward score: -5.22265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.80%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:36:47,009] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=11, lr=[2.265464089857071e-06, 2.265464089857071e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:36:47,187] [INFO] [timer.py:215:stop] epoch=0/micro_step=690/global_step=690, RunningAvgSamplesPerSec=50.88019675638302, CurrSamplesPerSec=51.0109932793722, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:36:47,351] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=10, lr=[1.1660285267119167e-06, 1.1660285267119167e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 689|ppo_ep: 1|act_loss: 0.010711669921875|cri_loss: 0.0005822181701660156|unsuper_loss: 0.0 average reward score: -5.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.60%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 690|ppo_ep: 1|act_loss: 0.004749298095703125|cri_loss: 0.0013265609741210938|unsuper_loss: 0.0 average reward score: -5.6171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 691|ppo_ep: 1|act_loss: 0.0156707763671875|cri_loss: 0.001941680908203125|unsuper_loss: 0.0 average reward score: -3.5546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.86%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 692|ppo_ep: 1|act_loss: -0.0091552734375|cri_loss: 0.002437591552734375|unsuper_loss: 0.0 average reward score: -2.82421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.12%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 693|ppo_ep: 1|act_loss: -0.0129852294921875|cri_loss: 0.002227783203125|unsuper_loss: 0.0 average reward score: -3.982421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.09%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 694|ppo_ep: 1|act_loss: -0.0082855224609375|cri_loss: 0.0012445449829101562|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 695|ppo_ep: 1|act_loss: -0.00780487060546875|cri_loss: 0.0009775161743164062|unsuper_loss: 0.0 average reward score: -4.7109375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.01%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 696|ppo_ep: 1|act_loss: -0.0004851818084716797|cri_loss: 0.000576019287109375|unsuper_loss: 0.0 average reward score: -3.759765625 ------------------------------------------------------------------------------------- 
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.13%) |Training time=0.81s (31.91%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58 epoch: 0|step: 697|ppo_ep: 1|act_loss: -6.580352783203125e-05|cri_loss: 0.0011091232299804688|unsuper_loss: 0.0 average reward score: -4.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.55%) |Training time=0.83s (32.48%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 698|ppo_ep: 1|act_loss: -0.017791748046875|cri_loss: 0.0018568038940429688|unsuper_loss: 0.0 average reward score: -4.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.78%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 [2023-07-01 08:37:12,426] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=11, lr=[2.116765839206601e-06, 2.116765839206601e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:37:12,604] [INFO] [timer.py:215:stop] epoch=0/micro_step=700/global_step=700, RunningAvgSamplesPerSec=50.86950985719415, CurrSamplesPerSec=51.042187146445485, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:37:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=10, lr=[1.0891680242662836e-06, 1.0891680242662836e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.007610321044921875|cri_loss: 0.00084686279296875|unsuper_loss: 0.0 average reward score: -3.6171875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.40%) |Training time=0.80s (31.67%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 700|ppo_ep: 1|act_loss: 
0.0019073486328125|cri_loss: 0.00036907196044921875|unsuper_loss: 0.0 average reward score: -5.52734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.80s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 701|ppo_ep: 1|act_loss: -0.0003910064697265625|cri_loss: 0.00396728515625|unsuper_loss: 0.0 average reward score: -3.80078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.82s (32.09%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 702|ppo_ep: 1|act_loss: -0.00685882568359375|cri_loss: 0.0008416175842285156|unsuper_loss: 0.0 average reward score: -3.650390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (32.05%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 703|ppo_ep: 1|act_loss: -0.0419921875|cri_loss: 0.043670654296875|unsuper_loss: 0.0 average reward score: -4.02734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.87%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 704|ppo_ep: 1|act_loss: 0.0086212158203125|cri_loss: 0.0016756057739257812|unsuper_loss: 0.0 average reward score: -5.3828125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.80%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 
705|ppo_ep: 1|act_loss: 0.009490966796875|cri_loss: 0.0016946792602539062|unsuper_loss: 0.0 average reward score: -4.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.33%) |Training time=0.80s (31.69%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 706|ppo_ep: 1|act_loss: -0.007720947265625|cri_loss: 0.0004642009735107422|unsuper_loss: 0.0 average reward score: -4.6875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.57%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 707|ppo_ep: 1|act_loss: -0.00814056396484375|cri_loss: 0.0027980804443359375|unsuper_loss: 0.0 average reward score: -3.408203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.03%) |Training time=0.84s (33.04%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 708|ppo_ep: 1|act_loss: -0.0350341796875|cri_loss: 0.013916015625|unsuper_loss: 0.0 average reward score: -4.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.65%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:37:37,811] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=11, lr=[1.971732143510771e-06, 1.971732143510771e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:37:37,994] [INFO] [timer.py:215:stop] epoch=0/micro_step=710/global_step=710, RunningAvgSamplesPerSec=50.86127151176703, CurrSamplesPerSec=50.825936586546234, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 
08:37:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=10, lr=[1.0142165411298664e-06, 1.0142165411298664e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 709|ppo_ep: 1|act_loss: -0.00101470947265625|cri_loss: 0.00041413307189941406|unsuper_loss: 0.0 average reward score: -5.30859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.0036792755126953125|cri_loss: 0.0013628005981445312|unsuper_loss: 0.0 average reward score: -3.939453125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 711|ppo_ep: 1|act_loss: 0.0159912109375|cri_loss: 0.0019550323486328125|unsuper_loss: 0.0 average reward score: -4.71875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.88%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 712|ppo_ep: 1|act_loss: 0.01277923583984375|cri_loss: 0.0009074211120605469|unsuper_loss: 0.0 average reward score: -3.8125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.80%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 713|ppo_ep: 1|act_loss: -0.0029850006103515625|cri_loss: 0.00010502338409423828|unsuper_loss: 0.0 average reward score: -3.578125 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.91%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 714|ppo_ep: 1|act_loss: 0.00995635986328125|cri_loss: 0.001277923583984375|unsuper_loss: 0.0 average reward score: -4.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.74%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 715|ppo_ep: 1|act_loss: -0.01531219482421875|cri_loss: 0.0022068023681640625|unsuper_loss: 0.0 average reward score: -5.65625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.85%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 716|ppo_ep: 1|act_loss: -0.004917144775390625|cri_loss: 0.0011529922485351562|unsuper_loss: 0.0 average reward score: -3.8515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.92%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 717|ppo_ep: 1|act_loss: -0.00514984130859375|cri_loss: 0.0010976791381835938|unsuper_loss: 0.0 average reward score: -5.02734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.24%) |Training time=0.83s (32.87%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 718|ppo_ep: 1|act_loss: -0.001148223876953125|cri_loss: 0.000560760498046875|unsuper_loss: 0.0 
average reward score: -4.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.75%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 [2023-07-01 08:38:03,223] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=11, lr=[1.830559250182685e-06, 1.830559250182685e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:38:03,401] [INFO] [timer.py:215:stop] epoch=0/micro_step=720/global_step=720, RunningAvgSamplesPerSec=50.851094873295665, CurrSamplesPerSec=49.903581317406946, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:38:03,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=10, lr=[9.412754953531664e-07, 9.412754953531664e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 719|ppo_ep: 1|act_loss: -0.00861358642578125|cri_loss: 0.0006313323974609375|unsuper_loss: 0.0 average reward score: -4.81640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.08%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.0033512115478515625|cri_loss: 0.00061798095703125|unsuper_loss: 0.0 average reward score: -4.5859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.04%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 721|ppo_ep: 1|act_loss: -0.02392578125|cri_loss: 0.00179290771484375|unsuper_loss: 0.0 average reward score: -3.888671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s 
(32.16%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 722|ppo_ep: 1|act_loss: 0.000568389892578125|cri_loss: 0.00021326541900634766|unsuper_loss: 0.0 average reward score: -3.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.85%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 723|ppo_ep: 1|act_loss: 0.0014905929565429688|cri_loss: 0.0007548332214355469|unsuper_loss: 0.0 average reward score: -4.53125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.78%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 724|ppo_ep: 1|act_loss: -0.008270263671875|cri_loss: 0.00160980224609375|unsuper_loss: 0.0 average reward score: -4.4296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 725|ppo_ep: 1|act_loss: 0.00617218017578125|cri_loss: 0.0007119178771972656|unsuper_loss: 0.0 average reward score: -4.375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.92%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 726|ppo_ep: 1|act_loss: 0.005893707275390625|cri_loss: 0.0007605552673339844|unsuper_loss: 0.0 average reward score: -6.3515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) 
|Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 727|ppo_ep: 1|act_loss: -0.0147247314453125|cri_loss: 0.0007014274597167969|unsuper_loss: 0.0 average reward score: -5.3359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 728|ppo_ep: 1|act_loss: 0.00397491455078125|cri_loss: 0.0003597736358642578|unsuper_loss: 0.0 average reward score: -4.04296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 [2023-07-01 08:38:28,605] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=11, lr=[1.693438182522029e-06, 1.693438182522029e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:38:28,786] [INFO] [timer.py:215:stop] epoch=0/micro_step=730/global_step=730, RunningAvgSamplesPerSec=50.844022869884185, CurrSamplesPerSec=50.363465119088985, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:38:28,951] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=10, lr=[8.704435846317385e-07, 8.704435846317385e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 729|ppo_ep: 1|act_loss: -0.00041675567626953125|cri_loss: 0.0006022453308105469|unsuper_loss: 0.0 average reward score: -4.05078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.90%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 730|ppo_ep: 1|act_loss: 4.595518112182617e-05|cri_loss: 0.002178192138671875|unsuper_loss: 0.0 average 
reward score: -5.625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 731|ppo_ep: 1|act_loss: 0.0128021240234375|cri_loss: 0.001041412353515625|unsuper_loss: 0.0 average reward score: -3.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.75%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 732|ppo_ep: 1|act_loss: -0.01190185546875|cri_loss: 0.0007181167602539062|unsuper_loss: 0.0 average reward score: -4.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.80%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.007289886474609375|cri_loss: 0.0006923675537109375|unsuper_loss: 0.0 average reward score: -3.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 734|ppo_ep: 1|act_loss: -0.0007567405700683594|cri_loss: 0.000946044921875|unsuper_loss: 0.0 average reward score: -4.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.59%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 735|ppo_ep: 1|act_loss: -0.00862884521484375|cri_loss: 
0.0006604194641113281|unsuper_loss: 0.0 average reward score: -3.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.85%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 736|ppo_ep: 1|act_loss: -0.00714111328125|cri_loss: 0.0013666152954101562|unsuper_loss: 0.0 average reward score: -3.96875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.46%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 737|ppo_ep: 1|act_loss: -0.01313018798828125|cri_loss: 0.0015802383422851562|unsuper_loss: 0.0 average reward score: -3.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.55%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 738|ppo_ep: 1|act_loss: -0.007770538330078125|cri_loss: 0.0007648468017578125|unsuper_loss: 0.0 average reward score: -4.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.76%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 [2023-07-01 08:38:53,998] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=11, lr=[1.5605544812383717e-06, 1.5605544812383717e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:38:54,181] [INFO] [timer.py:215:stop] epoch=0/micro_step=740/global_step=740, RunningAvgSamplesPerSec=50.84138854080732, CurrSamplesPerSec=49.77470021379526, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:38:54,347] [INFO] [logging.py:96:log_dist] 
[Rank 0] step=740, skipped=10, lr=[8.018166527567672e-07, 8.018166527567672e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.0006642341613769531|cri_loss: 0.0006761550903320312|unsuper_loss: 0.0 average reward score: -4.125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.03%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58 epoch: 0|step: 740|ppo_ep: 1|act_loss: -0.019683837890625|cri_loss: 0.0009579658508300781|unsuper_loss: 0.0 average reward score: -4.09375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 741|ppo_ep: 1|act_loss: 0.01412200927734375|cri_loss: 0.008697509765625|unsuper_loss: 0.0 average reward score: -4.5 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.75%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 742|ppo_ep: 1|act_loss: -0.0027027130126953125|cri_loss: 0.0007004737854003906|unsuper_loss: 0.0 average reward score: -5.11328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.81%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 743|ppo_ep: 1|act_loss: -0.0011072158813476562|cri_loss: 0.0011377334594726562|unsuper_loss: 0.0 average reward score: -4.46875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) 
|Generate time=1.50s (58.92%) |Training time=0.82s (32.08%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 744|ppo_ep: 1|act_loss: -0.010162353515625|cri_loss: 0.0020160675048828125|unsuper_loss: 0.0 average reward score: -3.71875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.67%) |Training time=0.82s (32.42%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 745|ppo_ep: 1|act_loss: 0.001995086669921875|cri_loss: 0.0003790855407714844|unsuper_loss: 0.0 average reward score: -5.296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.87%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 746|ppo_ep: 1|act_loss: -0.007068634033203125|cri_loss: 0.0006475448608398438|unsuper_loss: 0.0 average reward score: -4.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.84%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 747|ppo_ep: 1|act_loss: -0.01476287841796875|cri_loss: 0.0010480880737304688|unsuper_loss: 0.0 average reward score: -5.875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.63%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 748|ppo_ep: 1|act_loss: -0.01001739501953125|cri_loss: 0.0020313262939453125|unsuper_loss: 0.0 average reward score: -3.400390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s 
|Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.63%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 [2023-07-01 08:39:19,397] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=11, lr=[1.432087953393078e-06, 1.432087953393078e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:39:19,575] [INFO] [timer.py:215:stop] epoch=0/micro_step=750/global_step=750, RunningAvgSamplesPerSec=50.83529230409125, CurrSamplesPerSec=50.3979212664654, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:39:19,740] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=10, lr=[7.354875599272929e-07, 7.354875599272929e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 749|ppo_ep: 1|act_loss: -0.006053924560546875|cri_loss: 0.00122833251953125|unsuper_loss: 0.0 average reward score: -4.73046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.84%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 750|ppo_ep: 1|act_loss: -0.0158843994140625|cri_loss: 0.00801849365234375|unsuper_loss: 0.0 average reward score: -4.4140625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.65%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 751|ppo_ep: 1|act_loss: 0.0097198486328125|cri_loss: 0.00035858154296875|unsuper_loss: 0.0 average reward score: -3.943359375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.53%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 752|ppo_ep: 1|act_loss: 
-0.0006928443908691406|cri_loss: 0.0007891654968261719|unsuper_loss: 0.0 average reward score: -4.296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.59%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 753|ppo_ep: 1|act_loss: 0.0091094970703125|cri_loss: 0.00044417381286621094|unsuper_loss: 0.0 average reward score: -4.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.87%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 754|ppo_ep: 1|act_loss: -0.01537322998046875|cri_loss: 0.0004978179931640625|unsuper_loss: 0.0 average reward score: -4.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.25%) |Training time=0.83s (32.79%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 755|ppo_ep: 1|act_loss: 0.009246826171875|cri_loss: 0.001781463623046875|unsuper_loss: 0.0 average reward score: -4.62890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 756|ppo_ep: 1|act_loss: -0.007114410400390625|cri_loss: 0.0013494491577148438|unsuper_loss: 0.0 average reward score: -4.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.82%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 
0|step: 757|ppo_ep: 1|act_loss: -0.002605438232421875|cri_loss: 0.0009036064147949219|unsuper_loss: 0.0 average reward score: -4.49609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (31.97%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 758|ppo_ep: 1|act_loss: 0.004314422607421875|cri_loss: 0.0006737709045410156|unsuper_loss: 0.0 average reward score: -4.6875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.03%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 [2023-07-01 08:39:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=11, lr=[1.308212429099484e-06, 1.308212429099484e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:39:44,960] [INFO] [timer.py:215:stop] epoch=0/micro_step=760/global_step=760, RunningAvgSamplesPerSec=50.82850130253925, CurrSamplesPerSec=50.211905159171664, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:39:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=10, lr=[6.715460570995988e-07, 6.715460570995988e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 759|ppo_ep: 1|act_loss: 4.684925079345703e-05|cri_loss: 0.0006060600280761719|unsuper_loss: 0.0 average reward score: -5.1171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.94%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 760|ppo_ep: 1|act_loss: -0.001979827880859375|cri_loss: 0.0007104873657226562|unsuper_loss: 0.0 average reward score: -4.86328125 
------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.92%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 761|ppo_ep: 1|act_loss: 0.008544921875|cri_loss: 0.01041412353515625|unsuper_loss: 0.0 average reward score: -5.0546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.11%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 762|ppo_ep: 1|act_loss: 0.005626678466796875|cri_loss: 0.00041794776916503906|unsuper_loss: 0.0 average reward score: -4.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.00%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 763|ppo_ep: 1|act_loss: 0.0004391670227050781|cri_loss: 0.0004982948303222656|unsuper_loss: 0.0 average reward score: -3.91796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 764|ppo_ep: 1|act_loss: -0.015869140625|cri_loss: 0.006748199462890625|unsuper_loss: 0.0 average reward score: -3.833984375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 765|ppo_ep: 1|act_loss: 0.0147857666015625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0 average 
reward score: -4.5625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 766|ppo_ep: 1|act_loss: 0.00940704345703125|cri_loss: 0.0006909370422363281|unsuper_loss: 0.0 average reward score: -3.091796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 767|ppo_ep: 1|act_loss: 0.024932861328125|cri_loss: 0.0034046173095703125|unsuper_loss: 0.0 average reward score: -4.703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.00765228271484375|cri_loss: 0.00034332275390625|unsuper_loss: 0.0 average reward score: -5.49609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 [2023-07-01 08:40:10,166] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=11, lr=[1.1890955263106013e-06, 1.1890955263106013e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:40:10,344] [INFO] [timer.py:215:stop] epoch=0/micro_step=770/global_step=770, RunningAvgSamplesPerSec=50.82270925096035, CurrSamplesPerSec=50.63325765755162, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:40:10,510] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=10, 
lr=[6.100786645437481e-07, 6.100786645437481e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 769|ppo_ep: 1|act_loss: 0.0008969306945800781|cri_loss: 0.0002808570861816406|unsuper_loss: 0.0 average reward score: -4.953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.0043182373046875|cri_loss: 0.0003192424774169922|unsuper_loss: 0.0 average reward score: -5.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 771|ppo_ep: 1|act_loss: 0.007114410400390625|cri_loss: 0.0010290145874023438|unsuper_loss: 0.0 average reward score: -2.98828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.88%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 772|ppo_ep: 1|act_loss: 0.003932952880859375|cri_loss: 0.00020432472229003906|unsuper_loss: 0.0 average reward score: -6.9453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.94%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.0207061767578125|cri_loss: 0.0019989013671875|unsuper_loss: 0.0 average reward score: -6.3125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s 
(59.19%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.00439453125|cri_loss: 0.0006737709045410156|unsuper_loss: 0.0 average reward score: -3.359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 775|ppo_ep: 1|act_loss: -0.00978851318359375|cri_loss: 0.0010223388671875|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (32.01%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 776|ppo_ep: 1|act_loss: -0.002338409423828125|cri_loss: 0.00022780895233154297|unsuper_loss: 0.0 average reward score: -5.55078125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.76%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.007259368896484375|cri_loss: 0.0019664764404296875|unsuper_loss: 0.0 average reward score: -4.9375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.69%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.01006317138671875|cri_loss: 0.0007495880126953125|unsuper_loss: 0.0 average reward score: -3.859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) 
|Generate time=1.50s (59.22%) |Training time=0.80s (31.74%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 [2023-07-01 08:40:35,570] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=11, lr=[1.0748984240125836e-06, 1.0748984240125836e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:40:35,749] [INFO] [timer.py:215:stop] epoch=0/micro_step=780/global_step=780, RunningAvgSamplesPerSec=50.8188052236844, CurrSamplesPerSec=51.103794393412976, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:40:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=10, lr=[5.511685547716328e-07, 5.511685547716328e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 779|ppo_ep: 1|act_loss: -0.0020656585693359375|cri_loss: 0.000621795654296875|unsuper_loss: 0.0 average reward score: -3.849609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.44%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 780|ppo_ep: 1|act_loss: 0.0164031982421875|cri_loss: 0.0018396377563476562|unsuper_loss: 0.0 average reward score: -4.9140625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 781|ppo_ep: 1|act_loss: -0.0005102157592773438|cri_loss: 0.0007472038269042969|unsuper_loss: 0.0 average reward score: -3.876953125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 782|ppo_ep: 1|act_loss: 0.01511383056640625|cri_loss: 
0.007110595703125|unsuper_loss: 0.0 average reward score: -3.287109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 783|ppo_ep: 1|act_loss: -0.011383056640625|cri_loss: 0.0003991127014160156|unsuper_loss: 0.0 average reward score: -6.77734375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 784|ppo_ep: 1|act_loss: 0.004100799560546875|cri_loss: 0.0004754066467285156|unsuper_loss: 0.0 average reward score: -5.359375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.59%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 785|ppo_ep: 1|act_loss: 0.010894775390625|cri_loss: 0.0007996559143066406|unsuper_loss: 0.0 average reward score: -4.60546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.90%) |Training time=0.84s (33.12%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 786|ppo_ep: 1|act_loss: -0.0063629150390625|cri_loss: 0.0010843276977539062|unsuper_loss: 0.0 average reward score: -6.421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.91%) |Training time=0.82s (32.13%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 787|ppo_ep: 1|act_loss: 
0.01013946533203125|cri_loss: 0.00337982177734375|unsuper_loss: 0.0 average reward score: -4.49609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.98%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 788|ppo_ep: 1|act_loss: -0.005931854248046875|cri_loss: 0.0003879070281982422|unsuper_loss: 0.0 average reward score: -6.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.81s (31.93%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:41:00,990] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=11, lr=[9.657756441308542e-07, 9.657756441308542e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:41:01,173] [INFO] [timer.py:215:stop] epoch=0/micro_step=790/global_step=790, RunningAvgSamplesPerSec=50.81225974869159, CurrSamplesPerSec=50.43987305261852, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:41:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=10, lr=[4.948954399949105e-07, 4.948954399949105e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 789|ppo_ep: 1|act_loss: 0.0230560302734375|cri_loss: 0.00936126708984375|unsuper_loss: 0.0 average reward score: -3.810546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.86%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 [2023-07-01 08:41:03,879] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. 
Reducing hysteresis to 1 epoch: 0|step: 790|ppo_ep: 1|act_loss: -0.009185791015625|cri_loss: 0.00069427490234375|unsuper_loss: 0.0 average reward score: -4.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.51s (60.35%) |Training time=0.81s (32.39%) |Others=0.18 (7.26%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.58 epoch: 0|step: 791|ppo_ep: 1|act_loss: -0.0019931793212890625|cri_loss: 0.001560211181640625|unsuper_loss: 0.0 average reward score: -4.86328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 792|ppo_ep: 1|act_loss: -0.0013027191162109375|cri_loss: 0.0004451274871826172|unsuper_loss: 0.0 average reward score: -4.37109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 793|ppo_ep: 1|act_loss: -0.00081634521484375|cri_loss: 0.00010085105895996094|unsuper_loss: 0.0 average reward score: -4.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.80s (31.68%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 794|ppo_ep: 1|act_loss: 0.00760650634765625|cri_loss: 0.0010881423950195312|unsuper_loss: 0.0 average reward score: -3.1796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.07%) 
|Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 795|ppo_ep: 1|act_loss: 0.041595458984375|cri_loss: 0.0131683349609375|unsuper_loss: 0.0 average reward score: -6.375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.88%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 796|ppo_ep: 1|act_loss: -0.01038360595703125|cri_loss: 0.0006246566772460938|unsuper_loss: 0.0 average reward score: -4.77734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.39%) |Training time=0.80s (31.69%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 797|ppo_ep: 1|act_loss: -0.0028705596923828125|cri_loss: 0.0006313323974609375|unsuper_loss: 0.0 average reward score: -3.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.88%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 798|ppo_ep: 1|act_loss: 0.01387786865234375|cri_loss: 0.0022983551025390625|unsuper_loss: 0.0 average reward score: -4.75390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.80s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:41:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=11, lr=[8.618748424440287e-07, 8.618748424440287e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:41:26,529] [INFO] [timer.py:215:stop] epoch=0/micro_step=800/global_step=800, RunningAvgSamplesPerSec=50.80810776358231, 
CurrSamplesPerSec=50.14485840245087, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:41:26,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=11, lr=[4.4656727587773506e-07, 4.4656727587773506e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 799|ppo_ep: 1|act_loss: -0.0024509429931640625|cri_loss: 0.00016570091247558594|unsuper_loss: 0.0 average reward score: -4.5 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.94%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.00510406494140625|cri_loss: 0.0002956390380859375|unsuper_loss: 0.0 average reward score: -4.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.73%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 801|ppo_ep: 1|act_loss: 0.01212310791015625|cri_loss: 0.0028591156005859375|unsuper_loss: 0.0 average reward score: -3.73828125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.42%) |Training time=0.80s (31.63%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.00460052490234375|cri_loss: 0.0011653900146484375|unsuper_loss: 0.0 average reward score: -5.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 803|ppo_ep: 1|act_loss: -0.04107666015625|cri_loss: 0.019622802734375|unsuper_loss: 0.0 average reward 
score: -5.35546875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.71%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 804|ppo_ep: 1|act_loss: -0.00540924072265625|cri_loss: 0.001010894775390625|unsuper_loss: 0.0 average reward score: -4.6328125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.83%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58 epoch: 0|step: 805|ppo_ep: 1|act_loss: 0.0260162353515625|cri_loss: 0.003101348876953125|unsuper_loss: 0.0 average reward score: -4.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 806|ppo_ep: 1|act_loss: -0.0008225440979003906|cri_loss: 0.003002166748046875|unsuper_loss: 0.0 average reward score: -3.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 807|ppo_ep: 1|act_loss: -0.00559234619140625|cri_loss: 0.0009756088256835938|unsuper_loss: 0.0 average reward score: -4.20703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.80s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.00569915771484375|cri_loss: 
0.0011138916015625|unsuper_loss: 0.0 average reward score: -4.203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.81s (31.60%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 [2023-07-01 08:41:51,762] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=11, lr=[7.633366087885105e-07, 7.633366087885105e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:41:51,941] [INFO] [timer.py:215:stop] epoch=0/micro_step=810/global_step=810, RunningAvgSamplesPerSec=50.8059958537766, CurrSamplesPerSec=50.650301710637045, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:41:52,105] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=11, lr=[3.9551119626347693e-07, 3.9551119626347693e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 809|ppo_ep: 1|act_loss: 0.01210784912109375|cri_loss: 0.002460479736328125|unsuper_loss: 0.0 average reward score: -4.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 810|ppo_ep: 1|act_loss: 0.00421142578125|cri_loss: 0.0005979537963867188|unsuper_loss: 0.0 average reward score: -6.25390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 811|ppo_ep: 1|act_loss: -0.00463104248046875|cri_loss: 0.0008816719055175781|unsuper_loss: 0.0 average reward score: -4.390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate 
time=1.47s (57.93%) |Training time=0.84s (33.15%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 812|ppo_ep: 1|act_loss: -0.0014476776123046875|cri_loss: 0.002285003662109375|unsuper_loss: 0.0 average reward score: -5.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.64%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 813|ppo_ep: 1|act_loss: -0.0010023117065429688|cri_loss: 0.0009264945983886719|unsuper_loss: 0.0 average reward score: -3.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 814|ppo_ep: 1|act_loss: 0.00547027587890625|cri_loss: 0.0005512237548828125|unsuper_loss: 0.0 average reward score: -2.65234375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 815|ppo_ep: 1|act_loss: -0.0011491775512695312|cri_loss: 0.0001633167266845703|unsuper_loss: 0.0 average reward score: -4.66796875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.71%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 816|ppo_ep: 1|act_loss: 0.00296783447265625|cri_loss: 0.00028014183044433594|unsuper_loss: 0.0 average reward score: -4.8984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s 
|Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 817|ppo_ep: 1|act_loss: -0.00771331787109375|cri_loss: 0.0005936622619628906|unsuper_loss: 0.0 average reward score: -3.818359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.10%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 818|ppo_ep: 1|act_loss: 0.01326751708984375|cri_loss: 0.002292633056640625|unsuper_loss: 0.0 average reward score: -4.2421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.88%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 [2023-07-01 08:42:17,131] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=11, lr=[6.702942768241414e-07, 6.702942768241414e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:42:17,313] [INFO] [timer.py:215:stop] epoch=0/micro_step=820/global_step=820, RunningAvgSamplesPerSec=50.800591528592534, CurrSamplesPerSec=50.43989200824971, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:42:17,478] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=11, lr=[3.473027341057728e-07, 3.473027341057728e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 819|ppo_ep: 1|act_loss: -0.0131378173828125|cri_loss: 0.0010347366333007812|unsuper_loss: 0.0 average reward score: -4.12109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.85%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 [2023-07-01 08:42:20,008] [INFO] 
[loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.0321044921875|cri_loss: 0.01514434814453125|unsuper_loss: 0.0 average reward score: -4.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.40%) |Training time=0.81s (32.41%) |Others=0.18 (7.18%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.58 epoch: 0|step: 821|ppo_ep: 1|act_loss: -0.01032257080078125|cri_loss: 0.006351470947265625|unsuper_loss: 0.0 average reward score: -3.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.45s (57.05%) |Training time=0.86s (33.95%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 822|ppo_ep: 1|act_loss: 0.002361297607421875|cri_loss: 0.0009703636169433594|unsuper_loss: 0.0 average reward score: -5.15234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.09%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.0052490234375|cri_loss: 0.001323699951171875|unsuper_loss: 0.0 average reward score: -4.75 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.87%) |Training time=0.82s (32.18%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 824|ppo_ep: 1|act_loss: -0.0278167724609375|cri_loss: 0.0184173583984375|unsuper_loss: 0.0 average reward score: -5.17578125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather 
latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.57%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 [2023-07-01 08:42:32,324] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 825|ppo_ep: 1|act_loss: -0.0045013427734375|cri_loss: 0.00031685829162597656|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.15%) |Training time=0.62s (26.18%) |Others=0.23 (9.68%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 826|ppo_ep: 1|act_loss: 0.01226806640625|cri_loss: 0.000789642333984375|unsuper_loss: 0.0 average reward score: -4.375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.97%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 827|ppo_ep: 1|act_loss: -0.007415771484375|cri_loss: 0.0002620220184326172|unsuper_loss: 0.0 average reward score: -3.84375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.10%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 828|ppo_ep: 1|act_loss: 0.002338409423828125|cri_loss: 0.0010023117065429688|unsuper_loss: 0.0 average reward score: -4.01953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.12%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:42:42,316] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=830, skipped=12, lr=[5.913593843626703e-07, 5.913593843626703e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:42:42,494] [INFO] [timer.py:215:stop] epoch=0/micro_step=830/global_step=830, RunningAvgSamplesPerSec=50.80620896325542, CurrSamplesPerSec=49.97083979759643, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:42:42,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=12, lr=[3.064038260946478e-07, 3.064038260946478e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.0083770751953125|cri_loss: 0.0008697509765625|unsuper_loss: 0.0 average reward score: -4.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.81s (32.08%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 830|ppo_ep: 1|act_loss: 0.012237548828125|cri_loss: 0.0024394989013671875|unsuper_loss: 0.0 average reward score: -4.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.96%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 831|ppo_ep: 1|act_loss: 0.0014257431030273438|cri_loss: 0.00044035911560058594|unsuper_loss: 0.0 average reward score: -4.24609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.77%) |Training time=0.84s (33.25%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 832|ppo_ep: 1|act_loss: 0.003421783447265625|cri_loss: 0.0006437301635742188|unsuper_loss: 0.0 average reward score: -5.921875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.82%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 833|ppo_ep: 1|act_loss: 0.005580902099609375|cri_loss: 0.0007596015930175781|unsuper_loss: 0.0 average reward score: -4.3359375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.91%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 834|ppo_ep: 1|act_loss: -0.01206207275390625|cri_loss: 0.00634765625|unsuper_loss: 0.0 average reward score: -3.984375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.82%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 835|ppo_ep: 1|act_loss: 0.002063751220703125|cri_loss: 0.0020275115966796875|unsuper_loss: 0.0 average reward score: -4.0546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.11%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 836|ppo_ep: 1|act_loss: 0.0113983154296875|cri_loss: 0.0008807182312011719|unsuper_loss: 0.0 average reward score: -3.90625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.14%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 837|ppo_ep: 1|act_loss: -0.002044677734375|cri_loss: 0.00042629241943359375|unsuper_loss: 0.0 average reward score: -3.69921875 ------------------------------------------------------------------------------------- |E2E 
latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.86%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.00036835670471191406|cri_loss: 0.0003173351287841797|unsuper_loss: 0.0 average reward score: -5.38671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 [2023-07-01 08:43:07,699] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=12, lr=[5.090998282460625e-07, 5.090998282460625e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:43:07,882] [INFO] [timer.py:215:stop] epoch=0/micro_step=840/global_step=840, RunningAvgSamplesPerSec=50.79660626428774, CurrSamplesPerSec=50.462269584820305, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:43:08,047] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=12, lr=[2.6378229442801163e-07, 2.6378229442801163e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 839|ppo_ep: 1|act_loss: 0.0015211105346679688|cri_loss: 0.0004911422729492188|unsuper_loss: 0.0 average reward score: -4.1875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.80%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 840|ppo_ep: 1|act_loss: 0.018218994140625|cri_loss: 0.0022373199462890625|unsuper_loss: 0.0 average reward score: -3.208984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.89%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 841|ppo_ep: 1|act_loss: 
-0.0058441162109375|cri_loss: 0.000354766845703125|unsuper_loss: 0.0 average reward score: -2.947265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.81%) |Training time=0.82s (32.23%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.020782470703125|cri_loss: 0.0105743408203125|unsuper_loss: 0.0 average reward score: -4.51953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.47%) |Training time=0.83s (32.61%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 843|ppo_ep: 1|act_loss: -0.0130767822265625|cri_loss: 0.0010137557983398438|unsuper_loss: 0.0 average reward score: -5.4375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.94%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 844|ppo_ep: 1|act_loss: -0.0033321380615234375|cri_loss: 0.0001571178436279297|unsuper_loss: 0.0 average reward score: -3.626953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.35%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 845|ppo_ep: 1|act_loss: 0.0133819580078125|cri_loss: 0.0014791488647460938|unsuper_loss: 0.0 average reward score: -4.5390625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.77%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 
846|ppo_ep: 1|act_loss: 0.007793426513671875|cri_loss: 0.00146484375|unsuper_loss: 0.0 average reward score: -3.703125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.84%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 847|ppo_ep: 1|act_loss: 0.00125885009765625|cri_loss: 0.0009150505065917969|unsuper_loss: 0.0 average reward score: -3.9921875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 848|ppo_ep: 1|act_loss: 0.0070648193359375|cri_loss: 0.0006608963012695312|unsuper_loss: 0.0 average reward score: -5.9609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:43:33,094] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=12, lr=[4.326801856742557e-07, 4.326801856742557e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:43:33,272] [INFO] [timer.py:215:stop] epoch=0/micro_step=850/global_step=850, RunningAvgSamplesPerSec=50.789375356614926, CurrSamplesPerSec=50.198571586723595, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:43:33,438] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=12, lr=[2.241866247016869e-07, 2.241866247016869e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 849|ppo_ep: 1|act_loss: -0.002407073974609375|cri_loss: 0.00022208690643310547|unsuper_loss: 0.0 average reward score: -4.05859375 ------------------------------------------------------------------------------------- |E2E 
latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (32.00%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 850|ppo_ep: 1|act_loss: 0.01123809814453125|cri_loss: 0.0011377334594726562|unsuper_loss: 0.0 average reward score: -3.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.91%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 851|ppo_ep: 1|act_loss: -0.01062774658203125|cri_loss: 0.0006108283996582031|unsuper_loss: 0.0 average reward score: -4.453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.73%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 852|ppo_ep: 1|act_loss: -0.006153106689453125|cri_loss: 0.0008974075317382812|unsuper_loss: 0.0 average reward score: -3.8984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.76%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 853|ppo_ep: 1|act_loss: 0.0023746490478515625|cri_loss: 0.00080108642578125|unsuper_loss: 0.0 average reward score: -4.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.84%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 854|ppo_ep: 1|act_loss: 0.005146026611328125|cri_loss: 0.0017766952514648438|unsuper_loss: 0.0 average reward score: -7.328125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 855|ppo_ep: 1|act_loss: -0.005863189697265625|cri_loss: 0.00016427040100097656|unsuper_loss: 0.0 average reward score: -5.59375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 856|ppo_ep: 1|act_loss: 0.0037174224853515625|cri_loss: 0.0004315376281738281|unsuper_loss: 0.0 average reward score: -5.390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.94%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 857|ppo_ep: 1|act_loss: 0.009246826171875|cri_loss: 0.0009007453918457031|unsuper_loss: 0.0 average reward score: -2.759765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.94%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 858|ppo_ep: 1|act_loss: 0.024627685546875|cri_loss: 0.0086517333984375|unsuper_loss: 0.0 average reward score: -5.80859375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.87%) |Training time=0.82s (32.17%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58 [2023-07-01 08:43:58,505] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=12, lr=[3.6220386128776603e-07, 
3.6220386128776603e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:43:58,686] [INFO] [timer.py:215:stop] epoch=0/micro_step=860/global_step=860, RunningAvgSamplesPerSec=50.78290749061954, CurrSamplesPerSec=49.63691729629178, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:43:58,852] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=12, lr=[1.876703944496197e-07, 1.876703944496197e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 859|ppo_ep: 1|act_loss: 0.017669677734375|cri_loss: 0.00274658203125|unsuper_loss: 0.0 average reward score: -5.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.13%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 860|ppo_ep: 1|act_loss: -0.0022125244140625|cri_loss: 0.0007147789001464844|unsuper_loss: 0.0 average reward score: -4.6796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.16%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 861|ppo_ep: 1|act_loss: 0.0033016204833984375|cri_loss: 0.0009131431579589844|unsuper_loss: 0.0 average reward score: -5.96484375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.006336212158203125|cri_loss: 0.00039076805114746094|unsuper_loss: 0.0 average reward score: -4.375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.65%) 
|Others=0.23 (8.98%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 863|ppo_ep: 1|act_loss: -0.016876220703125|cri_loss: 0.0005512237548828125|unsuper_loss: 0.0 average reward score: -3.08984375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.93%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.006755828857421875|cri_loss: 0.00051116943359375|unsuper_loss: 0.0 average reward score: -5.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.81s (32.03%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 865|ppo_ep: 1|act_loss: 0.0220489501953125|cri_loss: 0.004367828369140625|unsuper_loss: 0.0 average reward score: -5.58203125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.82%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 866|ppo_ep: 1|act_loss: -0.01568603515625|cri_loss: 0.0013265609741210938|unsuper_loss: 0.0 average reward score: -4.46484375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.90%) |Training time=0.84s (33.18%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.020904541015625|cri_loss: 0.00174713134765625|unsuper_loss: 0.0 average reward score: -4.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training 
time=0.81s (31.76%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 868|ppo_ep: 1|act_loss: 0.005001068115234375|cri_loss: 0.0016937255859375|unsuper_loss: 0.0 average reward score: -4.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.95%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 [2023-07-01 08:44:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=12, lr=[2.9776621772821655e-07, 2.9776621772821655e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:44:24,070] [INFO] [timer.py:215:stop] epoch=0/micro_step=870/global_step=870, RunningAvgSamplesPerSec=50.7752857768912, CurrSamplesPerSec=50.22530220075268, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:44:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=12, lr=[1.542830143669516e-07, 1.542830143669516e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 869|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.002178192138671875|unsuper_loss: 0.0 average reward score: -4.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.93%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 870|ppo_ep: 1|act_loss: 0.020416259765625|cri_loss: 0.0031757354736328125|unsuper_loss: 0.0 average reward score: -6.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 epoch: 0|step: 871|ppo_ep: 1|act_loss: -0.001644134521484375|cri_loss: 0.0006380081176757812|unsuper_loss: 0.0 average reward score: 
-3.47265625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.02%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58 epoch: 0|step: 872|ppo_ep: 1|act_loss: -0.004741668701171875|cri_loss: 0.00035190582275390625|unsuper_loss: 0.0 average reward score: -3.298828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.11%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.58 epoch: 0|step: 873|ppo_ep: 1|act_loss: 0.012420654296875|cri_loss: 0.002422332763671875|unsuper_loss: 0.0 average reward score: -3.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (31.99%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 874|ppo_ep: 1|act_loss: 0.0034027099609375|cri_loss: 0.0008091926574707031|unsuper_loss: 0.0 average reward score: -3.046875 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58 epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.0013265609741210938|cri_loss: 0.0019321441650390625|unsuper_loss: 0.0 average reward score: -4.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.90%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 876|ppo_ep: 1|act_loss: -0.01454925537109375|cri_loss: 
0.0009756088256835938|unsuper_loss: 0.0 average reward score: -3.947265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 877|ppo_ep: 1|act_loss: 0.01018524169921875|cri_loss: 0.0010223388671875|unsuper_loss: 0.0 average reward score: -3.416015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.91%) |Training time=0.82s (32.14%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 878|ppo_ep: 1|act_loss: 0.0021953582763671875|cri_loss: 0.0004718303680419922|unsuper_loss: 0.0 average reward score: -6.7890625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.96%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58 [2023-07-01 08:44:49,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=12, lr=[2.3945444660163493e-07, 2.3945444660163493e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:44:49,475] [INFO] [timer.py:215:stop] epoch=0/micro_step=880/global_step=880, RunningAvgSamplesPerSec=50.769397429373015, CurrSamplesPerSec=50.89804140475208, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:44:49,641] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=12, lr=[1.240696614516243e-07, 1.240696614516243e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 879|ppo_ep: 1|act_loss: 0.00214385986328125|cri_loss: 0.0010223388671875|unsuper_loss: 0.0 average reward score: -4.98046875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate 
time=1.51s (59.37%) |Training time=0.80s (31.70%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 880|ppo_ep: 1|act_loss: 0.00743865966796875|cri_loss: 0.0005936622619628906|unsuper_loss: 0.0 average reward score: -4.19140625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.57%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58 epoch: 0|step: 881|ppo_ep: 1|act_loss: -0.00836181640625|cri_loss: 0.009185791015625|unsuper_loss: 0.0 average reward score: -2.8359375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.78%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 882|ppo_ep: 1|act_loss: -0.00812530517578125|cri_loss: 0.0003821849822998047|unsuper_loss: 0.0 average reward score: -4.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.88%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 883|ppo_ep: 1|act_loss: -0.0146636962890625|cri_loss: 0.002384185791015625|unsuper_loss: 0.0 average reward score: -4.5546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.71%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 884|ppo_ep: 1|act_loss: 0.0037288665771484375|cri_loss: 0.0004286766052246094|unsuper_loss: 0.0 average reward score: -7.01171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.12%) |Training time=0.81s (31.97%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 885|ppo_ep: 1|act_loss: -0.006732940673828125|cri_loss: 0.0052947998046875|unsuper_loss: 0.0 average reward score: -4.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.34%) |Training time=0.80s (31.68%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 886|ppo_ep: 1|act_loss: -0.0012540817260742188|cri_loss: 0.0004513263702392578|unsuper_loss: 0.0 average reward score: -5.1171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.84%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 887|ppo_ep: 1|act_loss: -0.003856658935546875|cri_loss: 0.0034160614013671875|unsuper_loss: 0.0 average reward score: -3.458984375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.81s (31.99%) |Others=0.23 (9.05%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 888|ppo_ep: 1|act_loss: 0.0005216598510742188|cri_loss: 9.97781753540039e-05|unsuper_loss: 0.0 average reward score: -6.09375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.08%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 [2023-07-01 08:45:14,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=12, lr=[1.8734745049808622e-07, 1.8734745049808622e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:45:14,879] [INFO] 
[timer.py:215:stop] epoch=0/micro_step=890/global_step=890, RunningAvgSamplesPerSec=50.76462231494663, CurrSamplesPerSec=49.69917014829651, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:45:15,044] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=12, lr=[9.707121787465607e-08, 9.707121787465607e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 889|ppo_ep: 1|act_loss: 0.0171966552734375|cri_loss: 0.0038604736328125|unsuper_loss: 0.0 average reward score: -5.51171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.14%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 890|ppo_ep: 1|act_loss: 0.039337158203125|cri_loss: 0.008758544921875|unsuper_loss: 0.0 average reward score: -4.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58 epoch: 0|step: 891|ppo_ep: 1|act_loss: 0.002552032470703125|cri_loss: 0.00045037269592285156|unsuper_loss: 0.0 average reward score: -4.2109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.02%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58 epoch: 0|step: 892|ppo_ep: 1|act_loss: 0.007457733154296875|cri_loss: 0.0010309219360351562|unsuper_loss: 0.0 average reward score: -5.6171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.75%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 893|ppo_ep: 
1|act_loss: 0.007785797119140625|cri_loss: 0.0020313262939453125|unsuper_loss: 0.0 average reward score: -4.9453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (32.00%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58 epoch: 0|step: 894|ppo_ep: 1|act_loss: 0.03369140625|cri_loss: 0.005992889404296875|unsuper_loss: 0.0 average reward score: -3.333984375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58 epoch: 0|step: 895|ppo_ep: 1|act_loss: 0.0099334716796875|cri_loss: 0.0010776519775390625|unsuper_loss: 0.0 average reward score: -5.23828125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.13%) |Training time=0.81s (31.93%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58 epoch: 0|step: 896|ppo_ep: 1|act_loss: 0.032257080078125|cri_loss: 0.0107879638671875|unsuper_loss: 0.0 average reward score: -3.556640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58 epoch: 0|step: 897|ppo_ep: 1|act_loss: 0.004215240478515625|cri_loss: 0.0020122528076171875|unsuper_loss: 0.0 average reward score: -6.625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 
0|step: 898|ppo_ep: 1|act_loss: -0.006298065185546875|cri_loss: 0.0010805130004882812|unsuper_loss: 0.0 average reward score: -4.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.41%) |Training time=0.80s (31.71%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.59 [2023-07-01 08:45:40,082] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=12, lr=[1.4151573622732538e-07, 1.4151573622732538e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:45:40,260] [INFO] [timer.py:215:stop] epoch=0/micro_step=900/global_step=900, RunningAvgSamplesPerSec=50.76217522860796, CurrSamplesPerSec=50.62653490921244, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:45:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=12, lr=[7.332421566182663e-08, 7.332421566182663e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 899|ppo_ep: 1|act_loss: 0.004848480224609375|cri_loss: 0.00037550926208496094|unsuper_loss: 0.0 average reward score: -4.27734375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59 epoch: 0|step: 900|ppo_ep: 1|act_loss: 0.0011148452758789062|cri_loss: 0.00035572052001953125|unsuper_loss: 0.0 average reward score: -5.921875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.07%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59 epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.00963592529296875|cri_loss: 0.0016508102416992188|unsuper_loss: 0.0 average reward score: -4.36328125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.79%) |Training time=0.82s (32.22%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59 epoch: 0|step: 902|ppo_ep: 1|act_loss: -1.7344951629638672e-05|cri_loss: 0.0008778572082519531|unsuper_loss: 0.0 average reward score: -4.42578125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.18%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59 epoch: 0|step: 903|ppo_ep: 1|act_loss: -0.006374359130859375|cri_loss: 0.0014057159423828125|unsuper_loss: 0.0 average reward score: -3.521484375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 904|ppo_ep: 1|act_loss: 0.0097808837890625|cri_loss: 0.0011816024780273438|unsuper_loss: 0.0 average reward score: -4.6875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.93%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 epoch: 0|step: 905|ppo_ep: 1|act_loss: 0.004337310791015625|cri_loss: 0.0012025833129882812|unsuper_loss: 0.0 average reward score: -5.5078125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.10%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59 epoch: 0|step: 906|ppo_ep: 1|act_loss: -0.0035610198974609375|cri_loss: 0.0003063678741455078|unsuper_loss: 
0.0 average reward score: -3.875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.88%) |Training time=0.82s (32.17%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59 epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.001010894775390625|cri_loss: 0.00040459632873535156|unsuper_loss: 0.0 average reward score: -5.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.05%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 908|ppo_ep: 1|act_loss: 0.03399658203125|cri_loss: 0.010284423828125|unsuper_loss: 0.0 average reward score: -3.48828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.81s (32.07%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 [2023-07-01 08:46:05,508] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=12, lr=[1.0202131941489858e-07, 1.0202131941489858e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:46:05,687] [INFO] [timer.py:215:stop] epoch=0/micro_step=910/global_step=910, RunningAvgSamplesPerSec=50.75345093850162, CurrSamplesPerSec=50.96021034446754, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:46:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=12, lr=[5.2860787261605485e-08, 5.2860787261605485e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 909|ppo_ep: 1|act_loss: -0.00942230224609375|cri_loss: 0.0011739730834960938|unsuper_loss: 0.0 average reward score: -4.11328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s 
(31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 910|ppo_ep: 1|act_loss: 4.76837158203125e-05|cri_loss: 0.0003173351287841797|unsuper_loss: 0.0 average reward score: -4.015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 911|ppo_ep: 1|act_loss: 0.0053863525390625|cri_loss: 0.000133514404296875|unsuper_loss: 0.0 average reward score: -4.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.15%) |Training time=0.81s (31.97%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59 epoch: 0|step: 912|ppo_ep: 1|act_loss: 0.002010345458984375|cri_loss: 0.0004036426544189453|unsuper_loss: 0.0 average reward score: -5.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.86%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 913|ppo_ep: 1|act_loss: 0.01419830322265625|cri_loss: 0.0026874542236328125|unsuper_loss: 0.0 average reward score: -6.05859375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.93%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 914|ppo_ep: 1|act_loss: -0.0244293212890625|cri_loss: 0.00307464599609375|unsuper_loss: 0.0 average reward score: -5.015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s 
(59.11%) |Training time=0.81s (31.96%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.006687164306640625|cri_loss: 0.00048828125|unsuper_loss: 0.0 average reward score: -2.95703125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.82s (32.07%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59 epoch: 0|step: 916|ppo_ep: 1|act_loss: 0.01415252685546875|cri_loss: 0.001434326171875|unsuper_loss: 0.0 average reward score: -3.84375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.90%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 917|ppo_ep: 1|act_loss: -0.004100799560546875|cri_loss: 0.0016088485717773438|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.92%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 918|ppo_ep: 1|act_loss: -0.0036163330078125|cri_loss: 0.0006704330444335938|unsuper_loss: 0.0 average reward score: -4.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.83%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 [2023-07-01 08:46:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=12, lr=[6.891764058781328e-08, 6.891764058781328e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:46:31,094] [INFO] [timer.py:215:stop] epoch=0/micro_step=920/global_step=920, 
RunningAvgSamplesPerSec=50.74695745130744, CurrSamplesPerSec=49.34110677850644, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:46:31,260] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=12, lr=[3.5708622066224494e-08, 3.5708622066224494e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 919|ppo_ep: 1|act_loss: 0.004119873046875|cri_loss: 0.0008497238159179688|unsuper_loss: 0.0 average reward score: -4.56640625 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.68%) |Training time=0.82s (32.31%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59 epoch: 0|step: 920|ppo_ep: 1|act_loss: 0.00536346435546875|cri_loss: 0.0008797645568847656|unsuper_loss: 0.0 average reward score: -3.58203125 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.88%) |Training time=0.82s (32.14%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59 epoch: 0|step: 921|ppo_ep: 1|act_loss: 0.001399993896484375|cri_loss: 0.001434326171875|unsuper_loss: 0.0 average reward score: -5.1015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.95%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59 epoch: 0|step: 922|ppo_ep: 1|act_loss: -0.00794219970703125|cri_loss: 0.0005536079406738281|unsuper_loss: 0.0 average reward score: -3.990234375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 923|ppo_ep: 1|act_loss: -0.0015001296997070312|cri_loss: 
0.0007758140563964844|unsuper_loss: 0.0 average reward score: -3.361328125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.84%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 epoch: 0|step: 924|ppo_ep: 1|act_loss: 0.006153106689453125|cri_loss: 0.00045418739318847656|unsuper_loss: 0.0 average reward score: -4.60546875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.80s (31.69%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 925|ppo_ep: 1|act_loss: -0.0109405517578125|cri_loss: 0.001224517822265625|unsuper_loss: 0.0 average reward score: -4.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.81%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 926|ppo_ep: 1|act_loss: -0.0303497314453125|cri_loss: 0.02313232421875|unsuper_loss: 0.0 average reward score: -3.689453125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.76%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.59 epoch: 0|step: 927|ppo_ep: 1|act_loss: -0.004962921142578125|cri_loss: 0.000957489013671875|unsuper_loss: 0.0 average reward score: -4.859375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.80s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59 epoch: 0|step: 928|ppo_ep: 1|act_loss: 
-0.00824737548828125|cri_loss: 0.00037479400634765625|unsuper_loss: 0.0 average reward score: -5.828125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.92%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 [2023-07-01 08:46:56,300] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=12, lr=[4.2249492863304246e-08, 4.2249492863304246e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:46:56,478] [INFO] [timer.py:215:stop] epoch=0/micro_step=930/global_step=930, RunningAvgSamplesPerSec=50.74082119613501, CurrSamplesPerSec=48.05202967948661, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:46:56,644] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=12, lr=[2.1890928944717228e-08, 2.1890928944717228e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.0035228729248046875|cri_loss: 0.00042128562927246094|unsuper_loss: 0.0 average reward score: -5.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.99%) |Training time=0.84s (33.08%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 930|ppo_ep: 1|act_loss: -0.007450103759765625|cri_loss: 0.0005202293395996094|unsuper_loss: 0.0 average reward score: -3.787109375 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.89%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59 epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.0017108917236328125|cri_loss: 0.0004513263702392578|unsuper_loss: 0.0 average reward score: -4.46484375 ------------------------------------------------------------------------------------- |E2E 
latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.87%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59 epoch: 0|step: 932|ppo_ep: 1|act_loss: 0.00266265869140625|cri_loss: 0.0014085769653320312|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.98%) |Training time=0.82s (32.02%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59 epoch: 0|step: 933|ppo_ep: 1|act_loss: -0.0191192626953125|cri_loss: 0.0049591064453125|unsuper_loss: 0.0 average reward score: -3.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.07%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59 epoch: 0|step: 934|ppo_ep: 1|act_loss: 0.004848480224609375|cri_loss: 0.00034737586975097656|unsuper_loss: 0.0 average reward score: -4.71484375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.94%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.002735137939453125|cri_loss: 0.00042557716369628906|unsuper_loss: 0.0 average reward score: -6.6796875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 936|ppo_ep: 1|act_loss: -0.00638580322265625|cri_loss: 0.0005917549133300781|unsuper_loss: 0.0 average reward score: -3.703125 
------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.17%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59 epoch: 0|step: 937|ppo_ep: 1|act_loss: -0.0035610198974609375|cri_loss: 0.000263214111328125|unsuper_loss: 0.0 average reward score: -5.13671875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.11%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59 epoch: 0|step: 938|ppo_ep: 1|act_loss: 0.04254150390625|cri_loss: 0.01024627685546875|unsuper_loss: 0.0 average reward score: -4.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.12%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59 [2023-07-01 08:47:21,724] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=12, lr=[2.205296133854851e-08, 2.205296133854851e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:47:21,905] [INFO] [timer.py:215:stop] epoch=0/micro_step=940/global_step=940, RunningAvgSamplesPerSec=50.731099212683596, CurrSamplesPerSec=48.306048579676634, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:47:22,069] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=12, lr=[1.142640483862617e-08, 1.142640483862617e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 939|ppo_ep: 1|act_loss: -0.0245208740234375|cri_loss: 0.0036296844482421875|unsuper_loss: 0.0 average reward score: -6.44921875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.17%) |Training time=0.84s (32.95%) |Others=0.23 
(8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 epoch: 0|step: 940|ppo_ep: 1|act_loss: 0.01275634765625|cri_loss: 0.0016412734985351562|unsuper_loss: 0.0 average reward score: -4.328125 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.54%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.59 epoch: 0|step: 941|ppo_ep: 1|act_loss: -0.0200958251953125|cri_loss: 0.002391815185546875|unsuper_loss: 0.0 average reward score: -6.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59 epoch: 0|step: 942|ppo_ep: 1|act_loss: -0.0009608268737792969|cri_loss: 0.00025773048400878906|unsuper_loss: 0.0 average reward score: -3.0625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.76%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 943|ppo_ep: 1|act_loss: -0.00675201416015625|cri_loss: 0.00072479248046875|unsuper_loss: 0.0 average reward score: -5.109375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.97%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 944|ppo_ep: 1|act_loss: 0.0015001296997070312|cri_loss: 0.0011434555053710938|unsuper_loss: 0.0 average reward score: -4.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s 
(31.91%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59 epoch: 0|step: 945|ppo_ep: 1|act_loss: -0.002216339111328125|cri_loss: 0.0010166168212890625|unsuper_loss: 0.0 average reward score: -4.390625 ------------------------------------------------------------------------------------- |E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.87%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59 epoch: 0|step: 946|ppo_ep: 1|act_loss: -0.0008983612060546875|cri_loss: 0.0008797645568847656|unsuper_loss: 0.0 average reward score: -5.04296875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59 epoch: 0|step: 947|ppo_ep: 1|act_loss: 0.004314422607421875|cri_loss: 0.00016379356384277344|unsuper_loss: 0.0 average reward score: -4.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.89%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 948|ppo_ep: 1|act_loss: -0.00235748291015625|cri_loss: 0.0015211105346679688|unsuper_loss: 0.0 average reward score: -5.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.98%) |Training time=0.82s (32.12%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59 [2023-07-01 08:47:47,089] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=12, lr=[8.355374263348676e-09, 8.355374263348676e-09], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:47:47,270] [INFO] [timer.py:215:stop] epoch=0/micro_step=950/global_step=950, 
RunningAvgSamplesPerSec=50.72813976244511, CurrSamplesPerSec=50.11948595103353, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:47:47,435] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=12, lr=[4.329209462874961e-09, 4.329209462874961e-09], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 949|ppo_ep: 1|act_loss: -0.0032787322998046875|cri_loss: 0.0004875659942626953|unsuper_loss: 0.0 average reward score: -4.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.01%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59 epoch: 0|step: 950|ppo_ep: 1|act_loss: -5.3882598876953125e-05|cri_loss: 0.0008039474487304688|unsuper_loss: 0.0 average reward score: -4.34375 ------------------------------------------------------------------------------------- |E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.35%) |Training time=0.83s (32.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59 epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.00273895263671875|cri_loss: 0.0008635520935058594|unsuper_loss: 0.0 average reward score: -4.04296875 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.59 epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.03765869140625|cri_loss: 0.01186370849609375|unsuper_loss: 0.0 average reward score: -4.15234375 ------------------------------------------------------------------------------------- |E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.59 epoch: 0|step: 953|ppo_ep: 1|act_loss: -0.0016870498657226562|cri_loss: 
0.001628875732421875|unsuper_loss: 0.0
average reward score: -3.5234375
-------------------------------------------------------------------------------------
saving model ...saving model ...saving model ...saving model ...saving model ...saving model ...saving model ... saving model ...
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5958 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5960 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5956 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5957 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5962 exits successfully.
[2023-07-01 08:48:01,351] [INFO] [launch.py:346:main] Process 5959 exits successfully.
[2023-07-01 08:48:01,351] [INFO] [launch.py:346:main] Process 5961 exits successfully.
[2023-07-01 08:48:09,360] [INFO] [launch.py:346:main] Process 5955 exits successfully.