Setting ds_accelerator to cuda (auto detect) [2023-07-01 08:03:29,655] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. Detected CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7: setting --include=localhost:0,1,2,3,4,5,6,7 [2023-07-01 08:03:29,723] [INFO] [runner.py:555:main] cmd = /home/zhaiyuanzhao/anaconda3/envs/RLHF/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path /home/zhaiyuanzhao/llm/dataset/rm-static/data --data_split 2,4,4 --actor_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples-4datasets/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/output-1.3b --critic_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/output-RM --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 1 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output-1.3b-RM_350m Setting ds_accelerator to cuda (auto detect) [2023-07-01 08:03:32,114] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]} [2023-07-01 08:03:32,114] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0 [2023-07-01 08:03:32,114] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}) [2023-07-01 08:03:32,114] [INFO] [launch.py:163:main] dist_world_size=8 [2023-07-01 08:03:32,114] [INFO] [launch.py:165:main] Setting 
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) Setting ds_accelerator to cuda (auto detect) [2023-07-01 08:03:58,950] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:58,950] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,205] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,205] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,236] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,236] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,402] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,402] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,452] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,452] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,498] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,498] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,515] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,515] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,515] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2023-07-01 08:03:59,515] [INFO] [comm.py:594:init_distributed] cdb=None [2023-07-01 08:03:59,515] [INFO] [comm.py:625:init_distributed] 
Initializing TorchBackend in DeepSpeed with backend nccl Found cached dataset parquet (/home/zhaiyuanzhao/.cache/huggingface/datasets/parquet/default-d09980a08a1dbd7c/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec) 0%| | 0/2 [00:00 [2023-07-01 08:05:28,624] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000 [2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000 [2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False [2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/utils/build.ninja... Building extension module utils... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module utils... Loading extension module utils...Loading extension module utils... Loading extension module utils... Loading extension module utils...Loading extension module utils... Loading extension module utils...Loading extension module utils... 
Time to load utils op: 0.5373373031616211 secondsTime to load utils op: 0.5420136451721191 secondsTime to load utils op: 0.542755126953125 secondsTime to load utils op: 0.5428259372711182 seconds Time to load utils op: 0.535088300704956 seconds Time to load utils op: 0.5429253578186035 seconds Time to load utils op: 0.5427916049957275 seconds Time to load utils op: 0.5428504943847656 seconds Rank: 1 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 7 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 2 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 5 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 0 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 3 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 6 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Rank: 4 partition count [8, 8] and sizes[(164401920, False), (67840, False)] Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils...Time to load utils op: 0.0010123252868652344 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Time to load utils op: 0.001024007797241211 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.000904083251953125 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009882450103759766 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0008978843688964844 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0014503002166748047 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0020041465759277344 seconds [2023-07-01 08:05:40,006] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states [2023-07-01 08:05:40,008] [INFO] [utils.py:786:see_memory_usage] MA 3.06 GB Max_MA 3.06 GB CA 3.07 GB Max_CA 3 GB [2023-07-01 08:05:40,008] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 36.9 GB, percent = 3.7% [2023-07-01 08:05:40,152] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states [2023-07-01 08:05:40,153] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB Max_MA 4.91 GB CA 4.91 GB Max_CA 5 GB [2023-07-01 08:05:40,153] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 36.9 GB, percent = 3.7% [2023-07-01 08:05:40,153] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized [2023-07-01 08:05:40,294] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer [2023-07-01 08:05:40,294] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB Max_MA 4.29 GB CA 4.91 GB Max_CA 5 GB [2023-07-01 08:05:40,295] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 36.9 GB, percent = 3.7% [2023-07-01 08:05:40,296] [INFO] 
[logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:05:40,297] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:05:40,297] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] bfloat16_enabled ............. 
False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] data_efficiency_config ....... 
{'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] dump_state ................... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:05:40,298] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] fp16_auto_cast ............... 
False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] nebula_config ................ 
{ "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] pld_params ................... False [2023-07-01 08:05:40,299] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] zero_config .................. 
stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] zero_enabled ................. True [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:05:40,300] [INFO] [config.py:964:print] zero_optimization_stage ...... 
2 [2023-07-01 08:05:40,300] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": true, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0010378360748291016 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combinationInstalled CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combinationInstalled CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/transformer_inference/build.ninja... Building extension module transformer_inference... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module transformer_inference... Time to load transformer_inference op: 0.5358097553253174 seconds Loading extension module transformer_inference...Loading extension module transformer_inference... Loading extension module transformer_inference... Loading extension module transformer_inference... Loading extension module transformer_inference... Time to load transformer_inference op: 0.5629968643188477 seconds Time to load transformer_inference op: 0.5577342510223389 seconds Time to load transformer_inference op: 0.5631918907165527 seconds Time to load transformer_inference op: 0.5636200904846191 seconds Time to load transformer_inference op: 0.5582687854766846 seconds Loading extension module transformer_inference... 
Time to load transformer_inference op: 0.5590822696685791 seconds [2023-07-01 08:05:41,008] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': , 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': , 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True} Loading extension module transformer_inference... Time to load transformer_inference op: 0.5609304904937744 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05214333534240723 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Time to load transformer_inference op: 0.051145315170288086 secondsLoading extension module transformer_inference... Time to load transformer_inference op: 0.04891061782836914 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.052359580993652344 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05263805389404297 seconds No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.054830074310302734 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05211639404296875 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... 
Loading extension module transformer_inference... Time to load transformer_inference op: 0.056571245193481445 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... 
Loading extension module transformer_inference... Time to load transformer_inference op: 0.05420184135437012 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05135607719421387 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05627012252807617 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05015301704406738 seconds No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.051249027252197266 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... 
No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.0525355339050293 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05635857582092285 seconds ******************[end] Initialized Actor Model [end] (duration: 49.96s)****************** *************************[start] Initializing Ref Model [start] ************************** Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module transformer_inference, skipping build step... Loading extension module transformer_inference... Time to load transformer_inference op: 0.05696511268615723 seconds model loaded [2023-07-01 08:05:58,473] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:06:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:06:08,840] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:06:08,841] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] dump_state ................... 
False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:06:08,842] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] pld_params ................... 
False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:06:08,843] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:06:08,844] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.001287221908569336 seconds *******************[end] Initialized Ref Model [end] (duration: 27.48s)******************* *************************[start] Initializing EMA Model [start] ************************** Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0016167163848876953 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0013256072998046875 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0018966197967529297 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0015711784362792969 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0012118816375732422 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.001928091049194336 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.0018939971923828125 seconds model loaded [2023-07-01 08:06:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:06:35,566] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:06:35,581] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:06:35,582] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] bfloat16_enabled ............. 
False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] data_efficiency_config ....... 
{'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:06:35,583] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] dump_state ................... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] fp16_enabled ................. 
True [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:06:35,584] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... 
False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] pld_params ................... False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:06:35,585] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:06:35,586] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:06:35,586] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.01014089584350586 seconds *******************[end] Initialized EMA Model [end] (duration: 26.75s)******************* ************************[start] Initializing Critic Model [start] ************************ Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0024161338806152344 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.001081705093383789 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011301040649414062 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009920597076416016 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.001585245132446289 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011470317840576172 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.0016450881958007812 seconds model loaded model loaded model loaded model loaded model loaded model loaded model loaded model loaded Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.015161752700805664 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step...Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Loading extension module fused_adam... Time to load fused_adam op: 0.010671377182006836 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... 
Time to load fused_adam op: 0.003968000411987305 seconds [2023-07-01 08:06:52,465] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.023074865341186523 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.005225181579589844 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.023493051528930664 seconds Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0221707820892334 seconds No modifications detected for re-loaded extension module fused_adam, skipping build step... 
Loading extension module fused_adam... Time to load fused_adam op: 0.07556438446044922 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.030516624450683594 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.005091190338134766 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.004791975021362305 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0013506412506103516 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step...Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Loading extension module utils... Time to load utils op: 0.002511739730834961 seconds No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009350776672363281 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.001844644546508789 seconds [2023-07-01 08:07:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:07:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [2023-07-01 08:07:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2023-07-01 08:07:02,001] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2023-07-01 08:07:02,001] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2023-07-01 08:07:02,001] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000 [2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000 [2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False [2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.002156496047973633 seconds Rank: 0 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 1 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 3 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 2 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 7 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 4 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 5 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Rank: 6 partition count [8, 8] and sizes[(41365824, False), (33792, False)] Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... No modifications detected for re-loaded extension module utils, skipping build step...Time to load utils op: 0.0013456344604492188 secondsLoading extension module utils...Loading extension module utils... Time to load utils op: 0.00107574462890625 seconds Loading extension module utils... No modifications detected for re-loaded extension module utils, skipping build step... 
Loading extension module utils... Time to load utils op: 0.0005922317504882812 seconds Time to load utils op: 0.0008847713470458984 seconds Time to load utils op: 0.0013260841369628906 seconds Time to load utils op: 0.0009319782257080078 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0009717941284179688 seconds [2023-07-01 08:07:11,072] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states [2023-07-01 08:07:11,072] [INFO] [utils.py:786:see_memory_usage] MA 10.58 GB Max_MA 10.58 GB CA 10.97 GB Max_CA 11 GB [2023-07-01 08:07:11,073] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 59.75 GB, percent = 5.9% [2023-07-01 08:07:11,396] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states [2023-07-01 08:07:11,398] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB Max_MA 11.05 GB CA 11.43 GB Max_CA 11 GB [2023-07-01 08:07:11,398] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 60.85 GB, percent = 6.0% [2023-07-01 08:07:11,399] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized [2023-07-01 08:07:11,651] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer [2023-07-01 08:07:11,652] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB Max_MA 10.89 GB CA 11.43 GB Max_CA 11 GB [2023-07-01 08:07:11,652] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 61.73 GB, percent = 6.1% [2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, 
lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:07:11,655] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:07:11,655] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] comms_config ................. 
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] dataloader_drop_last ......... 
False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] dump_state ................... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] gradient_accumulation_steps .. 
1 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:07:11,656] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] pld_params ................... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] zero_config .................. 
stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] zero_enabled ................. True [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:07:11,657] [INFO] [config.py:964:print] zero_optimization_stage ...... 
2 [2023-07-01 08:07:11,658] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0016736984252929688 seconds *****************[end] Initialized Critic Model [end] (duration: 36.06s)****************** ************************[start] Initializing Reward Model [start] ************************ model loaded model loaded model loaded model loaded model loaded model loaded model loaded model loaded [2023-07-01 08:07:25,026] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.01121068000793457 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.001344919204711914 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0012459754943847656 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0020864009857177734 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0013394355773925781 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0023109912872314453 seconds Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.002219676971435547 seconds [2023-07-01 08:07:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-07-01 08:07:33,043] [INFO] [config.py:960:print] DeepSpeedEngine configuration: [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] amp_enabled .................. False [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] amp_params ................... False [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] bfloat16_enabled ............. False [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True [2023-07-01 08:07:33,043] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] comms_config ................. [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] communication_data_type ...... None [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] curriculum_params_legacy ..... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] data_efficiency_enabled ...... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] dataloader_drop_last ......... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] disable_allgather ............ False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] dump_state ................... 
False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... None [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_enabled ........... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] eigenvalue_verbose ........... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] elasticity_enabled ........... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] fp16_auto_cast ............... False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] fp16_enabled ................. True [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] global_rank .................. 0 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] grad_accum_dtype ............. None [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 [2023-07-01 08:07:33,044] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] load_universal_checkpoint .... False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] loss_scale ................... 0 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] memory_breakdown ............. False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] mics_hierarchial_params_gather False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] mics_shard_size .............. -1 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] optimizer_name ............... None [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] optimizer_params ............. None [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] pld_enabled .................. False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] pld_params ................... 
False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] prescale_gradients ........... False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] scheduler_name ............... None [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] scheduler_params ............. None [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] sparse_attention ............. None [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] steps_per_print .............. 10 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] train_batch_size ............. 32 [2023-07-01 08:07:33,045] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 4 [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] use_node_local_storage ....... False [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] wall_clock_breakdown ......... False [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] world_size ................... 8 [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] zero_allow_untested_optimizer False [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] zero_enabled ................. False [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True [2023-07-01 08:07:33,046] [INFO] [config.py:964:print] zero_optimization_stage ...... 0 [2023-07-01 08:07:33,046] [INFO] [config.py:950:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... 
Time to load utils op: 0.0013327598571777344 seconds
*****************[end] Initialized Reward Model [end] (duration: 21.39s)******************
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 954
------------------------------------------------------
Free memory : 65.963745 (GigaBytes)
Total memory: 79.096497 (GigaBytes)
Requested memory: 1.031250 (GigaBytes)
Setting maximum total tokens (input + output) to 512
WorkSpace: 0x2b1c0c000000
------------------------------------------------------
[2023-07-01 08:07:36,761] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-07-01 08:07:36,919] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.00479888916015625|cri_loss: 0.201416015625|unsuper_loss: 0.0
average reward score: -1.482421875
-------------------------------------------------------------------------------------
|E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.86s (74.28%) |Training time=0.81s (21.07%) |Others=0.18 (4.65%)|CurSamplesPerSec=8.31 |AvgSamplesPerSec=8.31
[2023-07-01 08:07:39,051] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
[2023-07-01 08:07:39,207] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 1|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: 1.76953125|unsuper_loss: 0.0
average reward score: -3.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.29s |Gather latency=0.00s (0.00%) |Generate time=1.51s (66.00%) |Training time=0.60s (26.25%) |Others=0.18 (7.75%)|CurSamplesPerSec=14.00 |AvgSamplesPerSec=10.43
[2023-07-01 08:07:41,323] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-07-01 08:07:41,480] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 2|ppo_ep: 1|act_loss: -0.11614990234375|cri_loss: 0.6455078125|unsuper_loss: 0.0
average reward score: -1.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.27s |Gather latency=0.00s (0.00%) |Generate time=1.50s (65.88%) |Training time=0.60s (26.37%) |Others=0.18 (7.75%)|CurSamplesPerSec=14.07 |AvgSamplesPerSec=11.41
[2023-07-01 08:07:43,945] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 3|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.2236328125|unsuper_loss: 0.0
average reward score: 0.70947265625
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.84%) |Training time=0.79s (32.02%) |Others=0.18 (7.14%)|CurSamplesPerSec=12.99 |AvgSamplesPerSec=11.77
[2023-07-01 08:07:46,067] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 4|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.200439453125|unsuper_loss: 0.0
average reward score: -0.22509765625
-------------------------------------------------------------------------------------
|E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.50%) |Training time=0.60s (26.01%) |Others=0.22 (9.49%)|CurSamplesPerSec=13.78 |AvgSamplesPerSec=12.12
epoch: 0|step: 5|ppo_ep: 1|act_loss: -0.345458984375|cri_loss: 1.0078125|unsuper_loss: 0.0
average reward score: -0.45458984375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.22
[2023-07-01 08:07:51,234] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.09051513671875|cri_loss: 0.2021484375|unsuper_loss: 0.0
average reward score: 0.46240234375
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.84%) |Training time=0.79s (32.02%) |Others=0.18 (7.14%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.33
epoch: 0|step: 7|ppo_ep: 1|act_loss: 0.128662109375|cri_loss: 0.1058349609375|unsuper_loss: 0.0
average reward score: -1.6728515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.13%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.39
[2023-07-01 08:07:55,843] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.116455078125|cri_loss: 0.52734375|unsuper_loss: 0.0
average reward score: -0.121826171875
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.78%) |Training time=0.60s (25.74%) |Others=0.22 (9.48%)|CurSamplesPerSec=13.84 |AvgSamplesPerSec=12.54
[2023-07-01 08:07:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[4.825e-07, 4.825e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:07:58,344] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=57.79174715187453, CurrSamplesPerSec=51.394899251349514, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:07:58,506] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[2.5000000000000004e-07, 2.5000000000000004e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 9|ppo_ep: 1|act_loss: 0.11492919921875|cri_loss: 0.1263427734375|unsuper_loss: 0.0
average reward score: -0.98779296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.79s (31.39%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.56
epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.10296630859375|cri_loss: 0.1690673828125|unsuper_loss: 0.0
average reward score: 0.268798828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.31%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.58
epoch: 0|step: 11|ppo_ep: 1|act_loss: -0.005687713623046875|cri_loss: 0.12103271484375|unsuper_loss: 0.0
average reward score: -1.0966796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.60
epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.173583984375|cri_loss: 0.69140625|unsuper_loss: 0.0
average reward score: -2.2421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.37%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.61
epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.25|cri_loss: 0.22216796875|unsuper_loss: 0.0
average reward score: 0.267578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.63
epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.3173828125|cri_loss: 0.396728515625|unsuper_loss: 0.0
average reward score: -1.21875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.38%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.64
epoch: 0|step: 15|ppo_ep: 1|act_loss: -0.07391357421875|cri_loss: 0.1260986328125|unsuper_loss: 0.0
average reward score: 1.1376953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.34%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.65
epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.184814453125|cri_loss: 0.1656494140625|unsuper_loss: 0.0
average reward score: -0.33447265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s
(0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.37%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.65
epoch: 0|step: 17|ppo_ep: 1|act_loss: -0.11932373046875|cri_loss: 0.05841064453125|unsuper_loss: 0.0
average reward score: 1.076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.66
[2023-07-01 08:08:20,667] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 18|ppo_ep: 1|act_loss: -0.076171875|cri_loss: 0.2410888671875|unsuper_loss: 0.0
average reward score: -0.114990234375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.77%) |Training time=0.59s (25.71%) |Others=0.22 (9.53%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.72
[2023-07-01 08:08:22,994] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[1.3510000000000003e-06, 1.3510000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:23,174] [INFO] [timer.py:215:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=55.12417224553301, CurrSamplesPerSec=51.5406103261145, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:23,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=5, lr=[7.5e-07, 7.5e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 19|ppo_ep: 1|act_loss: -0.1875|cri_loss: 0.1397705078125|unsuper_loss: 0.0
average reward score: 0.066162109375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.26%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76
|AvgSamplesPerSec=12.72
epoch: 0|step: 20|ppo_ep: 1|act_loss: -0.11083984375|cri_loss: 0.08123779296875|unsuper_loss: 0.0
average reward score: -0.359619140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.08%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.73
epoch: 0|step: 21|ppo_ep: 1|act_loss: -0.12493896484375|cri_loss: 1.5810546875|unsuper_loss: 0.0
average reward score: -0.47119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.26%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.73
epoch: 0|step: 22|ppo_ep: 1|act_loss: 0.006500244140625|cri_loss: 0.096923828125|unsuper_loss: 0.0
average reward score: 0.015869140625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.27%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.73
epoch: 0|step: 23|ppo_ep: 1|act_loss: -0.264404296875|cri_loss: 0.2071533203125|unsuper_loss: 0.0
average reward score: 0.58056640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.74
epoch: 0|step: 24|ppo_ep: 1|act_loss: -0.0160980224609375|cri_loss: 0.040435791015625|unsuper_loss: 0.0
average reward score: 0.701171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80
|AvgSamplesPerSec=12.74
epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.0157470703125|cri_loss: 0.09912109375|unsuper_loss: 0.0
average reward score: 0.88232421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.74
epoch: 0|step: 26|ppo_ep: 1|act_loss: -0.1885986328125|cri_loss: 0.146728515625|unsuper_loss: 0.0
average reward score: 0.328857421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.78s (31.13%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.75
epoch: 0|step: 27|ppo_ep: 1|act_loss: -0.050872802734375|cri_loss: 0.25634765625|unsuper_loss: 0.0
average reward score: 1.8916015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.25%) |Training time=0.77s (30.92%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.75
epoch: 0|step: 28|ppo_ep: 1|act_loss: -0.050018310546875|cri_loss: 0.1875|unsuper_loss: 0.0
average reward score: 0.7919921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.75
[2023-07-01 08:08:47,979] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[2.316e-06, 2.316e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:48,155] [INFO] [timer.py:215:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=53.93328586330883, CurrSamplesPerSec=51.76409642894195, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:48,316] [INFO]
[logging.py:96:log_dist] [Rank 0] step=30, skipped=5, lr=[1.25e-06, 1.25e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 29|ppo_ep: 1|act_loss: -0.143310546875|cri_loss: 0.155517578125|unsuper_loss: 0.0
average reward score: -0.705078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.23%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.75
[2023-07-01 08:08:50,477] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 30|ppo_ep: 1|act_loss: -0.201171875|cri_loss: 0.245361328125|unsuper_loss: 0.0
average reward score: 1.01171875
-------------------------------------------------------------------------------------
|E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.69%) |Training time=0.60s (25.78%) |Others=0.22 (9.52%)|CurSamplesPerSec=13.78 |AvgSamplesPerSec=12.78
epoch: 0|step: 31|ppo_ep: 1|act_loss: -0.1300048828125|cri_loss: 0.458740234375|unsuper_loss: 0.0
average reward score: 1.4248046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.16%) |Training time=0.78s (31.06%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.78
epoch: 0|step: 32|ppo_ep: 1|act_loss: -0.033416748046875|cri_loss: 0.2181396484375|unsuper_loss: 0.0
average reward score: 2.3203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.12%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.78
epoch: 0|step: 33|ppo_ep: 1|act_loss: 0.0909423828125|cri_loss: 0.2308349609375|unsuper_loss: 0.0
average reward score: 2.40625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.78
epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.16015625|cri_loss: 0.380859375|unsuper_loss: 0.0
average reward score: 0.155517578125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.44%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.78
[2023-07-01 08:09:03,152] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 35|ppo_ep: 1|act_loss: -0.5283203125|cri_loss: 0.5498046875|unsuper_loss: 0.0
average reward score: 0.50830078125
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.50s (61.05%) |Training time=0.78s (31.90%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.02 |AvgSamplesPerSec=12.79
epoch: 0|step: 36|ppo_ep: 1|act_loss: -0.3828125|cri_loss: 0.293701171875|unsuper_loss: 0.0
average reward score: 2.45703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.09698486328125|cri_loss: 1.140625|unsuper_loss: 0.0
average reward score: 1.140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.13%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 38|ppo_ep:
1|act_loss: -0.0311279296875|cri_loss: 0.875|unsuper_loss: 0.0
average reward score: 1.3427734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.77%) |Training time=0.79s (31.45%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:12,781] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=7, lr=[3.1845000000000006e-06, 3.1845000000000006e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:12,956] [INFO] [timer.py:215:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=53.75452823610218, CurrSamplesPerSec=51.27585440192057, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:13,117] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[1.7000000000000002e-06, 1.7000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 39|ppo_ep: 1|act_loss: -0.1375732421875|cri_loss: 0.295654296875|unsuper_loss: 0.0
average reward score: 0.8251953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.47%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
epoch: 0|step: 40|ppo_ep: 1|act_loss: 0.08258056640625|cri_loss: 0.638671875|unsuper_loss: 0.0
average reward score: 1.732421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.79
epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.0699462890625|cri_loss: 0.9794921875|unsuper_loss: 0.0
average reward score: 0.386474609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s
(59.80%) |Training time=0.79s (31.44%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 42|ppo_ep: 1|act_loss: 0.322509765625|cri_loss: 0.9208984375|unsuper_loss: 0.0
average reward score: 1.583984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 43|ppo_ep: 1|act_loss: -0.00948333740234375|cri_loss: 0.356689453125|unsuper_loss: 0.0
average reward score: 2.111328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.97%) |Training time=0.78s (31.24%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 44|ppo_ep: 1|act_loss: 0.040313720703125|cri_loss: 1.0302734375|unsuper_loss: 0.0
average reward score: 0.6337890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.20%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 45|ppo_ep: 1|act_loss: -0.0197906494140625|cri_loss: 0.369140625|unsuper_loss: 0.0
average reward score: 2.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.18%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.79
epoch: 0|step: 46|ppo_ep: 1|act_loss: 0.023040771484375|cri_loss: 0.261474609375|unsuper_loss: 0.0
average reward score: -0.0382080078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training
time=0.78s (31.07%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 47|ppo_ep: 1|act_loss: 0.08148193359375|cri_loss: 0.2861328125|unsuper_loss: 0.0
average reward score: 2.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.24%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.79
epoch: 0|step: 48|ppo_ep: 1|act_loss: -0.533203125|cri_loss: 0.65380859375|unsuper_loss: 0.0
average reward score: 2.21484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.79s (31.36%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:37,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=7, lr=[4.149500000000001e-06, 4.149500000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:37,950] [INFO] [timer.py:215:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=53.31678833276616, CurrSamplesPerSec=51.09898873914861, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[2.2e-06, 2.2e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 49|ppo_ep: 1|act_loss: 0.08367919921875|cri_loss: 0.22802734375|unsuper_loss: 0.0
average reward score: 2.3828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.53%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
epoch: 0|step: 50|ppo_ep: 1|act_loss: 0.10125732421875|cri_loss: 0.251708984375|unsuper_loss: 0.0
average reward score: 0.8916015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.40%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 51|ppo_ep: 1|act_loss: -0.1507568359375|cri_loss: 0.33447265625|unsuper_loss: 0.0
average reward score: 0.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 52|ppo_ep: 1|act_loss: 0.10089111328125|cri_loss: 0.299560546875|unsuper_loss: 0.0
average reward score: 2.8203125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.65%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.79
epoch: 0|step: 53|ppo_ep: 1|act_loss: -0.151123046875|cri_loss: 0.250244140625|unsuper_loss: 0.0
average reward score: 2.5703125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.91%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:50,278] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 1024, reducing to 512
epoch: 0|step: 54|ppo_ep: 1|act_loss: -0.341796875|cri_loss: 0.6640625|unsuper_loss: 0.0
average reward score: 1.484375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.68%) |Training time=0.60s (25.76%) |Others=0.22 (9.56%)|CurSamplesPerSec=13.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 55|ppo_ep: 1|act_loss: 0.1827392578125|cri_loss: 0.251220703125|unsuper_loss: 0.0
average reward score: 1.4091796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 56|ppo_ep: 1|act_loss: 0.2410888671875|cri_loss: 0.34375|unsuper_loss: 0.0
average reward score: 1.634765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.10%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 57|ppo_ep: 1|act_loss: 0.279541015625|cri_loss: 0.3466796875|unsuper_loss: 0.0
average reward score: 1.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.77s (31.06%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 58|ppo_ep: 1|act_loss: -0.273193359375|cri_loss: 0.72021484375|unsuper_loss: 0.0
average reward score: 1.748046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.21%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:10:02,583] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=8, lr=[5.018000000000001e-06, 5.018000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:02,758] [INFO] [timer.py:215:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=53.29622440894831, CurrSamplesPerSec=51.95278262916032, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:02,917] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[2.7000000000000004e-06, 2.7000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 59|ppo_ep: 1|act_loss: 0.0276947021484375|cri_loss: 0.2529296875|unsuper_loss: 0.0
average reward score: 1.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.25%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 60|ppo_ep: 1|act_loss: -0.6240234375|cri_loss: 0.80859375|unsuper_loss: 0.0
average reward score: 2.216796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.04522705078125|cri_loss: 0.372802734375|unsuper_loss: 0.0
average reward score: 1.447265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.06988525390625|cri_loss: 0.3798828125|unsuper_loss: 0.0
average reward score: 1.2255859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%)
|Generate time=1.49s (59.81%) |Training time=0.78s (31.40%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.26513671875|cri_loss: 0.4345703125|unsuper_loss: 0.0
average reward score: 2.41015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.43%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 64|ppo_ep: 1|act_loss: 0.57861328125|cri_loss: 0.51220703125|unsuper_loss: 0.0
average reward score: 1.8310546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 65|ppo_ep: 1|act_loss: -0.12744140625|cri_loss: 0.471923828125|unsuper_loss: 0.0
average reward score: 0.8876953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.18%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 66|ppo_ep: 1|act_loss: 0.11785888671875|cri_loss: 0.328369140625|unsuper_loss: 0.0
average reward score: 0.77880859375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.73%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 67|ppo_ep: 1|act_loss: -0.155517578125|cri_loss: 0.271484375|unsuper_loss: 0.0
average reward score: 0.8349609375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%)
|Training time=0.79s (31.41%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 68|ppo_ep: 1|act_loss: -0.2369384765625|cri_loss: 0.470458984375|unsuper_loss: 0.0 average reward score: 1.556640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.25%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:10:27,579] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=8, lr=[5.983e-06, 5.983e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:10:27,758] [INFO] [timer.py:215:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=53.04370475961942, CurrSamplesPerSec=52.004153583395585, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:10:27,918] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[3.2000000000000003e-06, 3.2000000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 69|ppo_ep: 1|act_loss: -0.050323486328125|cri_loss: 0.59521484375|unsuper_loss: 0.0 average reward score: 2.0703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.1898193359375|cri_loss: 0.451416015625|unsuper_loss: 0.0 average reward score: 1.25 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 71|ppo_ep: 1|act_loss: -0.22265625|cri_loss: 0.4140625|unsuper_loss: 0.0 average reward score: 2.359375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.15%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 72|ppo_ep: 1|act_loss: 0.25927734375|cri_loss: 0.6513671875|unsuper_loss: 0.0 average reward score: 1.634765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 73|ppo_ep: 1|act_loss: 0.2115478515625|cri_loss: 0.343994140625|unsuper_loss: 0.0 average reward score: 1.193359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.79s (31.41%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 74|ppo_ep: 1|act_loss: -0.1343994140625|cri_loss: 0.5712890625|unsuper_loss: 0.0 average reward score: 1.62109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.29%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 75|ppo_ep: 1|act_loss: -0.423095703125|cri_loss: 0.384033203125|unsuper_loss: 0.0 average reward score: 0.6923828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.36%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 76|ppo_ep: 1|act_loss: -0.04229736328125|cri_loss: 0.2203369140625|unsuper_loss: 0.0 average reward score: 1.1923828125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.22%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 77|ppo_ep: 1|act_loss: 0.01357269287109375|cri_loss: 0.176513671875|unsuper_loss: 0.0 average reward score: 1.23828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.25%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 78|ppo_ep: 1|act_loss: 0.3056640625|cri_loss: 0.2081298828125|unsuper_loss: 0.0 average reward score: 1.6806640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.29%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:10:52,583] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=8, lr=[6.948e-06, 6.948e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:10:52,761] [INFO] [timer.py:215:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=52.87346162859914, CurrSamplesPerSec=51.559023442812, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:10:52,921] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[3.7e-06, 3.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 79|ppo_ep: 1|act_loss: 0.20654296875|cri_loss: 0.1474609375|unsuper_loss: 0.0 average reward score: 1.4677734375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 80|ppo_ep: 1|act_loss: 
0.152587890625|cri_loss: 0.2091064453125|unsuper_loss: 0.0 average reward score: 0.423583984375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 81|ppo_ep: 1|act_loss: -0.0245208740234375|cri_loss: 0.1005859375|unsuper_loss: 0.0 average reward score: 1.4228515625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.41%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 82|ppo_ep: 1|act_loss: 0.0177459716796875|cri_loss: 0.6953125|unsuper_loss: 0.0 average reward score: -0.423828125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.69%) |Training time=0.79s (31.53%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 83|ppo_ep: 1|act_loss: -0.61376953125|cri_loss: 0.54638671875|unsuper_loss: 0.0 average reward score: 1.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.23%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 84|ppo_ep: 1|act_loss: 0.044830322265625|cri_loss: 0.4111328125|unsuper_loss: 0.0 average reward score: 0.93115234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.25%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.51171875|cri_loss: 
0.59375|unsuper_loss: 0.0 average reward score: 0.29296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.51%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.87841796875|cri_loss: 0.859375|unsuper_loss: 0.0 average reward score: -0.36474609375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.63%) |Training time=0.79s (31.62%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.80 epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.26025390625|cri_loss: 0.1788330078125|unsuper_loss: 0.0 average reward score: 0.8642578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.51%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 88|ppo_ep: 1|act_loss: -0.36328125|cri_loss: 0.3447265625|unsuper_loss: 0.0 average reward score: 0.7294921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.34%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 [2023-07-01 08:11:17,629] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=8, lr=[7.913e-06, 7.913e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:11:17,804] [INFO] [timer.py:215:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=52.705695811425215, CurrSamplesPerSec=52.0945200465606, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:11:17,965] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[4.2000000000000004e-06, 4.2000000000000004e-06], 
mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 89|ppo_ep: 1|act_loss: -0.42333984375|cri_loss: 0.440185546875|unsuper_loss: 0.0 average reward score: -0.0455322265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.16%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 90|ppo_ep: 1|act_loss: -0.424072265625|cri_loss: 0.2314453125|unsuper_loss: 0.0 average reward score: 2.3984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.09%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 91|ppo_ep: 1|act_loss: 0.7021484375|cri_loss: 0.5908203125|unsuper_loss: 0.0 average reward score: 2.578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.78s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 92|ppo_ep: 1|act_loss: 0.76513671875|cri_loss: 0.6279296875|unsuper_loss: 0.0 average reward score: 1.6533203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.12%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 93|ppo_ep: 1|act_loss: 0.31298828125|cri_loss: 0.272705078125|unsuper_loss: 0.0 average reward score: 1.1591796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.20%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 
0|step: 94|ppo_ep: 1|act_loss: 0.306396484375|cri_loss: 0.343994140625|unsuper_loss: 0.0 average reward score: 0.095458984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.34%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 95|ppo_ep: 1|act_loss: -1.2255859375|cri_loss: 1.7861328125|unsuper_loss: 0.0 average reward score: 0.2744140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.24%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.08453369140625|cri_loss: 0.12261962890625|unsuper_loss: 0.0 average reward score: 1.5029296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.00%) |Training time=0.78s (31.18%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 97|ppo_ep: 1|act_loss: 0.412109375|cri_loss: 0.2176513671875|unsuper_loss: 0.0 average reward score: 1.33984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.18%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 98|ppo_ep: 1|act_loss: 0.35400390625|cri_loss: 0.208251953125|unsuper_loss: 0.0 average reward score: 0.7197265625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 [2023-07-01 08:11:42,650] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=100, skipped=8, lr=[8.878e-06, 8.878e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:11:42,830] [INFO] [timer.py:215:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=52.601879430986315, CurrSamplesPerSec=49.972830583204946, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:11:42,991] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=6, lr=[4.7e-06, 4.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 99|ppo_ep: 1|act_loss: 0.2303466796875|cri_loss: 0.10552978515625|unsuper_loss: 0.0 average reward score: 1.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.52s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.93%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.69 |AvgSamplesPerSec=12.80 epoch: 0|step: 100|ppo_ep: 1|act_loss: 0.24755859375|cri_loss: 0.1744384765625|unsuper_loss: 0.0 average reward score: -0.3828125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.84%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.80 epoch: 0|step: 101|ppo_ep: 1|act_loss: -0.00725555419921875|cri_loss: 0.1685791015625|unsuper_loss: 0.0 average reward score: -1.8544921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.79s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 102|ppo_ep: 1|act_loss: 0.0902099609375|cri_loss: 0.1353759765625|unsuper_loss: 0.0 average reward score: 0.08642578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.00%) |Training time=0.78s (31.16%) 
|Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.06756591796875|cri_loss: 0.167724609375|unsuper_loss: 0.0 average reward score: -0.189453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 104|ppo_ep: 1|act_loss: -0.0633544921875|cri_loss: 0.1798095703125|unsuper_loss: 0.0 average reward score: -0.07470703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 105|ppo_ep: 1|act_loss: 0.1318359375|cri_loss: 0.27294921875|unsuper_loss: 0.0 average reward score: -0.595703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.43%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 106|ppo_ep: 1|act_loss: -0.0809326171875|cri_loss: 0.1402587890625|unsuper_loss: 0.0 average reward score: -0.389404296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.23%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 107|ppo_ep: 1|act_loss: -0.419189453125|cri_loss: 0.461669921875|unsuper_loss: 0.0 average reward score: 0.203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.31%) |Others=0.22 
(8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 108|ppo_ep: 1|act_loss: -0.402099609375|cri_loss: 0.374755859375|unsuper_loss: 0.0 average reward score: -1.35546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 [2023-07-01 08:12:07,676] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=8, lr=[9.649869410169466e-06, 9.649869410169466e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:12:07,852] [INFO] [timer.py:215:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=52.5035659286346, CurrSamplesPerSec=52.02652925000989, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:12:08,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=6, lr=[4.999729351164122e-06, 4.999729351164122e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 109|ppo_ep: 1|act_loss: -0.46630859375|cri_loss: 0.311767578125|unsuper_loss: 0.0 average reward score: -1.654296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.23%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.2249755859375|cri_loss: 0.2509765625|unsuper_loss: 0.0 average reward score: -1.0 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.78s (31.12%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 111|ppo_ep: 1|act_loss: 0.1627197265625|cri_loss: 0.2108154296875|unsuper_loss: 0.0 average reward score: -0.64990234375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.32%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 112|ppo_ep: 1|act_loss: -1.0615234375|cri_loss: 1.146484375|unsuper_loss: 0.0 average reward score: 0.5087890625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.54%) |Training time=0.80s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.80 epoch: 0|step: 113|ppo_ep: 1|act_loss: -0.794921875|cri_loss: 0.68359375|unsuper_loss: 0.0 average reward score: 1.3603515625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.70%) |Training time=0.79s (31.42%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80 epoch: 0|step: 114|ppo_ep: 1|act_loss: 0.352294921875|cri_loss: 0.537109375|unsuper_loss: 0.0 average reward score: 1.6796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.61%) |Training time=0.79s (31.62%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.80 epoch: 0|step: 115|ppo_ep: 1|act_loss: 0.54638671875|cri_loss: 0.63623046875|unsuper_loss: 0.0 average reward score: 1.669921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.09%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 116|ppo_ep: 1|act_loss: 0.54638671875|cri_loss: 0.7724609375|unsuper_loss: 0.0 average reward score: 0.455810546875 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.16%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80 epoch: 0|step: 117|ppo_ep: 1|act_loss: -0.0537109375|cri_loss: 0.304931640625|unsuper_loss: 0.0 average reward score: 1.400390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 118|ppo_ep: 1|act_loss: -0.0611572265625|cri_loss: 0.1751708984375|unsuper_loss: 0.0 average reward score: 0.521484375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.16%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 [2023-07-01 08:12:32,709] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=8, lr=[9.64529950829165e-06, 9.64529950829165e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:12:32,886] [INFO] [timer.py:215:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=52.431066956321224, CurrSamplesPerSec=52.50632887623131, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:12:33,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=6, lr=[4.996685224712077e-06, 4.996685224712077e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 119|ppo_ep: 1|act_loss: 0.335205078125|cri_loss: 0.404296875|unsuper_loss: 0.0 average reward score: 1.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.20%) |Training time=0.77s (31.05%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 
epoch: 0|step: 120|ppo_ep: 1|act_loss: 0.072265625|cri_loss: 0.296142578125|unsuper_loss: 0.0 average reward score: 1.1806640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.18%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.07342529296875|cri_loss: 0.31103515625|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.76%) |Training time=0.79s (31.44%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 122|ppo_ep: 1|act_loss: -0.37158203125|cri_loss: 0.28759765625|unsuper_loss: 0.0 average reward score: 0.6318359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 123|ppo_ep: 1|act_loss: -0.25927734375|cri_loss: 0.37158203125|unsuper_loss: 0.0 average reward score: 1.94921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 124|ppo_ep: 1|act_loss: 0.1942138671875|cri_loss: 0.273193359375|unsuper_loss: 0.0 average reward score: 1.0634765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 125|ppo_ep: 
1|act_loss: 0.2216796875|cri_loss: 0.20849609375|unsuper_loss: 0.0 average reward score: -0.217041015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 126|ppo_ep: 1|act_loss: -0.400390625|cri_loss: 0.434814453125|unsuper_loss: 0.0 average reward score: 0.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.79s (31.34%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 127|ppo_ep: 1|act_loss: -0.3203125|cri_loss: 0.254150390625|unsuper_loss: 0.0 average reward score: 1.5029296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.50%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 128|ppo_ep: 1|act_loss: -0.27294921875|cri_loss: 0.43896484375|unsuper_loss: 0.0 average reward score: 1.4375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 [2023-07-01 08:12:57,705] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=8, lr=[9.63420718206011e-06, 9.63420718206011e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:12:57,886] [INFO] [timer.py:215:stop] epoch=0/micro_step=130/global_step=130, RunningAvgSamplesPerSec=52.37184465997516, CurrSamplesPerSec=52.1022248826302, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:12:58,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, 
skipped=6, lr=[4.99026279355402e-06, 4.99026279355402e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 129|ppo_ep: 1|act_loss: 0.49365234375|cri_loss: 0.46728515625|unsuper_loss: 0.0 average reward score: 0.03564453125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 130|ppo_ep: 1|act_loss: -0.1641845703125|cri_loss: 0.4501953125|unsuper_loss: 0.0 average reward score: 0.18115234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 131|ppo_ep: 1|act_loss: -0.006988525390625|cri_loss: 0.187744140625|unsuper_loss: 0.0 average reward score: 2.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.78s (31.34%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 132|ppo_ep: 1|act_loss: 0.260986328125|cri_loss: 0.1494140625|unsuper_loss: 0.0 average reward score: 0.223388671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.31%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 133|ppo_ep: 1|act_loss: -0.228759765625|cri_loss: 0.194580078125|unsuper_loss: 0.0 average reward score: 0.6591796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s 
(31.22%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 134|ppo_ep: 1|act_loss: 0.1409912109375|cri_loss: 0.2529296875|unsuper_loss: 0.0 average reward score: 1.7900390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.33%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 135|ppo_ep: 1|act_loss: -0.2266845703125|cri_loss: 0.11761474609375|unsuper_loss: 0.0 average reward score: 0.896484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.23%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 136|ppo_ep: 1|act_loss: -0.005107879638671875|cri_loss: 0.1849365234375|unsuper_loss: 0.0 average reward score: 2.248046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.77s (31.10%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 137|ppo_ep: 1|act_loss: 0.080322265625|cri_loss: 0.0968017578125|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.19%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 138|ppo_ep: 1|act_loss: -0.0014581680297851562|cri_loss: 0.0823974609375|unsuper_loss: 0.0 average reward score: 0.923828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.25%) 
|Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 [2023-07-01 08:13:22,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=8, lr=[9.616607440678868e-06, 9.616607440678868e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:13:22,873] [INFO] [timer.py:215:stop] epoch=0/micro_step=140/global_step=140, RunningAvgSamplesPerSec=52.34261795657865, CurrSamplesPerSec=52.49745683026082, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:13:23,033] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=6, lr=[4.980470747984265e-06, 4.980470747984265e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 139|ppo_ep: 1|act_loss: -0.26171875|cri_loss: 0.2237548828125|unsuper_loss: 0.0 average reward score: 0.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.77s (31.06%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 140|ppo_ep: 1|act_loss: 0.01806640625|cri_loss: 0.11468505859375|unsuper_loss: 0.0 average reward score: 1.009765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 141|ppo_ep: 1|act_loss: 0.301513671875|cri_loss: 0.2069091796875|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 142|ppo_ep: 1|act_loss: 0.48779296875|cri_loss: 0.289306640625|unsuper_loss: 0.0 average reward score: 2.27734375 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.46%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 143|ppo_ep: 1|act_loss: 0.0810546875|cri_loss: 0.1422119140625|unsuper_loss: 0.0 average reward score: 0.8125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 144|ppo_ep: 1|act_loss: 0.0169525146484375|cri_loss: 0.061187744140625|unsuper_loss: 0.0 average reward score: 2.017578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.30%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 145|ppo_ep: 1|act_loss: 0.12054443359375|cri_loss: 0.11749267578125|unsuper_loss: 0.0 average reward score: 2.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.48%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 146|ppo_ep: 1|act_loss: -0.0943603515625|cri_loss: 0.10076904296875|unsuper_loss: 0.0 average reward score: 0.73291015625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.78s (31.32%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.12384033203125|cri_loss: 0.09320068359375|unsuper_loss: 0.0 average reward score: 1.529296875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 148|ppo_ep: 1|act_loss: -0.0909423828125|cri_loss: 0.06854248046875|unsuper_loss: 0.0 average reward score: 1.4853515625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 [2023-07-01 08:13:47,708] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=8, lr=[9.592524098639447e-06, 9.592524098639447e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:13:47,888] [INFO] [timer.py:215:stop] epoch=0/micro_step=150/global_step=150, RunningAvgSamplesPerSec=52.280948186958256, CurrSamplesPerSec=51.466181368635546, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:13:48,048] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=6, lr=[4.967322337776272e-06, 4.967322337776272e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 149|ppo_ep: 1|act_loss: -0.265625|cri_loss: 0.1473388671875|unsuper_loss: 0.0 average reward score: 1.2587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.44%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 150|ppo_ep: 1|act_loss: -0.297607421875|cri_loss: 0.1822509765625|unsuper_loss: 0.0 average reward score: 1.873046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.21%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 
|AvgSamplesPerSec=12.80 epoch: 0|step: 151|ppo_ep: 1|act_loss: -0.14453125|cri_loss: 0.129638671875|unsuper_loss: 0.0 average reward score: 1.263671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.22%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 152|ppo_ep: 1|act_loss: -0.0584716796875|cri_loss: 0.07110595703125|unsuper_loss: 0.0 average reward score: 1.4951171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training time=0.77s (31.10%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 153|ppo_ep: 1|act_loss: 0.021087646484375|cri_loss: 0.1566162109375|unsuper_loss: 0.0 average reward score: 1.486328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.22%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 154|ppo_ep: 1|act_loss: 0.802734375|cri_loss: 0.6201171875|unsuper_loss: 0.0 average reward score: 1.3837890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 155|ppo_ep: 1|act_loss: 0.04766845703125|cri_loss: 0.1353759765625|unsuper_loss: 0.0 average reward score: 0.409912109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.35%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 
|AvgSamplesPerSec=12.80 epoch: 0|step: 156|ppo_ep: 1|act_loss: 0.26171875|cri_loss: 0.1607666015625|unsuper_loss: 0.0 average reward score: 0.5810546875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.18%) |Training time=0.77s (31.04%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 157|ppo_ep: 1|act_loss: 0.2440185546875|cri_loss: 0.1148681640625|unsuper_loss: 0.0 average reward score: 2.048828125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.24%) |Training time=0.77s (30.96%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.26708984375|cri_loss: 0.10986328125|unsuper_loss: 0.0 average reward score: 2.015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.13%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 [2023-07-01 08:14:12,671] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=8, lr=[9.561989743497123e-06, 9.561989743497123e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:14:12,849] [INFO] [timer.py:215:stop] epoch=0/micro_step=160/global_step=160, RunningAvgSamplesPerSec=52.269319390366206, CurrSamplesPerSec=52.08991036828757, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:14:13,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=6, lr=[4.950835354254168e-06, 4.950835354254168e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 159|ppo_ep: 1|act_loss: -0.1343994140625|cri_loss: 0.101806640625|unsuper_loss: 0.0 average reward score: 2.173828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s 
|Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.23%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 160|ppo_ep: 1|act_loss: -0.26025390625|cri_loss: 0.2020263671875|unsuper_loss: 0.0 average reward score: 1.400390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.32%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 161|ppo_ep: 1|act_loss: -0.0947265625|cri_loss: 0.152099609375|unsuper_loss: 0.0 average reward score: 0.2568359375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 162|ppo_ep: 1|act_loss: -0.0096282958984375|cri_loss: 0.0770263671875|unsuper_loss: 0.0 average reward score: 2.259765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 163|ppo_ep: 1|act_loss: -0.1458740234375|cri_loss: 0.1275634765625|unsuper_loss: 0.0 average reward score: 2.134765625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80 epoch: 0|step: 164|ppo_ep: 1|act_loss: 0.0100250244140625|cri_loss: 0.12158203125|unsuper_loss: 0.0 average reward score: 1.6982421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather 
latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.43%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 165|ppo_ep: 1|act_loss: 0.291015625|cri_loss: 0.1094970703125|unsuper_loss: 0.0 average reward score: 0.74658203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.28%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 166|ppo_ep: 1|act_loss: 0.18505859375|cri_loss: 0.1199951171875|unsuper_loss: 0.0 average reward score: 1.6787109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 167|ppo_ep: 1|act_loss: 0.0205841064453125|cri_loss: 0.121337890625|unsuper_loss: 0.0 average reward score: 1.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.49%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 168|ppo_ep: 1|act_loss: 0.200927734375|cri_loss: 0.09912109375|unsuper_loss: 0.0 average reward score: 2.150390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 [2023-07-01 08:14:37,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=8, lr=[9.525045691776156e-06, 9.525045691776156e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:14:37,862] [INFO] [timer.py:215:stop] epoch=0/micro_step=170/global_step=170, 
RunningAvgSamplesPerSec=52.22736749070443, CurrSamplesPerSec=51.84219343081623, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:14:38,022] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=6, lr=[4.931032106219029e-06, 4.931032106219029e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 169|ppo_ep: 1|act_loss: -0.19775390625|cri_loss: 0.05303955078125|unsuper_loss: 0.0 average reward score: 1.37109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.28%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 170|ppo_ep: 1|act_loss: -0.5263671875|cri_loss: 0.384521484375|unsuper_loss: 0.0 average reward score: 2.90234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.78s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 171|ppo_ep: 1|act_loss: 0.102294921875|cri_loss: 0.1634521484375|unsuper_loss: 0.0 average reward score: 2.15234375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80 epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.14697265625|cri_loss: 0.12646484375|unsuper_loss: 0.0 average reward score: 1.748046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 173|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: 0.08270263671875|unsuper_loss: 0.0 average reward score: 
0.697265625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 174|ppo_ep: 1|act_loss: -0.31689453125|cri_loss: 0.195556640625|unsuper_loss: 0.0 average reward score: 0.99169921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 175|ppo_ep: 1|act_loss: 0.1104736328125|cri_loss: 0.09356689453125|unsuper_loss: 0.0 average reward score: 2.2109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.45%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 176|ppo_ep: 1|act_loss: 0.201904296875|cri_loss: 0.1785888671875|unsuper_loss: 0.0 average reward score: 3.423828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 177|ppo_ep: 1|act_loss: -0.07379150390625|cri_loss: 0.1068115234375|unsuper_loss: 0.0 average reward score: 1.94140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.43%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.10406494140625|cri_loss: 0.2044677734375|unsuper_loss: 0.0 average reward score: 3.119140625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.26%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 [2023-07-01 08:15:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=8, lr=[9.481741933063763e-06, 9.481741933063763e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:15:02,838] [INFO] [timer.py:215:stop] epoch=0/micro_step=180/global_step=180, RunningAvgSamplesPerSec=52.20197983152347, CurrSamplesPerSec=51.571048563751575, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:15:02,997] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=6, lr=[4.907939389762475e-06, 4.907939389762475e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 179|ppo_ep: 1|act_loss: 0.365478515625|cri_loss: 0.300048828125|unsuper_loss: 0.0 average reward score: 1.03125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.47%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.0006165504455566406|cri_loss: 0.10247802734375|unsuper_loss: 0.0 average reward score: 2.4453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 181|ppo_ep: 1|act_loss: 0.0224761962890625|cri_loss: 0.07757568359375|unsuper_loss: 0.0 average reward score: 2.96484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.34%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 
|AvgSamplesPerSec=12.80 epoch: 0|step: 182|ppo_ep: 1|act_loss: -0.11004638671875|cri_loss: 0.086669921875|unsuper_loss: 0.0 average reward score: 3.24609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.31%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 183|ppo_ep: 1|act_loss: -0.07958984375|cri_loss: 0.081298828125|unsuper_loss: 0.0 average reward score: 3.369140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.21%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.051361083984375|cri_loss: 0.08209228515625|unsuper_loss: 0.0 average reward score: 0.373046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.16%) |Training time=0.77s (31.07%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 185|ppo_ep: 1|act_loss: -0.0115966796875|cri_loss: 0.1260986328125|unsuper_loss: 0.0 average reward score: 2.322265625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.15%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 186|ppo_ep: 1|act_loss: -0.1728515625|cri_loss: 0.16259765625|unsuper_loss: 0.0 average reward score: 4.08984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.26%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 
|AvgSamplesPerSec=12.80 epoch: 0|step: 187|ppo_ep: 1|act_loss: 0.286865234375|cri_loss: 0.2060546875|unsuper_loss: 0.0 average reward score: 2.9921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.79s (31.41%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 188|ppo_ep: 1|act_loss: -0.10748291015625|cri_loss: 0.06787109375|unsuper_loss: 0.0 average reward score: 1.9609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.51%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 [2023-07-01 08:15:27,638] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=8, lr=[9.432137062368396e-06, 9.432137062368396e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:15:27,813] [INFO] [timer.py:215:stop] epoch=0/micro_step=190/global_step=190, RunningAvgSamplesPerSec=52.18664239862417, CurrSamplesPerSec=51.60189971407404, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:15:27,973] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=6, lr=[4.881588452008457e-06, 4.881588452008457e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 189|ppo_ep: 1|act_loss: -0.1380615234375|cri_loss: 0.06768798828125|unsuper_loss: 0.0 average reward score: 1.51171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.42%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.0645751953125|cri_loss: 0.1287841796875|unsuper_loss: 0.0 average reward score: 2.787109375 ------------------------------------------------------------------------------------- |E2E 
latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.04%) |Training time=0.78s (31.15%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80 epoch: 0|step: 191|ppo_ep: 1|act_loss: 0.2034912109375|cri_loss: 0.1402587890625|unsuper_loss: 0.0 average reward score: 0.9404296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 192|ppo_ep: 1|act_loss: -0.046600341796875|cri_loss: 0.12109375|unsuper_loss: 0.0 average reward score: 1.8515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.27%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 193|ppo_ep: 1|act_loss: 0.00504302978515625|cri_loss: 0.1109619140625|unsuper_loss: 0.0 average reward score: 1.76953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.11%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 194|ppo_ep: 1|act_loss: -0.228271484375|cri_loss: 0.13134765625|unsuper_loss: 0.0 average reward score: 3.765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.27%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 195|ppo_ep: 1|act_loss: -0.26318359375|cri_loss: 0.289794921875|unsuper_loss: 0.0 average reward score: 3.26171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.24%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.08001708984375|cri_loss: 0.1861572265625|unsuper_loss: 0.0 average reward score: 3.36328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.57%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 197|ppo_ep: 1|act_loss: -0.1329345703125|cri_loss: 0.218994140625|unsuper_loss: 0.0 average reward score: 2.84375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.33%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 198|ppo_ep: 1|act_loss: -0.1678466796875|cri_loss: 0.3349609375|unsuper_loss: 0.0 average reward score: 1.615234375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.77s (31.05%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 [2023-07-01 08:15:52,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=8, lr=[9.376298200833905e-06, 9.376298200833905e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:15:52,786] [INFO] [timer.py:215:stop] epoch=0/micro_step=200/global_step=200, RunningAvgSamplesPerSec=52.17385596484633, CurrSamplesPerSec=52.39363895635663, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:15:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=6, lr=[4.852014948832268e-06, 4.852014948832268e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.0509033203125|cri_loss: 0.178955078125|unsuper_loss: 0.0 average 
reward score: 3.533203125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.78s (31.12%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80 epoch: 0|step: 200|ppo_ep: 1|act_loss: 0.1173095703125|cri_loss: 0.261962890625|unsuper_loss: 0.0 average reward score: 3.337890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.23%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 201|ppo_ep: 1|act_loss: 0.177734375|cri_loss: 0.1182861328125|unsuper_loss: 0.0 average reward score: 3.576171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.42%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 202|ppo_ep: 1|act_loss: -0.267822265625|cri_loss: 0.175048828125|unsuper_loss: 0.0 average reward score: 3.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.37%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 203|ppo_ep: 1|act_loss: -0.051361083984375|cri_loss: 0.1474609375|unsuper_loss: 0.0 average reward score: 2.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.55%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 204|ppo_ep: 1|act_loss: 0.1959228515625|cri_loss: 0.34814453125|unsuper_loss: 0.0 average reward score: 2.306640625 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.77s (31.02%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.80 epoch: 0|step: 205|ppo_ep: 1|act_loss: 0.280517578125|cri_loss: 0.2239990234375|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.23%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 206|ppo_ep: 1|act_loss: -0.043243408203125|cri_loss: 0.2216796875|unsuper_loss: 0.0 average reward score: 2.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.20%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 207|ppo_ep: 1|act_loss: -0.389892578125|cri_loss: 0.219482421875|unsuper_loss: 0.0 average reward score: 3.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 208|ppo_ep: 1|act_loss: 0.26611328125|cri_loss: 0.286865234375|unsuper_loss: 0.0 average reward score: 2.693359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.24%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 [2023-07-01 08:16:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=8, lr=[9.31430090491684e-06, 9.31430090491684e-06], mom=[(0.9, 0.95), (0.9, 
0.95)] [2023-07-01 08:16:17,764] [INFO] [timer.py:215:stop] epoch=0/micro_step=210/global_step=210, RunningAvgSamplesPerSec=52.15907709400849, CurrSamplesPerSec=51.50153102806808, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:16:17,925] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=6, lr=[4.819258896614014e-06, 4.819258896614014e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 209|ppo_ep: 1|act_loss: 0.2457275390625|cri_loss: 0.334716796875|unsuper_loss: 0.0 average reward score: 2.59375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.41%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 210|ppo_ep: 1|act_loss: 0.0133819580078125|cri_loss: 0.45166015625|unsuper_loss: 0.0 average reward score: 2.435546875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.51%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 211|ppo_ep: 1|act_loss: -0.0655517578125|cri_loss: 0.1932373046875|unsuper_loss: 0.0 average reward score: 1.869140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 212|ppo_ep: 1|act_loss: 0.024932861328125|cri_loss: 0.2200927734375|unsuper_loss: 0.0 average reward score: 0.412109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 
213|ppo_ep: 1|act_loss: 0.1929931640625|cri_loss: 0.30908203125|unsuper_loss: 0.0 average reward score: 0.529296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.48%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 214|ppo_ep: 1|act_loss: -0.1488037109375|cri_loss: 0.29833984375|unsuper_loss: 0.0 average reward score: -1.19140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 215|ppo_ep: 1|act_loss: -0.480712890625|cri_loss: 0.51318359375|unsuper_loss: 0.0 average reward score: 0.87890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 216|ppo_ep: 1|act_loss: 0.2427978515625|cri_loss: 0.29248046875|unsuper_loss: 0.0 average reward score: -0.30810546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.36%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 217|ppo_ep: 1|act_loss: 0.2440185546875|cri_loss: 0.210205078125|unsuper_loss: 0.0 average reward score: 1.08203125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.80 epoch: 0|step: 218|ppo_ep: 1|act_loss: 
0.50927734375|cri_loss: 0.491455078125|unsuper_loss: 0.0 average reward score: -1.513671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.77s (31.05%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80 [2023-07-01 08:16:42,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=8, lr=[9.246229064149799e-06, 9.246229064149799e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:16:42,742] [INFO] [timer.py:215:stop] epoch=0/micro_step=220/global_step=220, RunningAvgSamplesPerSec=52.137330015874944, CurrSamplesPerSec=52.36796315530317, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:16:42,902] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=6, lr=[4.783364618091804e-06, 4.783364618091804e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 219|ppo_ep: 1|act_loss: 0.1173095703125|cri_loss: 0.07891845703125|unsuper_loss: 0.0 average reward score: 1.5478515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.10%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 220|ppo_ep: 1|act_loss: -0.0670166015625|cri_loss: 0.103271484375|unsuper_loss: 0.0 average reward score: 1.1064453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.14%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 221|ppo_ep: 1|act_loss: -0.178955078125|cri_loss: 0.227783203125|unsuper_loss: 0.0 average reward score: -0.8388671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s 
(60.20%) |Training time=0.77s (31.02%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.13623046875|cri_loss: 0.166748046875|unsuper_loss: 0.0 average reward score: 0.187744140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.35%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.06787109375|cri_loss: 0.13232421875|unsuper_loss: 0.0 average reward score: -1.75 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.42%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80 epoch: 0|step: 224|ppo_ep: 1|act_loss: -0.12060546875|cri_loss: 0.16064453125|unsuper_loss: 0.0 average reward score: -0.8623046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 225|ppo_ep: 1|act_loss: -0.2724609375|cri_loss: 0.267822265625|unsuper_loss: 0.0 average reward score: -1.15625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 226|ppo_ep: 1|act_loss: -0.481689453125|cri_loss: 0.322265625|unsuper_loss: 0.0 average reward score: -0.32275390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.89%) |Training time=0.78s 
(31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.326416015625|cri_loss: 0.164794921875|unsuper_loss: 0.0 average reward score: 0.388671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.91%) |Training time=0.78s (31.28%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.01126861572265625|cri_loss: 0.0947265625|unsuper_loss: 0.0 average reward score: 1.173828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.47%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 [2023-07-01 08:17:07,550] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=8, lr=[9.172174787629172e-06, 9.172174787629172e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:17:07,729] [INFO] [timer.py:215:stop] epoch=0/micro_step=230/global_step=230, RunningAvgSamplesPerSec=52.12167218935985, CurrSamplesPerSec=51.53437660663692, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:17:07,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=6, lr=[4.74438068238795e-06, 4.74438068238795e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 229|ppo_ep: 1|act_loss: 0.2076416015625|cri_loss: 0.1134033203125|unsuper_loss: 0.0 average reward score: -0.1201171875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.80%) |Training time=0.79s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80 epoch: 0|step: 230|ppo_ep: 1|act_loss: 0.2239990234375|cri_loss: 0.1163330078125|unsuper_loss: 0.0 average reward score: 0.81884765625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.97%) |Training time=0.78s (31.25%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 231|ppo_ep: 1|act_loss: -0.10308837890625|cri_loss: 0.0853271484375|unsuper_loss: 0.0 average reward score: -0.75634765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.15%) |Training time=0.77s (31.07%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 232|ppo_ep: 1|act_loss: -0.4169921875|cri_loss: 0.28955078125|unsuper_loss: 0.0 average reward score: -0.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.14%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 233|ppo_ep: 1|act_loss: -0.12939453125|cri_loss: 0.1280517578125|unsuper_loss: 0.0 average reward score: -0.681640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.39%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 234|ppo_ep: 1|act_loss: 0.334716796875|cri_loss: 0.263916015625|unsuper_loss: 0.0 average reward score: -0.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 235|ppo_ep: 1|act_loss: 0.29736328125|cri_loss: 0.178466796875|unsuper_loss: 0.0 average reward score: -0.213623046875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.26%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.30322265625|cri_loss: 0.199462890625|unsuper_loss: 0.0 average reward score: -1.373046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.33%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.0158843994140625|cri_loss: 0.138916015625|unsuper_loss: 0.0 average reward score: 0.0645751953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.47%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80 epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.207763671875|cri_loss: 0.1807861328125|unsuper_loss: 0.0 average reward score: -1.05859375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.30%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 [2023-07-01 08:17:32,518] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=8, lr=[9.09223827938084e-06, 9.09223827938084e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:17:32,696] [INFO] [timer.py:215:stop] epoch=0/micro_step=240/global_step=240, RunningAvgSamplesPerSec=52.11019433075464, CurrSamplesPerSec=52.049711379426014, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:17:32,855] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=6, lr=[4.702359839289306e-06, 4.702359839289306e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 239|ppo_ep: 1|act_loss: 0.1796875|cri_loss: 0.173583984375|unsuper_loss: 0.0 average reward score: -2.109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 240|ppo_ep: 1|act_loss: 0.102294921875|cri_loss: 0.153564453125|unsuper_loss: 0.0 average reward score: -1.3837890625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.20%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 241|ppo_ep: 1|act_loss: 0.254638671875|cri_loss: 0.268310546875|unsuper_loss: 0.0 average reward score: -1.421875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.12%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 242|ppo_ep: 1|act_loss: 0.320068359375|cri_loss: 0.2890625|unsuper_loss: 0.0 average reward score: -1.111328125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.45%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80 epoch: 0|step: 243|ppo_ep: 1|act_loss: 0.02081298828125|cri_loss: 0.0860595703125|unsuper_loss: 0.0 average reward score: -0.7314453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.19%) |Others=0.22 (8.93%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 
244|ppo_ep: 1|act_loss: 0.2783203125|cri_loss: 0.1820068359375|unsuper_loss: 0.0 average reward score: -1.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.20%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 245|ppo_ep: 1|act_loss: -0.2364501953125|cri_loss: 0.1331787109375|unsuper_loss: 0.0 average reward score: -1.1396484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.94%) |Training time=0.78s (31.23%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.419677734375|cri_loss: 0.204345703125|unsuper_loss: 0.0 average reward score: -2.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.11%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 247|ppo_ep: 1|act_loss: -0.388427734375|cri_loss: 0.2030029296875|unsuper_loss: 0.0 average reward score: -1.853515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.22%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.226806640625|cri_loss: 0.197509765625|unsuper_loss: 0.0 average reward score: -2.10546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.32%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80 [2023-07-01 08:17:57,482] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=250, skipped=8, lr=[9.006527702772504e-06, 9.006527702772504e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:17:57,658] [INFO] [timer.py:215:stop] epoch=0/micro_step=250/global_step=250, RunningAvgSamplesPerSec=52.10667841485632, CurrSamplesPerSec=52.08291653747903, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:17:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=6, lr=[4.657358947870691e-06, 4.657358947870691e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 249|ppo_ep: 1|act_loss: 0.3671875|cri_loss: 0.2061767578125|unsuper_loss: 0.0 average reward score: -2.86328125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.24%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 250|ppo_ep: 1|act_loss: 0.449951171875|cri_loss: 0.2022705078125|unsuper_loss: 0.0 average reward score: -3.322265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.78s (31.07%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80 epoch: 0|step: 251|ppo_ep: 1|act_loss: 0.2705078125|cri_loss: 0.1883544921875|unsuper_loss: 0.0 average reward score: -3.009765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.43%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 epoch: 0|step: 252|ppo_ep: 1|act_loss: 0.11505126953125|cri_loss: 0.1763916015625|unsuper_loss: 0.0 average reward score: -0.480224609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s 
(59.71%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 253|ppo_ep: 1|act_loss: -0.032073974609375|cri_loss: 0.09918212890625|unsuper_loss: 0.0 average reward score: -2.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.21%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80 epoch: 0|step: 254|ppo_ep: 1|act_loss: -0.188232421875|cri_loss: 0.1361083984375|unsuper_loss: 0.0 average reward score: -0.43310546875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 255|ppo_ep: 1|act_loss: 0.0109405517578125|cri_loss: 0.13916015625|unsuper_loss: 0.0 average reward score: -0.94873046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.06%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 256|ppo_ep: 1|act_loss: -0.18408203125|cri_loss: 0.10284423828125|unsuper_loss: 0.0 average reward score: -1.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.79s (31.35%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80 epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.223876953125|cri_loss: 0.0982666015625|unsuper_loss: 0.0 average reward score: -2.5703125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) 
|Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80 epoch: 0|step: 258|ppo_ep: 1|act_loss: -0.052978515625|cri_loss: 0.153076171875|unsuper_loss: 0.0 average reward score: -2.15625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.42%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80 [2023-07-01 08:18:22,461] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, but hysteresis is 2. Reducing hysteresis to 1 [2023-07-01 08:18:22,461] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=9, lr=[8.924547050894679e-06, 8.924547050894679e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:18:22,462] [INFO] [timer.py:215:stop] epoch=0/micro_step=260/global_step=260, RunningAvgSamplesPerSec=52.15651524122892, CurrSamplesPerSec=74.34278909266142, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:18:22,620] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=6, lr=[4.609438899557964e-06, 4.609438899557964e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 259|ppo_ep: 1|act_loss: -0.35400390625|cri_loss: 0.207763671875|unsuper_loss: 0.0 average reward score: -2.005859375 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.78%) |Training time=0.59s (25.71%) |Others=0.22 (9.51%)|CurSamplesPerSec=13.84 |AvgSamplesPerSec=12.81 [2023-07-01 08:18:24,766] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 2048, reducing to 1024 epoch: 0|step: 260|ppo_ep: 1|act_loss: -0.1966552734375|cri_loss: 0.1453857421875|unsuper_loss: 0.0 average reward score: -2.146484375 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.80%) |Training time=0.59s (25.60%) |Others=0.22 (9.60%)|CurSamplesPerSec=13.87 |AvgSamplesPerSec=12.81 epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.19091796875|cri_loss: 0.292724609375|unsuper_loss: 0.0 average reward score: -2.21484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.42%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.1279296875|cri_loss: 0.092041015625|unsuper_loss: 0.0 average reward score: -3.62109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.021759033203125|cri_loss: 0.151611328125|unsuper_loss: 0.0 average reward score: -2.53515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.35%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 264|ppo_ep: 1|act_loss: -0.017181396484375|cri_loss: 0.1300048828125|unsuper_loss: 0.0 average reward score: -3.58984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.80%) |Training time=0.79s (31.43%) |Others=0.22 
(8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.30712890625|cri_loss: 0.19140625|unsuper_loss: 0.0 average reward score: -3.1484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.24%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 266|ppo_ep: 1|act_loss: -0.01415252685546875|cri_loss: 0.1092529296875|unsuper_loss: 0.0 average reward score: -2.630859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.12%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.1328125|cri_loss: 0.0802001953125|unsuper_loss: 0.0 average reward score: -2.01953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 268|ppo_ep: 1|act_loss: -0.07440185546875|cri_loss: 0.1796875|unsuper_loss: 0.0 average reward score: -1.6611328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.34%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:18:47,070] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=10, lr=[8.838073100970824e-06, 8.838073100970824e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:18:47,246] [INFO] [timer.py:215:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=52.20487912951929, CurrSamplesPerSec=52.52854267419502, MemAllocated=12.09GB, 
MaxMemAllocated=21.86GB [2023-07-01 08:18:47,406] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=6, lr=[4.558664535734864e-06, 4.558664535734864e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.275390625|cri_loss: 0.201904296875|unsuper_loss: 0.0 average reward score: -2.205078125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training time=0.77s (31.09%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.048828125|cri_loss: 0.1572265625|unsuper_loss: 0.0 average reward score: -3.140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.32%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 271|ppo_ep: 1|act_loss: 0.036224365234375|cri_loss: 0.1575927734375|unsuper_loss: 0.0 average reward score: -1.5205078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.34%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 272|ppo_ep: 1|act_loss: 0.05078125|cri_loss: 0.0865478515625|unsuper_loss: 0.0 average reward score: -4.41015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 273|ppo_ep: 1|act_loss: 0.10662841796875|cri_loss: 0.09228515625|unsuper_loss: 0.0 average reward score: -3.4453125 ------------------------------------------------------------------------------------- |E2E 
latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.23%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 274|ppo_ep: 1|act_loss: -0.181396484375|cri_loss: 0.1827392578125|unsuper_loss: 0.0 average reward score: -4.5546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.38%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.27099609375|cri_loss: 0.1676025390625|unsuper_loss: 0.0 average reward score: -2.134765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.77%) |Training time=0.79s (31.38%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.31396484375|cri_loss: 0.1676025390625|unsuper_loss: 0.0 average reward score: -2.30859375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.79s (31.34%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.3427734375|cri_loss: 0.438720703125|unsuper_loss: 0.0 average reward score: -1.958984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 278|ppo_ep: 1|act_loss: 0.108154296875|cri_loss: 0.1785888671875|unsuper_loss: 0.0 average reward score: -2.982421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather 
latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.30%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:19:12,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=10, lr=[8.736836458736355e-06, 8.736836458736355e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:19:12,253] [INFO] [timer.py:215:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=52.18662218968872, CurrSamplesPerSec=51.58343614068504, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:19:12,414] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=6, lr=[4.5051045600050906e-06, 4.5051045600050906e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 279|ppo_ep: 1|act_loss: 0.006816864013671875|cri_loss: 0.06549072265625|unsuper_loss: 0.0 average reward score: -2.412109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.38%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 280|ppo_ep: 1|act_loss: 0.0237884521484375|cri_loss: 0.265625|unsuper_loss: 0.0 average reward score: -2.21875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.41%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 281|ppo_ep: 1|act_loss: 0.0087738037109375|cri_loss: 0.1356201171875|unsuper_loss: 0.0 average reward score: -2.646484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.36%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 282|ppo_ep: 1|act_loss: 0.0026302337646484375|cri_loss: 
0.09356689453125|unsuper_loss: 0.0 average reward score: -2.587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.056884765625|cri_loss: 0.11456298828125|unsuper_loss: 0.0 average reward score: -2.431640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.31%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.126953125|cri_loss: 0.216552734375|unsuper_loss: 0.0 average reward score: -1.4580078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.41%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 285|ppo_ep: 1|act_loss: -0.2120361328125|cri_loss: 0.249267578125|unsuper_loss: 0.0 average reward score: -4.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 286|ppo_ep: 1|act_loss: 0.10711669921875|cri_loss: 0.1737060546875|unsuper_loss: 0.0 average reward score: -2.416015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 287|ppo_ep: 1|act_loss: 0.06396484375|cri_loss: 
0.1331787109375|unsuper_loss: 0.0 average reward score: -3.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.37%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 288|ppo_ep: 1|act_loss: -0.08319091796875|cri_loss: 0.1009521484375|unsuper_loss: 0.0 average reward score: -1.4404296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.20%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:19:37,055] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=10, lr=[8.630306648029188e-06, 8.630306648029188e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:19:37,235] [INFO] [timer.py:215:stop] epoch=0/micro_step=290/global_step=290, RunningAvgSamplesPerSec=52.16971898264238, CurrSamplesPerSec=51.749367195003416, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:19:37,396] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=6, lr=[4.448831445228368e-06, 4.448831445228368e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 289|ppo_ep: 1|act_loss: 0.2198486328125|cri_loss: 0.09527587890625|unsuper_loss: 0.0 average reward score: -2.9453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 290|ppo_ep: 1|act_loss: -0.0577392578125|cri_loss: 0.12420654296875|unsuper_loss: 0.0 average reward score: -2.8671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training 
time=0.78s (31.20%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 291|ppo_ep: 1|act_loss: -0.21630859375|cri_loss: 0.1055908203125|unsuper_loss: 0.0 average reward score: -2.427734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 292|ppo_ep: 1|act_loss: -0.09027099609375|cri_loss: 0.1099853515625|unsuper_loss: 0.0 average reward score: -4.58984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.38%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.005260467529296875|cri_loss: 0.088134765625|unsuper_loss: 0.0 average reward score: -3.82421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.36%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 294|ppo_ep: 1|act_loss: 0.316162109375|cri_loss: 0.095703125|unsuper_loss: 0.0 average reward score: -2.01171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.42%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 295|ppo_ep: 1|act_loss: -0.09100341796875|cri_loss: 0.0877685546875|unsuper_loss: 0.0 average reward score: -2.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s 
(31.32%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 296|ppo_ep: 1|act_loss: -0.2142333984375|cri_loss: 0.107666015625|unsuper_loss: 0.0 average reward score: -2.798828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.1929931640625|cri_loss: 0.12384033203125|unsuper_loss: 0.0 average reward score: -2.666015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.36%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 298|ppo_ep: 1|act_loss: 0.409912109375|cri_loss: 0.257080078125|unsuper_loss: 0.0 average reward score: -1.650390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.36%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:20:02,053] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=10, lr=[8.518627816039882e-06, 8.518627816039882e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:20:02,229] [INFO] [timer.py:215:stop] epoch=0/micro_step=300/global_step=300, RunningAvgSamplesPerSec=52.151096640881924, CurrSamplesPerSec=51.24603745201091, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:20:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=6, lr=[4.389921335456253e-06, 4.389921335456253e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 299|ppo_ep: 1|act_loss: 0.009185791015625|cri_loss: 0.052886962890625|unsuper_loss: 0.0 average reward score: -2.890625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.1358642578125|cri_loss: 0.07867431640625|unsuper_loss: 0.0 average reward score: -2.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s (31.24%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.2626953125|cri_loss: 0.099609375|unsuper_loss: 0.0 average reward score: -1.32421875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.23%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.1563720703125|cri_loss: 0.18017578125|unsuper_loss: 0.0 average reward score: -1.7333984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.39%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 303|ppo_ep: 1|act_loss: 0.296875|cri_loss: 0.194091796875|unsuper_loss: 0.0 average reward score: -4.99609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 304|ppo_ep: 1|act_loss: 0.0010328292846679688|cri_loss: 0.12188720703125|unsuper_loss: 0.0 average reward score: -2.32421875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.29%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 305|ppo_ep: 1|act_loss: -0.1708984375|cri_loss: 0.233642578125|unsuper_loss: 0.0 average reward score: -1.4580078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.53%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 306|ppo_ep: 1|act_loss: -0.1097412109375|cri_loss: 0.2275390625|unsuper_loss: 0.0 average reward score: -2.587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.40%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 307|ppo_ep: 1|act_loss: -0.113525390625|cri_loss: 0.10150146484375|unsuper_loss: 0.0 average reward score: -2.595703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 308|ppo_ep: 1|act_loss: -0.1412353515625|cri_loss: 0.28759765625|unsuper_loss: 0.0 average reward score: -2.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.48%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:20:27,026] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=10, lr=[8.401951077182031e-06, 8.401951077182031e-06], mom=[(0.9, 
0.95), (0.9, 0.95)] [2023-07-01 08:20:27,205] [INFO] [timer.py:215:stop] epoch=0/micro_step=310/global_step=310, RunningAvgSamplesPerSec=52.13350628077688, CurrSamplesPerSec=51.58214755541754, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:20:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=6, lr=[4.328453942900402e-06, 4.328453942900402e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.07696533203125|cri_loss: 0.1412353515625|unsuper_loss: 0.0 average reward score: -1.78125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.42%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 310|ppo_ep: 1|act_loss: -0.26171875|cri_loss: 0.12054443359375|unsuper_loss: 0.0 average reward score: -1.0166015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.27%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 311|ppo_ep: 1|act_loss: 0.0108642578125|cri_loss: 0.0836181640625|unsuper_loss: 0.0 average reward score: -1.6484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.78s (31.33%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 312|ppo_ep: 1|act_loss: 0.167724609375|cri_loss: 0.0809326171875|unsuper_loss: 0.0 average reward score: -1.09375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.07%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 
0|step: 313|ppo_ep: 1|act_loss: 0.0885009765625|cri_loss: 0.0457763671875|unsuper_loss: 0.0 average reward score: -1.857421875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 314|ppo_ep: 1|act_loss: 0.438720703125|cri_loss: 0.1385498046875|unsuper_loss: 0.0 average reward score: -2.98828125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.18%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.58837890625|cri_loss: 0.387939453125|unsuper_loss: 0.0 average reward score: 1.1044921875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 316|ppo_ep: 1|act_loss: -0.2413330078125|cri_loss: 0.05047607421875|unsuper_loss: 0.0 average reward score: 1.2255859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.24%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 317|ppo_ep: 1|act_loss: -0.110107421875|cri_loss: 0.201171875|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.37%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 318|ppo_ep: 
1|act_loss: -0.237548828125|cri_loss: 0.17626953125|unsuper_loss: 0.0 average reward score: 3.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:20:51,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=10, lr=[8.280434308616948e-06, 8.280434308616948e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:20:52,171] [INFO] [timer.py:215:stop] epoch=0/micro_step=320/global_step=320, RunningAvgSamplesPerSec=52.127762109783276, CurrSamplesPerSec=51.654149181493786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:20:52,330] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=6, lr=[4.264512440072707e-06, 4.264512440072707e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 319|ppo_ep: 1|act_loss: -0.054931640625|cri_loss: 0.1356201171875|unsuper_loss: 0.0 average reward score: 3.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.79s (31.47%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 320|ppo_ep: 1|act_loss: -0.06903076171875|cri_loss: 0.10064697265625|unsuper_loss: 0.0 average reward score: 3.5234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.22%) |Others=0.22 (8.91%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 321|ppo_ep: 1|act_loss: -0.1444091796875|cri_loss: 0.041595458984375|unsuper_loss: 0.0 average reward score: 3.27734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.50s (59.83%) |Training time=0.78s (31.32%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 322|ppo_ep: 1|act_loss: -0.04400634765625|cri_loss: 0.057281494140625|unsuper_loss: 0.0 average reward score: 1.736328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.32%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 323|ppo_ep: 1|act_loss: -0.10302734375|cri_loss: 0.11444091796875|unsuper_loss: 0.0 average reward score: 2.623046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 324|ppo_ep: 1|act_loss: 0.07342529296875|cri_loss: 0.12744140625|unsuper_loss: 0.0 average reward score: 2.306640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.57%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 325|ppo_ep: 1|act_loss: -0.017364501953125|cri_loss: 0.0731201171875|unsuper_loss: 0.0 average reward score: 4.0 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.24%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 326|ppo_ep: 1|act_loss: -0.176513671875|cri_loss: 0.1510009765625|unsuper_loss: 0.0 average reward score: 2.859375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) 
|Training time=0.79s (31.50%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 327|ppo_ep: 1|act_loss: -0.005863189697265625|cri_loss: 0.17626953125|unsuper_loss: 0.0 average reward score: 2.05078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.44%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.07830810546875|cri_loss: 0.06134033203125|unsuper_loss: 0.0 average reward score: 2.84375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.91%) |Training time=0.78s (31.31%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:21:17,001] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=10, lr=[8.154241936627547e-06, 8.154241936627547e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:21:17,177] [INFO] [timer.py:215:stop] epoch=0/micro_step=330/global_step=330, RunningAvgSamplesPerSec=52.11398943675025, CurrSamplesPerSec=51.958836384189105, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:21:17,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=6, lr=[4.198183347243233e-06, 4.198183347243233e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.07696533203125|cri_loss: 0.068603515625|unsuper_loss: 0.0 average reward score: 4.14453125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 330|ppo_ep: 1|act_loss: -0.039886474609375|cri_loss: 0.044189453125|unsuper_loss: 0.0 average reward score: 3.6640625 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.25%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 331|ppo_ep: 1|act_loss: -0.052337646484375|cri_loss: 0.04412841796875|unsuper_loss: 0.0 average reward score: 3.8359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 332|ppo_ep: 1|act_loss: 0.12115478515625|cri_loss: 0.1629638671875|unsuper_loss: 0.0 average reward score: 3.171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.0147247314453125|cri_loss: 0.047698974609375|unsuper_loss: 0.0 average reward score: 4.046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.56%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 334|ppo_ep: 1|act_loss: -0.08074951171875|cri_loss: 0.032257080078125|unsuper_loss: 0.0 average reward score: 5.09375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s (31.15%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 335|ppo_ep: 1|act_loss: -0.057952880859375|cri_loss: 0.042724609375|unsuper_loss: 0.0 average reward score: 3.677734375 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.45s (58.11%) |Training time=0.82s (33.01%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 336|ppo_ep: 1|act_loss: -0.1143798828125|cri_loss: 0.045684814453125|unsuper_loss: 0.0 average reward score: 3.123046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.35%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 337|ppo_ep: 1|act_loss: -0.1322021484375|cri_loss: 0.031494140625|unsuper_loss: 0.0 average reward score: 4.28125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 338|ppo_ep: 1|act_loss: -0.144287109375|cri_loss: 0.035675048828125|unsuper_loss: 0.0 average reward score: 3.578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:21:41,960] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=10, lr=[8.023544714130509e-06, 8.023544714130509e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:21:42,140] [INFO] [timer.py:215:stop] epoch=0/micro_step=340/global_step=340, RunningAvgSamplesPerSec=52.08920957291715, CurrSamplesPerSec=51.04680739069268, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:21:42,300] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=6, lr=[4.129556415368261e-06, 4.129556415368261e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 339|ppo_ep: 1|act_loss: -0.1048583984375|cri_loss: 0.0208587646484375|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 340|ppo_ep: 1|act_loss: -0.061767578125|cri_loss: 0.035980224609375|unsuper_loss: 0.0 average reward score: 3.326171875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.58%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 341|ppo_ep: 1|act_loss: -0.05889892578125|cri_loss: 0.013214111328125|unsuper_loss: 0.0 average reward score: 3.626953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 342|ppo_ep: 1|act_loss: -0.050262451171875|cri_loss: 0.043212890625|unsuper_loss: 0.0 average reward score: 3.720703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 343|ppo_ep: 1|act_loss: -0.06964111328125|cri_loss: 0.1083984375|unsuper_loss: 0.0 average reward score: 4.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 epoch: 0|step: 344|ppo_ep: 1|act_loss: 0.08087158203125|cri_loss: 0.10015869140625|unsuper_loss: 0.0 average reward score: 2.064453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 345|ppo_ep: 1|act_loss: 0.041839599609375|cri_loss: 0.079345703125|unsuper_loss: 0.0 average reward score: 3.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.077392578125|cri_loss: 0.08160400390625|unsuper_loss: 0.0 average reward score: 2.216796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.74%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 347|ppo_ep: 1|act_loss: 0.0736083984375|cri_loss: 0.08294677734375|unsuper_loss: 0.0 average reward score: 4.203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.0135650634765625|cri_loss: 0.09674072265625|unsuper_loss: 0.0 average reward score: 3.330078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.71%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 [2023-07-01 08:22:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=10, lr=[7.888519489627777e-06, 7.888519489627777e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:22:07,151] [INFO] [timer.py:215:stop] epoch=0/micro_step=350/global_step=350, RunningAvgSamplesPerSec=52.05623551844447, CurrSamplesPerSec=50.85811852847716, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:22:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=6, lr=[4.058724504646834e-06, 4.058724504646834e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.10675048828125|cri_loss: 0.12646484375|unsuper_loss: 0.0 average reward score: 1.84765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 350|ppo_ep: 1|act_loss: 0.11016845703125|cri_loss: 0.070068359375|unsuper_loss: 0.0 average reward score: 1.279296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.64%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 351|ppo_ep: 1|act_loss: 0.2406005859375|cri_loss: 0.1600341796875|unsuper_loss: 0.0 average reward score: 0.479248046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.89%) |Training time=0.78s (31.38%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 [2023-07-01 08:22:14,799] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. 
Reducing hysteresis to 1 epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.2386474609375|cri_loss: 0.1103515625|unsuper_loss: 0.0 average reward score: 1.291015625 ------------------------------------------------------------------------------------- |E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.63%) |Training time=0.79s (32.32%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.81 [2023-07-01 08:22:17,261] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.481201171875|cri_loss: 0.308349609375|unsuper_loss: 0.0 average reward score: -0.63427734375 ------------------------------------------------------------------------------------- |E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.49%) |Training time=0.80s (32.48%) |Others=0.17 (7.03%)|CurSamplesPerSec=13.00 |AvgSamplesPerSec=12.81 epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.296142578125|cri_loss: 0.20068359375|unsuper_loss: 0.0 average reward score: -0.2060546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 355|ppo_ep: 1|act_loss: 0.456787109375|cri_loss: 0.278564453125|unsuper_loss: 0.0 average reward score: -0.403564453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 356|ppo_ep: 1|act_loss: 0.443603515625|cri_loss: 0.2313232421875|unsuper_loss: 0.0 average reward score: -0.427734375 ------------------------------------------------------------------------------------- |E2E 
latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.66%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 357|ppo_ep: 1|act_loss: 0.359375|cri_loss: 0.1785888671875|unsuper_loss: 0.0 average reward score: 0.2083740234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.58%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 358|ppo_ep: 1|act_loss: 0.0963134765625|cri_loss: 0.2235107421875|unsuper_loss: 0.0 average reward score: -0.01922607421875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.53%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 [2023-07-01 08:22:31,884] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=10, lr=[7.749348967910034e-06, 7.749348967910034e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:22:32,063] [INFO] [timer.py:215:stop] epoch=0/micro_step=360/global_step=360, RunningAvgSamplesPerSec=52.0301029292908, CurrSamplesPerSec=51.29909768320761, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:22:32,223] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=8, lr=[4.0005357013709215e-06, 4.0005357013709215e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 359|ppo_ep: 1|act_loss: 0.2340087890625|cri_loss: 0.140380859375|unsuper_loss: 0.0 average reward score: -0.38818359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.60%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 360|ppo_ep: 1|act_loss: 0.3876953125|cri_loss: 
0.318115234375|unsuper_loss: 0.0 average reward score: -2.419921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.70%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 [2023-07-01 08:22:37,221] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096 epoch: 0|step: 361|ppo_ep: 1|act_loss: 0.129150390625|cri_loss: 0.09161376953125|unsuper_loss: 0.0 average reward score: -1.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.64%) |Training time=0.79s (32.34%) |Others=0.17 (7.02%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.81 epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.0443115234375|cri_loss: 0.25|unsuper_loss: 0.0 average reward score: -2.34765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.20361328125|cri_loss: 0.2042236328125|unsuper_loss: 0.0 average reward score: -2.720703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 364|ppo_ep: 1|act_loss: 0.02618408203125|cri_loss: 0.276123046875|unsuper_loss: 0.0 average reward score: -0.67919921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) 
|Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 365|ppo_ep: 1|act_loss: 0.2371826171875|cri_loss: 0.1641845703125|unsuper_loss: 0.0 average reward score: -2.16015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.75%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.006702423095703125|cri_loss: 0.2052001953125|unsuper_loss: 0.0 average reward score: -0.93212890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 367|ppo_ep: 1|act_loss: -0.04510498046875|cri_loss: 0.1639404296875|unsuper_loss: 0.0 average reward score: -2.287109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 368|ppo_ep: 1|act_loss: -0.252197265625|cri_loss: 0.254638671875|unsuper_loss: 0.0 average reward score: -1.240234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:22:56,833] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=10, lr=[7.606221462835909e-06, 7.606221462835909e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:22:57,010] [INFO] [timer.py:215:stop] epoch=0/micro_step=370/global_step=370, RunningAvgSamplesPerSec=52.00473590207312, 
CurrSamplesPerSec=51.09381442985007, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:22:57,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=9, lr=[3.933522533409623e-06, 3.933522533409623e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 369|ppo_ep: 1|act_loss: 0.0250396728515625|cri_loss: 0.1590576171875|unsuper_loss: 0.0 average reward score: -2.888671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 370|ppo_ep: 1|act_loss: -0.1568603515625|cri_loss: 0.397705078125|unsuper_loss: 0.0 average reward score: -2.58984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 371|ppo_ep: 1|act_loss: 0.263916015625|cri_loss: 0.1297607421875|unsuper_loss: 0.0 average reward score: -3.072265625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.23%) |Training time=0.80s (31.94%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 372|ppo_ep: 1|act_loss: -0.193359375|cri_loss: 0.11859130859375|unsuper_loss: 0.0 average reward score: -2.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.08880615234375|cri_loss: 0.061126708984375|unsuper_loss: 0.0 average reward score: -1.970703125 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 374|ppo_ep: 1|act_loss: 0.0213165283203125|cri_loss: 0.137451171875|unsuper_loss: 0.0 average reward score: -2.658203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.70%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 375|ppo_ep: 1|act_loss: 0.08203125|cri_loss: 0.1251220703125|unsuper_loss: 0.0 average reward score: -2.24609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.57%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 376|ppo_ep: 1|act_loss: -0.047088623046875|cri_loss: 0.1439208984375|unsuper_loss: 0.0 average reward score: -2.392578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.22%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 377|ppo_ep: 1|act_loss: -0.1260986328125|cri_loss: 0.2059326171875|unsuper_loss: 0.0 average reward score: -1.533203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.68%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 378|ppo_ep: 1|act_loss: -0.1795654296875|cri_loss: 0.10009765625|unsuper_loss: 0.0 average reward score: -3.173828125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:23:21,858] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=10, lr=[7.459330642521499e-06, 7.459330642521499e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:23:22,035] [INFO] [timer.py:215:stop] epoch=0/micro_step=380/global_step=380, RunningAvgSamplesPerSec=51.976800306125085, CurrSamplesPerSec=51.10830902963015, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:23:22,193] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=9, lr=[3.8572239314745966e-06, 3.8572239314745966e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 379|ppo_ep: 1|act_loss: -0.103515625|cri_loss: 0.037200927734375|unsuper_loss: 0.0 average reward score: -2.57421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.63%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 380|ppo_ep: 1|act_loss: -0.11346435546875|cri_loss: 0.11126708984375|unsuper_loss: 0.0 average reward score: -3.271484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.51%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 381|ppo_ep: 1|act_loss: -0.10687255859375|cri_loss: 0.1171875|unsuper_loss: 0.0 average reward score: -4.25390625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.69%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 
|AvgSamplesPerSec=12.81 epoch: 0|step: 382|ppo_ep: 1|act_loss: 0.185791015625|cri_loss: 0.1903076171875|unsuper_loss: 0.0 average reward score: -1.146484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 383|ppo_ep: 1|act_loss: 0.053070068359375|cri_loss: 0.094970703125|unsuper_loss: 0.0 average reward score: -3.900390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 384|ppo_ep: 1|act_loss: 0.1495361328125|cri_loss: 0.08660888671875|unsuper_loss: 0.0 average reward score: -1.6904296875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.52%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 385|ppo_ep: 1|act_loss: 0.138671875|cri_loss: 0.09027099609375|unsuper_loss: 0.0 average reward score: -5.4375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.43%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 386|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.0672607421875|unsuper_loss: 0.0 average reward score: -4.953125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 
epoch: 0|step: 387|ppo_ep: 1|act_loss: 0.12481689453125|cri_loss: 0.08941650390625|unsuper_loss: 0.0 average reward score: -4.46484375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.71%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 388|ppo_ep: 1|act_loss: -0.0234375|cri_loss: 0.04107666015625|unsuper_loss: 0.0 average reward score: -2.8984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:23:46,862] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=10, lr=[7.308875267284935e-06, 7.308875267284935e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:23:47,043] [INFO] [timer.py:215:stop] epoch=0/micro_step=390/global_step=390, RunningAvgSamplesPerSec=51.956256933038894, CurrSamplesPerSec=50.59014466794897, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:23:47,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=9, lr=[3.779088848132372e-06, 3.779088848132372e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 389|ppo_ep: 1|act_loss: 0.15185546875|cri_loss: 0.13330078125|unsuper_loss: 0.0 average reward score: -1.75 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 390|ppo_ep: 1|act_loss: 0.064697265625|cri_loss: 0.06304931640625|unsuper_loss: 0.0 average reward score: -3.287109375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s 
(0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 391|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: 0.05303955078125|unsuper_loss: 0.0 average reward score: -2.60546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.60%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 392|ppo_ep: 1|act_loss: -0.005657196044921875|cri_loss: 0.1802978515625|unsuper_loss: 0.0 average reward score: -2.046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 393|ppo_ep: 1|act_loss: -0.1446533203125|cri_loss: 0.07208251953125|unsuper_loss: 0.0 average reward score: -2.337890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.75%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 394|ppo_ep: 1|act_loss: -0.10504150390625|cri_loss: 0.179931640625|unsuper_loss: 0.0 average reward score: -2.212890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 395|ppo_ep: 1|act_loss: 0.1029052734375|cri_loss: 0.09686279296875|unsuper_loss: 0.0 average reward score: -4.046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) 
|Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 396|ppo_ep: 1|act_loss: 0.0245361328125|cri_loss: 0.1693115234375|unsuper_loss: 0.0 average reward score: -2.314453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.36%) |Training time=0.79s (31.81%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 397|ppo_ep: 1|act_loss: -0.043304443359375|cri_loss: 0.04229736328125|unsuper_loss: 0.0 average reward score: -3.359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 398|ppo_ep: 1|act_loss: -0.2254638671875|cri_loss: 0.07806396484375|unsuper_loss: 0.0 average reward score: -3.142578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:24:11,869] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=10, lr=[7.155058920700617e-06, 7.155058920700617e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:24:12,045] [INFO] [timer.py:215:stop] epoch=0/micro_step=400/global_step=400, RunningAvgSamplesPerSec=51.93466840082398, CurrSamplesPerSec=51.04929257384149, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:24:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=9, lr=[3.6992230092138004e-06, 3.6992230092138004e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 399|ppo_ep: 1|act_loss: -0.0390625|cri_loss: 0.0706787109375|unsuper_loss: 0.0 average reward score: 
-4.3515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 400|ppo_ep: 1|act_loss: 0.222900390625|cri_loss: 0.07318115234375|unsuper_loss: 0.0 average reward score: -2.384765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.67%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.051666259765625|cri_loss: 0.04010009765625|unsuper_loss: 0.0 average reward score: -4.3203125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.85%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 402|ppo_ep: 1|act_loss: -0.08135986328125|cri_loss: 0.0285491943359375|unsuper_loss: 0.0 average reward score: -3.83984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.78%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 403|ppo_ep: 1|act_loss: -0.12310791015625|cri_loss: 0.06390380859375|unsuper_loss: 0.0 average reward score: -2.3125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 404|ppo_ep: 1|act_loss: -0.0479736328125|cri_loss: 0.042877197265625|unsuper_loss: 0.0 average reward score: -3.5234375 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.66%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 405|ppo_ep: 1|act_loss: 0.060333251953125|cri_loss: 0.060272216796875|unsuper_loss: 0.0 average reward score: -3.236328125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.81%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 406|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.048065185546875|unsuper_loss: 0.0 average reward score: -3.92578125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.99%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 407|ppo_ep: 1|act_loss: 0.02398681640625|cri_loss: 0.0236053466796875|unsuper_loss: 0.0 average reward score: -1.84765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.79%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 408|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: 0.05072021484375|unsuper_loss: 0.0 average reward score: -2.701171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.81%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 [2023-07-01 08:24:36,897] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=10, lr=[6.998089734127033e-06, 6.998089734127033e-06], 
mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:24:37,076] [INFO] [timer.py:215:stop] epoch=0/micro_step=410/global_step=410, RunningAvgSamplesPerSec=51.90624478389156, CurrSamplesPerSec=51.19460994249568, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:24:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=9, lr=[3.6177344824627854e-06, 3.6177344824627854e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 409|ppo_ep: 1|act_loss: -0.09442138671875|cri_loss: 0.05120849609375|unsuper_loss: 0.0 average reward score: -1.87890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.62%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 410|ppo_ep: 1|act_loss: -0.09503173828125|cri_loss: 0.10931396484375|unsuper_loss: 0.0 average reward score: -0.666015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 411|ppo_ep: 1|act_loss: -0.183837890625|cri_loss: 0.2255859375|unsuper_loss: 0.0 average reward score: -0.67041015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.62%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 412|ppo_ep: 1|act_loss: -0.12164306640625|cri_loss: 0.1085205078125|unsuper_loss: 0.0 average reward score: 0.165771484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.49%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 
|AvgSamplesPerSec=12.81 epoch: 0|step: 413|ppo_ep: 1|act_loss: -0.1180419921875|cri_loss: 0.1715087890625|unsuper_loss: 0.0 average reward score: 0.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 414|ppo_ep: 1|act_loss: 0.0251922607421875|cri_loss: 0.12139892578125|unsuper_loss: 0.0 average reward score: 0.818359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 415|ppo_ep: 1|act_loss: 0.307861328125|cri_loss: 0.1944580078125|unsuper_loss: 0.0 average reward score: 0.89453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 416|ppo_ep: 1|act_loss: 0.1326904296875|cri_loss: 0.1396484375|unsuper_loss: 0.0 average reward score: 1.078125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.55%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 417|ppo_ep: 1|act_loss: -0.10302734375|cri_loss: 0.1336669921875|unsuper_loss: 0.0 average reward score: 1.86328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 
epoch: 0|step: 418|ppo_ep: 1|act_loss: -0.364990234375|cri_loss: 0.22509765625|unsuper_loss: 0.0 average reward score: 2.28125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 [2023-07-01 08:25:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=10, lr=[6.838180105080878e-06, 6.838180105080878e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:25:02,071] [INFO] [timer.py:215:stop] epoch=0/micro_step=420/global_step=420, RunningAvgSamplesPerSec=51.88667175359361, CurrSamplesPerSec=50.3423646308708, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:25:02,232] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=9, lr=[3.534733531308085e-06, 3.534733531308085e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 419|ppo_ep: 1|act_loss: -0.08856201171875|cri_loss: 0.1396484375|unsuper_loss: 0.0 average reward score: 1.4580078125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.24%) |Training time=0.80s (31.94%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 420|ppo_ep: 1|act_loss: -0.0119171142578125|cri_loss: 0.044158935546875|unsuper_loss: 0.0 average reward score: 0.61083984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 421|ppo_ep: 1|act_loss: -0.01837158203125|cri_loss: 0.08209228515625|unsuper_loss: 0.0 average reward score: 2.408203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather 
latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 422|ppo_ep: 1|act_loss: -0.265625|cri_loss: 0.1708984375|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 423|ppo_ep: 1|act_loss: 0.432373046875|cri_loss: 0.287109375|unsuper_loss: 0.0 average reward score: 3.68359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 424|ppo_ep: 1|act_loss: -0.404296875|cri_loss: 0.175048828125|unsuper_loss: 0.0 average reward score: 4.3046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 425|ppo_ep: 1|act_loss: -0.05853271484375|cri_loss: 0.045562744140625|unsuper_loss: 0.0 average reward score: 4.19921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.69%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 426|ppo_ep: 1|act_loss: 0.7734375|cri_loss: 0.381591796875|unsuper_loss: 0.0 average reward score: 3.810546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s 
(59.70%) |Training time=0.79s (31.53%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 427|ppo_ep: 1|act_loss: 0.50244140625|cri_loss: 0.3427734375|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 428|ppo_ep: 1|act_loss: -0.017791748046875|cri_loss: 0.09185791015625|unsuper_loss: 0.0 average reward score: 2.7265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:25:26,907] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=10, lr=[6.675546409838583e-06, 6.675546409838583e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:25:27,083] [INFO] [timer.py:215:stop] epoch=0/micro_step=430/global_step=430, RunningAvgSamplesPerSec=51.865142424222554, CurrSamplesPerSec=50.65506157249227, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:25:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=9, lr=[3.4503324656641074e-06, 3.4503324656641074e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 429|ppo_ep: 1|act_loss: -0.09375|cri_loss: 0.09967041015625|unsuper_loss: 0.0 average reward score: 4.0703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.80s (31.87%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 430|ppo_ep: 1|act_loss: 0.040008544921875|cri_loss: 0.062286376953125|unsuper_loss: 0.0 average reward score: 3.318359375 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.59%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 431|ppo_ep: 1|act_loss: -0.0281524658203125|cri_loss: 0.11102294921875|unsuper_loss: 0.0 average reward score: 3.7890625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 432|ppo_ep: 1|act_loss: 0.1917724609375|cri_loss: 0.1549072265625|unsuper_loss: 0.0 average reward score: 3.83203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 433|ppo_ep: 1|act_loss: -0.1365966796875|cri_loss: 0.047943115234375|unsuper_loss: 0.0 average reward score: 4.01953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.63%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 434|ppo_ep: 1|act_loss: -0.0265960693359375|cri_loss: 0.070068359375|unsuper_loss: 0.0 average reward score: 2.775390625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 435|ppo_ep: 1|act_loss: 0.0946044921875|cri_loss: 0.064453125|unsuper_loss: 0.0 average reward score: 2.84375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.44%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 436|ppo_ep: 1|act_loss: 0.1785888671875|cri_loss: 0.09979248046875|unsuper_loss: 0.0 average reward score: 2.943359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.60%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.49755859375|cri_loss: 0.343505859375|unsuper_loss: 0.0 average reward score: 2.65234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.321533203125|cri_loss: 0.1927490234375|unsuper_loss: 0.0 average reward score: 3.462890625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.28%) |Training time=0.80s (31.85%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 [2023-07-01 08:25:51,911] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=10, lr=[6.5104087106541136e-06, 6.5104087106541136e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:25:52,092] [INFO] [timer.py:215:stop] epoch=0/micro_step=440/global_step=440, RunningAvgSamplesPerSec=51.84449417609737, CurrSamplesPerSec=50.286176724696695, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:25:52,251] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=9, lr=[3.364645489962566e-06, 3.364645489962566e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.1375732421875|cri_loss: 0.085693359375|unsuper_loss: 0.0 average reward score: 1.3857421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.26%) |Training time=0.80s (32.00%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 440|ppo_ep: 1|act_loss: -0.1767578125|cri_loss: 0.04156494140625|unsuper_loss: 0.0 average reward score: 4.0 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.80s (31.82%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 441|ppo_ep: 1|act_loss: -0.217529296875|cri_loss: 0.065673828125|unsuper_loss: 0.0 average reward score: 2.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.23%) |Training time=0.80s (32.04%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.1175537109375|cri_loss: 0.041351318359375|unsuper_loss: 0.0 average reward score: 3.23828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.82%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 443|ppo_ep: 1|act_loss: -0.10382080078125|cri_loss: 0.015106201171875|unsuper_loss: 0.0 average reward score: 3.2890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 
444|ppo_ep: 1|act_loss: -0.0259552001953125|cri_loss: 0.08148193359375|unsuper_loss: 0.0 average reward score: 1.93359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 445|ppo_ep: 1|act_loss: -0.04638671875|cri_loss: 0.166748046875|unsuper_loss: 0.0 average reward score: 3.08984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.49%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 446|ppo_ep: 1|act_loss: -0.053070068359375|cri_loss: 0.04461669921875|unsuper_loss: 0.0 average reward score: 1.9794921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.57%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 447|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.08343505859375|unsuper_loss: 0.0 average reward score: 3.490234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 448|ppo_ep: 1|act_loss: -0.1307373046875|cri_loss: 0.0931396484375|unsuper_loss: 0.0 average reward score: 3.224609375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.60%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:26:16,899] 
[INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=10, lr=[6.342990457989214e-06, 6.342990457989214e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:26:17,075] [INFO] [timer.py:215:stop] epoch=0/micro_step=450/global_step=450, RunningAvgSamplesPerSec=51.82896790981009, CurrSamplesPerSec=51.82415783352665, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:26:17,234] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=9, lr=[3.277788548620639e-06, 3.277788548620639e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 449|ppo_ep: 1|act_loss: 0.042144775390625|cri_loss: 0.0300445556640625|unsuper_loss: 0.0 average reward score: 3.412109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 450|ppo_ep: 1|act_loss: 0.12384033203125|cri_loss: 0.1572265625|unsuper_loss: 0.0 average reward score: 2.734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.53%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 451|ppo_ep: 1|act_loss: 0.0927734375|cri_loss: 0.030181884765625|unsuper_loss: 0.0 average reward score: 2.31640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.1341552734375|cri_loss: 0.041839599609375|unsuper_loss: 0.0 average reward score: 3.46484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 453|ppo_ep: 1|act_loss: -0.00974273681640625|cri_loss: 0.0679931640625|unsuper_loss: 0.0 average reward score: 3.43359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 454|ppo_ep: 1|act_loss: 0.0266571044921875|cri_loss: 0.0303192138671875|unsuper_loss: 0.0 average reward score: 3.1640625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.0196075439453125|cri_loss: 0.0340576171875|unsuper_loss: 0.0 average reward score: 3.009765625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.21%) |Training time=0.80s (32.01%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 456|ppo_ep: 1|act_loss: 0.05120849609375|cri_loss: 0.0267181396484375|unsuper_loss: 0.0 average reward score: 3.541015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.71%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 457|ppo_ep: 1|act_loss: 0.0208892822265625|cri_loss: 0.06402587890625|unsuper_loss: 0.0 average reward score: 2.392578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.50s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 458|ppo_ep: 1|act_loss: 0.037109375|cri_loss: 0.039093017578125|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.96%) |Training time=0.78s (31.25%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 [2023-07-01 08:26:41,914] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=10, lr=[6.173518188159017e-06, 6.173518188159017e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:26:42,090] [INFO] [timer.py:215:stop] epoch=0/micro_step=460/global_step=460, RunningAvgSamplesPerSec=51.81293659980854, CurrSamplesPerSec=51.52634425877704, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:26:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, lr=[3.189879169154723e-06, 3.189879169154723e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 459|ppo_ep: 1|act_loss: -0.04974365234375|cri_loss: 0.05517578125|unsuper_loss: 0.0 average reward score: 4.078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.46%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.115234375|cri_loss: 0.06146240234375|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.50%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 461|ppo_ep: 1|act_loss: -0.0013332366943359375|cri_loss: 0.056243896484375|unsuper_loss: 0.0 average reward score: 2.4296875 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.83%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 462|ppo_ep: 1|act_loss: 0.1343994140625|cri_loss: 0.045166015625|unsuper_loss: 0.0 average reward score: 2.552734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.70%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 463|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.205810546875|unsuper_loss: 0.0 average reward score: 2.0859375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.59%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 464|ppo_ep: 1|act_loss: -0.054290771484375|cri_loss: 0.0467529296875|unsuper_loss: 0.0 average reward score: 3.90234375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 465|ppo_ep: 1|act_loss: -0.0231475830078125|cri_loss: 0.037078857421875|unsuper_loss: 0.0 average reward score: 3.13671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 466|ppo_ep: 1|act_loss: -0.1474609375|cri_loss: 0.043487548828125|unsuper_loss: 0.0 average reward score: 3.0859375 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.96%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 467|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.0576171875|unsuper_loss: 0.0 average reward score: 3.060546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.86%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.006526947021484375|cri_loss: 0.0452880859375|unsuper_loss: 0.0 average reward score: 4.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:27:06,918] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=10, lr=[6.002221216802128e-06, 6.002221216802128e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:07,099] [INFO] [timer.py:215:stop] epoch=0/micro_step=470/global_step=470, RunningAvgSamplesPerSec=51.79474671061898, CurrSamplesPerSec=51.72822604484345, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:27:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=9, lr=[3.101036303152072e-06, 3.101036303152072e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 469|ppo_ep: 1|act_loss: 0.037017822265625|cri_loss: 0.0202484130859375|unsuper_loss: 0.0 average reward score: 3.955078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.78s (31.36%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.1474609375|cri_loss: 0.055572509765625|unsuper_loss: 0.0 average reward score: 2.88671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 471|ppo_ep: 1|act_loss: 0.09393310546875|cri_loss: 0.05352783203125|unsuper_loss: 0.0 average reward score: 3.267578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 472|ppo_ep: 1|act_loss: -0.005138397216796875|cri_loss: 0.0299835205078125|unsuper_loss: 0.0 average reward score: 2.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 473|ppo_ep: 1|act_loss: -0.0870361328125|cri_loss: 0.0927734375|unsuper_loss: 0.0 average reward score: 2.216796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 474|ppo_ep: 1|act_loss: -0.0041656494140625|cri_loss: 0.05389404296875|unsuper_loss: 0.0 average reward score: 2.61328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.60%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 epoch: 0|step: 475|ppo_ep: 1|act_loss: -0.047088623046875|cri_loss: 0.047393798828125|unsuper_loss: 0.0 average reward score: 2.06640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 476|ppo_ep: 1|act_loss: -0.0125885009765625|cri_loss: 0.06610107421875|unsuper_loss: 0.0 average reward score: 3.3515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.57%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.0002532005310058594|cri_loss: 0.11529541015625|unsuper_loss: 0.0 average reward score: 1.6953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.58%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 478|ppo_ep: 1|act_loss: -0.1527099609375|cri_loss: 0.1478271484375|unsuper_loss: 0.0 average reward score: 2.587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:27:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=10, lr=[5.829331328589974e-06, 5.829331328589974e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:32,080] [INFO] [timer.py:215:stop] epoch=0/micro_step=480/global_step=480, RunningAvgSamplesPerSec=51.78436242445469, CurrSamplesPerSec=50.665846255129615, MemAllocated=12.09GB, 
MaxMemAllocated=21.86GB [2023-07-01 08:27:32,240] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=9, lr=[3.011380165315503e-06, 3.011380165315503e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 479|ppo_ep: 1|act_loss: -0.259521484375|cri_loss: 0.10260009765625|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.87%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 480|ppo_ep: 1|act_loss: -0.128662109375|cri_loss: 0.037445068359375|unsuper_loss: 0.0 average reward score: 2.390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.52%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 481|ppo_ep: 1|act_loss: -0.0643310546875|cri_loss: 0.042388916015625|unsuper_loss: 0.0 average reward score: 4.25 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 482|ppo_ep: 1|act_loss: 0.1011962890625|cri_loss: 0.06683349609375|unsuper_loss: 0.0 average reward score: 3.07421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.47%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 483|ppo_ep: 1|act_loss: 0.1641845703125|cri_loss: 0.1060791015625|unsuper_loss: 0.0 average reward score: 2.236328125 ------------------------------------------------------------------------------------- |E2E 
latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 484|ppo_ep: 1|act_loss: 0.014984130859375|cri_loss: 0.029022216796875|unsuper_loss: 0.0 average reward score: 2.958984375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.80s (31.77%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 485|ppo_ep: 1|act_loss: -0.0292510986328125|cri_loss: 0.02252197265625|unsuper_loss: 0.0 average reward score: 3.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.80%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.004482269287109375|cri_loss: 0.01898193359375|unsuper_loss: 0.0 average reward score: 4.72265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.66%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 487|ppo_ep: 1|act_loss: -0.1724853515625|cri_loss: 0.03619384765625|unsuper_loss: 0.0 average reward score: 2.236328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.77%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.05059814453125|cri_loss: 0.0300445556640625|unsuper_loss: 0.0 average reward score: 4.03515625 ------------------------------------------------------------------------------------- |E2E 
latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 [2023-07-01 08:27:56,927] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=10, lr=[5.655082463595249e-06, 5.655082463595249e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:27:57,106] [INFO] [timer.py:215:stop] epoch=0/micro_step=490/global_step=490, RunningAvgSamplesPerSec=51.767584021520136, CurrSamplesPerSec=51.21482851213101, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:27:57,266] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=9, lr=[2.9210320707989525e-06, 2.9210320707989525e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 489|ppo_ep: 1|act_loss: 0.0249176025390625|cri_loss: 0.0303802490234375|unsuper_loss: 0.0 average reward score: 3.994140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.62%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 490|ppo_ep: 1|act_loss: -0.045806884765625|cri_loss: 0.016265869140625|unsuper_loss: 0.0 average reward score: 4.609375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.81 epoch: 0|step: 491|ppo_ep: 1|act_loss: -0.007472991943359375|cri_loss: 0.00922393798828125|unsuper_loss: 0.0 average reward score: 3.900390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.48%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 492|ppo_ep: 1|act_loss: 
0.054168701171875|cri_loss: 0.0140533447265625|unsuper_loss: 0.0 average reward score: 4.49609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.60%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 493|ppo_ep: 1|act_loss: 0.135498046875|cri_loss: 0.0302581787109375|unsuper_loss: 0.0 average reward score: 3.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.064208984375|cri_loss: 0.029998779296875|unsuper_loss: 0.0 average reward score: 3.58203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.49%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 495|ppo_ep: 1|act_loss: -0.01373291015625|cri_loss: 0.0218353271484375|unsuper_loss: 0.0 average reward score: 3.40625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.39%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 496|ppo_ep: 1|act_loss: -0.0264129638671875|cri_loss: 0.032501220703125|unsuper_loss: 0.0 average reward score: 3.6171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.28%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 497|ppo_ep: 1|act_loss: 
0.031707763671875|cri_loss: 0.0582275390625|unsuper_loss: 0.0 average reward score: 3.048828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.57%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 498|ppo_ep: 1|act_loss: 0.043731689453125|cri_loss: 0.0711669921875|unsuper_loss: 0.0 average reward score: 3.072265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.72%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:28:21,878] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=10, lr=[5.479710400743868e-06, 5.479710400743868e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:28:22,058] [INFO] [timer.py:215:stop] epoch=0/micro_step=500/global_step=500, RunningAvgSamplesPerSec=51.761869176640715, CurrSamplesPerSec=51.35148201386001, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:28:22,217] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=9, lr=[2.830114271054013e-06, 2.830114271054013e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 499|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.04779052734375|unsuper_loss: 0.0 average reward score: 3.689453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 500|ppo_ep: 1|act_loss: -0.16162109375|cri_loss: 0.0325927734375|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 501|ppo_ep: 1|act_loss: -0.043731689453125|cri_loss: 0.054473876953125|unsuper_loss: 0.0 average reward score: 2.8828125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.91%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 502|ppo_ep: 1|act_loss: -0.09417724609375|cri_loss: 0.0143280029296875|unsuper_loss: 0.0 average reward score: 3.181640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.79%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 503|ppo_ep: 1|act_loss: -0.12103271484375|cri_loss: 0.0193939208984375|unsuper_loss: 0.0 average reward score: 3.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.89%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 504|ppo_ep: 1|act_loss: -0.06085205078125|cri_loss: 0.01446533203125|unsuper_loss: 0.0 average reward score: 2.8984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.09393310546875|cri_loss: 0.02349853515625|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s 
(59.75%) |Training time=0.79s (31.49%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.091064453125|cri_loss: 0.04632568359375|unsuper_loss: 0.0 average reward score: 2.390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.041412353515625|cri_loss: 0.016265869140625|unsuper_loss: 0.0 average reward score: 3.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 508|ppo_ep: 1|act_loss: -0.199951171875|cri_loss: 0.0562744140625|unsuper_loss: 0.0 average reward score: 2.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.91%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:28:46,900] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=10, lr=[5.30345243877873e-06, 5.30345243877873e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:28:47,077] [INFO] [timer.py:215:stop] epoch=0/micro_step=510/global_step=510, RunningAvgSamplesPerSec=51.74410620499772, CurrSamplesPerSec=50.86902841618862, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:28:47,236] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=9, lr=[2.7387497884095297e-06, 2.7387497884095297e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 509|ppo_ep: 1|act_loss: -0.10919189453125|cri_loss: 0.10418701171875|unsuper_loss: 0.0 average reward score: 3.232421875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.46%) |Training time=0.79s (31.80%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 510|ppo_ep: 1|act_loss: -0.11083984375|cri_loss: 0.10247802734375|unsuper_loss: 0.0 average reward score: 2.951171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.43%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 511|ppo_ep: 1|act_loss: 0.06085205078125|cri_loss: 0.1883544921875|unsuper_loss: 0.0 average reward score: 1.4443359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 512|ppo_ep: 1|act_loss: -0.00598907470703125|cri_loss: 0.038604736328125|unsuper_loss: 0.0 average reward score: 3.337890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.67%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 513|ppo_ep: 1|act_loss: -0.01195526123046875|cri_loss: 0.038055419921875|unsuper_loss: 0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 514|ppo_ep: 1|act_loss: 0.06494140625|cri_loss: 0.052398681640625|unsuper_loss: 0.0 average reward score: 2.95703125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.31%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.0782470703125|cri_loss: 0.07891845703125|unsuper_loss: 0.0 average reward score: 1.26953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.82%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.11248779296875|cri_loss: 0.177001953125|unsuper_loss: 0.0 average reward score: 1.86328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.72%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 517|ppo_ep: 1|act_loss: -0.0275726318359375|cri_loss: 0.10992431640625|unsuper_loss: 0.0 average reward score: 2.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 518|ppo_ep: 1|act_loss: -0.1278076171875|cri_loss: 0.208251953125|unsuper_loss: 0.0 average reward score: 1.6474609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.42%) |Others=0.22 (8.91%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:29:11,882] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=10, lr=[5.126547075166989e-06, 5.126547075166989e-06], 
mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:29:12,062] [INFO] [timer.py:215:stop] epoch=0/micro_step=520/global_step=520, RunningAvgSamplesPerSec=51.73223575240543, CurrSamplesPerSec=51.60632420793602, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:29:12,221] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=9, lr=[2.647062249608123e-06, 2.647062249608123e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 519|ppo_ep: 1|act_loss: 0.0419921875|cri_loss: 0.143798828125|unsuper_loss: 0.0 average reward score: 2.224609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.44%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 520|ppo_ep: 1|act_loss: 0.0538330078125|cri_loss: 0.08062744140625|unsuper_loss: 0.0 average reward score: 0.73583984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.49%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 521|ppo_ep: 1|act_loss: -0.0163726806640625|cri_loss: 0.0753173828125|unsuper_loss: 0.0 average reward score: 2.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 522|ppo_ep: 1|act_loss: -0.0465087890625|cri_loss: 0.22265625|unsuper_loss: 0.0 average reward score: 0.1876220703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.47%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.85 
|AvgSamplesPerSec=12.81 epoch: 0|step: 523|ppo_ep: 1|act_loss: 0.180908203125|cri_loss: 0.226318359375|unsuper_loss: 0.0 average reward score: 0.41064453125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.48%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 524|ppo_ep: 1|act_loss: 0.133544921875|cri_loss: 0.11236572265625|unsuper_loss: 0.0 average reward score: 0.63427734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.64%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 525|ppo_ep: 1|act_loss: -0.01505279541015625|cri_loss: 0.07958984375|unsuper_loss: 0.0 average reward score: 0.46533203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.73%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 526|ppo_ep: 1|act_loss: 0.051055908203125|cri_loss: 0.037017822265625|unsuper_loss: 0.0 average reward score: 2.390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 527|ppo_ep: 1|act_loss: 0.137451171875|cri_loss: 0.061920166015625|unsuper_loss: 0.0 average reward score: 1.20703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.44%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 
|AvgSamplesPerSec=12.81 epoch: 0|step: 528|ppo_ep: 1|act_loss: -0.040985107421875|cri_loss: 0.1102294921875|unsuper_loss: 0.0 average reward score: 0.55859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.65%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:29:36,860] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=10, lr=[4.949233683385321e-06, 4.949233683385321e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:29:37,036] [INFO] [timer.py:215:stop] epoch=0/micro_step=530/global_step=530, RunningAvgSamplesPerSec=51.721183361310175, CurrSamplesPerSec=50.70008552148463, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:29:37,196] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=9, lr=[2.5551757185248656e-06, 2.5551757185248656e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 529|ppo_ep: 1|act_loss: -0.09222412109375|cri_loss: 0.051910400390625|unsuper_loss: 0.0 average reward score: 1.794921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.78%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 530|ppo_ep: 1|act_loss: -0.02020263671875|cri_loss: 0.04315185546875|unsuper_loss: 0.0 average reward score: -0.61376953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.53%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.00592803955078125|cri_loss: 0.080078125|unsuper_loss: 0.0 average reward score: 0.5224609375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.73%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 532|ppo_ep: 1|act_loss: 0.01611328125|cri_loss: 0.11322021484375|unsuper_loss: 0.0 average reward score: -1.646484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 533|ppo_ep: 1|act_loss: 0.01515960693359375|cri_loss: 0.08782958984375|unsuper_loss: 0.0 average reward score: -1.4111328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 534|ppo_ep: 1|act_loss: -0.04595947265625|cri_loss: 0.1668701171875|unsuper_loss: 0.0 average reward score: -2.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.88%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 535|ppo_ep: 1|act_loss: 0.07501220703125|cri_loss: 0.095458984375|unsuper_loss: 0.0 average reward score: -1.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.86%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 536|ppo_ep: 1|act_loss: -0.032012939453125|cri_loss: 0.12078857421875|unsuper_loss: 0.0 average reward score: -2.419921875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.32%) |Training time=0.80s (31.87%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:29:56,856] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 537|ppo_ep: 1|act_loss: -0.0178680419921875|cri_loss: 0.16845703125|unsuper_loss: 0.0 average reward score: -2.171875 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.48s (64.27%) |Training time=0.61s (26.22%) |Others=0.22 (9.51%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.81 [2023-07-01 08:29:59,164] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048 epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.0302734375|cri_loss: 0.128173828125|unsuper_loss: 0.0 average reward score: -0.40087890625 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.48s (64.29%) |Training time=0.60s (26.16%) |Others=0.22 (9.55%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.81 [2023-07-01 08:30:01,484] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=12, lr=[4.807250409408546e-06, 4.807250409408546e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:01,660] [INFO] [timer.py:215:stop] epoch=0/micro_step=540/global_step=540, RunningAvgSamplesPerSec=51.76420263166127, CurrSamplesPerSec=51.31545507585044, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:01,820] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=9, lr=[2.46321452829447e-06, 2.46321452829447e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 539|ppo_ep: 1|act_loss: 
-0.033111572265625|cri_loss: 0.10748291015625|unsuper_loss: 0.0 average reward score: -1.4599609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 540|ppo_ep: 1|act_loss: -0.1107177734375|cri_loss: 0.1082763671875|unsuper_loss: 0.0 average reward score: -3.384765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 541|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.0693359375|unsuper_loss: 0.0 average reward score: -3.0234375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.35%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.248779296875|cri_loss: 0.07879638671875|unsuper_loss: 0.0 average reward score: -3.58984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.49%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 543|ppo_ep: 1|act_loss: 0.19091796875|cri_loss: 0.054412841796875|unsuper_loss: 0.0 average reward score: -1.0712890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.69%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 544|ppo_ep: 1|act_loss: 
0.263671875|cri_loss: 0.1318359375|unsuper_loss: 0.0 average reward score: -2.3125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.83%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 545|ppo_ep: 1|act_loss: 0.1492919921875|cri_loss: 0.08697509765625|unsuper_loss: 0.0 average reward score: -3.083984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.76%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 546|ppo_ep: 1|act_loss: 0.1226806640625|cri_loss: 0.06744384765625|unsuper_loss: 0.0 average reward score: -2.255859375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.64%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 547|ppo_ep: 1|act_loss: 0.0078277587890625|cri_loss: 0.0673828125|unsuper_loss: 0.0 average reward score: -1.623046875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.26%) |Training time=0.80s (31.94%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 548|ppo_ep: 1|act_loss: -0.004375457763671875|cri_loss: 0.05511474609375|unsuper_loss: 0.0 average reward score: -0.75244140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.36%) |Training time=0.80s (31.83%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:30:26,468] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=550, skipped=12, lr=[4.629807343170943e-06, 4.629807343170943e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:26,649] [INFO] [timer.py:215:stop] epoch=0/micro_step=550/global_step=550, RunningAvgSamplesPerSec=51.75146350786166, CurrSamplesPerSec=50.7776535196902, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:26,810] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=9, lr=[2.371303113074134e-06, 2.371303113074134e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 549|ppo_ep: 1|act_loss: -0.1824951171875|cri_loss: 0.08197021484375|unsuper_loss: 0.0 average reward score: -2.2578125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 550|ppo_ep: 1|act_loss: 0.0292816162109375|cri_loss: 0.09417724609375|unsuper_loss: 0.0 average reward score: -1.544921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.45%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 551|ppo_ep: 1|act_loss: -0.08197021484375|cri_loss: 0.1343994140625|unsuper_loss: 0.0 average reward score: -1.4375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 552|ppo_ep: 1|act_loss: -0.07244873046875|cri_loss: 0.03155517578125|unsuper_loss: 0.0 average reward score: -2.021484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.49s (59.62%) |Training time=0.79s (31.53%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:30:36,466] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024 epoch: 0|step: 553|ppo_ep: 1|act_loss: -0.174560546875|cri_loss: 0.0902099609375|unsuper_loss: 0.0 average reward score: -1.0419921875 ------------------------------------------------------------------------------------- |E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.23%) |Training time=0.61s (26.24%) |Others=0.22 (9.53%)|CurSamplesPerSec=13.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 554|ppo_ep: 1|act_loss: -0.2220458984375|cri_loss: 0.0888671875|unsuper_loss: 0.0 average reward score: -0.7626953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 555|ppo_ep: 1|act_loss: -0.14794921875|cri_loss: 0.094482421875|unsuper_loss: 0.0 average reward score: -0.76318359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.51%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 556|ppo_ep: 1|act_loss: -0.0026340484619140625|cri_loss: 0.06329345703125|unsuper_loss: 0.0 average reward score: -0.335205078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.77%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 557|ppo_ep: 1|act_loss: 0.051513671875|cri_loss: 0.0293426513671875|unsuper_loss: 0.0 average reward score: 
-0.184814453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.38%) |Training time=0.80s (31.83%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 558|ppo_ep: 1|act_loss: 0.1036376953125|cri_loss: 0.06988525390625|unsuper_loss: 0.0 average reward score: -0.689453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.79s (31.80%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:30:51,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=13, lr=[4.4703275677370524e-06, 4.4703275677370524e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:30:51,448] [INFO] [timer.py:215:stop] epoch=0/micro_step=560/global_step=560, RunningAvgSamplesPerSec=51.76853926361346, CurrSamplesPerSec=51.16896413689512, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:30:51,607] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=9, lr=[2.279565839669693e-06, 2.279565839669693e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 559|ppo_ep: 1|act_loss: 0.09820556640625|cri_loss: 0.020111083984375|unsuper_loss: 0.0 average reward score: -0.248291015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.66%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 560|ppo_ep: 1|act_loss: 0.057952880859375|cri_loss: 0.020294189453125|unsuper_loss: 0.0 average reward score: 0.00335693359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.32%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81 epoch: 0|step: 561|ppo_ep: 1|act_loss: -0.055572509765625|cri_loss: 0.0701904296875|unsuper_loss: 0.0 average reward score: -0.8681640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.78s (31.46%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.043365478515625|cri_loss: 0.030242919921875|unsuper_loss: 0.0 average reward score: -0.0408935546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.50%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.241943359375|cri_loss: 0.09368896484375|unsuper_loss: 0.0 average reward score: -1.150390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.52%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 564|ppo_ep: 1|act_loss: 0.0892333984375|cri_loss: 0.0275421142578125|unsuper_loss: 0.0 average reward score: -1.296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.46%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 565|ppo_ep: 1|act_loss: 0.1240234375|cri_loss: 0.0247955322265625|unsuper_loss: 0.0 average reward score: -0.0386962890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) 
|Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 566|ppo_ep: 1|act_loss: 0.11968994140625|cri_loss: 0.033172607421875|unsuper_loss: 0.0 average reward score: 0.352783203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 567|ppo_ep: 1|act_loss: 0.0838623046875|cri_loss: 0.0297698974609375|unsuper_loss: 0.0 average reward score: 0.21728515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.67%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 568|ppo_ep: 1|act_loss: 0.11962890625|cri_loss: 0.04913330078125|unsuper_loss: 0.0 average reward score: -0.8125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:31:16,233] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=13, lr=[4.293591324008047e-06, 4.293591324008047e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:31:16,412] [INFO] [timer.py:215:stop] epoch=0/micro_step=570/global_step=570, RunningAvgSamplesPerSec=51.7616068490536, CurrSamplesPerSec=51.40755675533235, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:31:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=9, lr=[2.1881268392529074e-06, 2.1881268392529074e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 569|ppo_ep: 1|act_loss: 0.006191253662109375|cri_loss: 0.026611328125|unsuper_loss: 0.0 average reward score: 0.263916015625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 570|ppo_ep: 1|act_loss: -0.060089111328125|cri_loss: 0.077392578125|unsuper_loss: 0.0 average reward score: 0.0289306640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 571|ppo_ep: 1|act_loss: -0.1455078125|cri_loss: 0.08050537109375|unsuper_loss: 0.0 average reward score: 0.332275390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.87%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 572|ppo_ep: 1|act_loss: -0.07379150390625|cri_loss: 0.061767578125|unsuper_loss: 0.0 average reward score: 1.40234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 573|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.0406494140625|unsuper_loss: 0.0 average reward score: -0.324462890625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.0238494873046875|cri_loss: 0.0222015380859375|unsuper_loss: 0.0 average reward score: 
-1.2275390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.06%) |Training time=0.77s (31.13%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.81 epoch: 0|step: 575|ppo_ep: 1|act_loss: 0.0130462646484375|cri_loss: 0.0172271728515625|unsuper_loss: 0.0 average reward score: -1.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.31%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.06475830078125|cri_loss: 0.1422119140625|unsuper_loss: 0.0 average reward score: 1.521484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 577|ppo_ep: 1|act_loss: -0.08251953125|cri_loss: 0.0236663818359375|unsuper_loss: 0.0 average reward score: 0.89404296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.78%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.1185302734375|cri_loss: 0.053955078125|unsuper_loss: 0.0 average reward score: 1.5 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:31:41,208] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=13, lr=[4.117574137857126e-06, 
4.117574137857126e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:31:41,384] [INFO] [timer.py:215:stop] epoch=0/micro_step=580/global_step=580, RunningAvgSamplesPerSec=51.75262658340682, CurrSamplesPerSec=51.67767707566083, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:31:41,545] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=9, lr=[2.097109839397588e-06, 2.097109839397588e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 579|ppo_ep: 1|act_loss: -0.0074310302734375|cri_loss: 0.007106781005859375|unsuper_loss: 0.0 average reward score: 0.39892578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.78s (31.47%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 580|ppo_ep: 1|act_loss: -0.05609130859375|cri_loss: 0.061798095703125|unsuper_loss: 0.0 average reward score: 1.068359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 581|ppo_ep: 1|act_loss: 0.018035888671875|cri_loss: 0.020660400390625|unsuper_loss: 0.0 average reward score: -0.53759765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 582|ppo_ep: 1|act_loss: 0.048309326171875|cri_loss: 0.009765625|unsuper_loss: 0.0 average reward score: 1.193359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.71%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 583|ppo_ep: 1|act_loss: 0.061309814453125|cri_loss: 0.02166748046875|unsuper_loss: 0.0 average reward score: 1.8447265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.86%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 584|ppo_ep: 1|act_loss: -0.0028705596923828125|cri_loss: 0.01517486572265625|unsuper_loss: 0.0 average reward score: 0.37451171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.53%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.009521484375|cri_loss: 0.024169921875|unsuper_loss: 0.0 average reward score: 0.9423828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.63%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 586|ppo_ep: 1|act_loss: -0.07073974609375|cri_loss: 0.024169921875|unsuper_loss: 0.0 average reward score: 1.3076171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.54%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 587|ppo_ep: 1|act_loss: -0.00460052490234375|cri_loss: 0.023193359375|unsuper_loss: 0.0 average reward score: -1.7431640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.98%) |Training time=0.78s (31.22%) 
|Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 588|ppo_ep: 1|act_loss: 0.040557861328125|cri_loss: 0.0224761962890625|unsuper_loss: 0.0 average reward score: -0.30126953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.37%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81 [2023-07-01 08:32:06,189] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=13, lr=[3.94251418095384e-06, 3.94251418095384e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:06,364] [INFO] [timer.py:215:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=51.74402132404825, CurrSamplesPerSec=51.91287524531589, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=9, lr=[2.0066379966618336e-06, 2.0066379966618336e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 589|ppo_ep: 1|act_loss: 0.0209197998046875|cri_loss: 0.01690673828125|unsuper_loss: 0.0 average reward score: 1.0419921875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.36%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.031829833984375|cri_loss: 0.0240020751953125|unsuper_loss: 0.0 average reward score: 1.7119140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 591|ppo_ep: 1|act_loss: -0.004337310791015625|cri_loss: 0.0107879638671875|unsuper_loss: 0.0 average reward score: -0.9765625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.74%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 592|ppo_ep: 1|act_loss: -0.050384521484375|cri_loss: 0.0309600830078125|unsuper_loss: 0.0 average reward score: -0.552734375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.45%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 593|ppo_ep: 1|act_loss: 0.0184478759765625|cri_loss: 0.04168701171875|unsuper_loss: 0.0 average reward score: 0.400390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.54%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.06683349609375|cri_loss: 0.040008544921875|unsuper_loss: 0.0 average reward score: 1.2734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.63%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 595|ppo_ep: 1|act_loss: -0.0767822265625|cri_loss: 0.11517333984375|unsuper_loss: 0.0 average reward score: 0.401123046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 596|ppo_ep: 1|act_loss: -0.2257080078125|cri_loss: 0.1441650390625|unsuper_loss: 0.0 average reward score: 
0.04931640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.32%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:32:26,493] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. Reducing hysteresis to 1 epoch: 0|step: 597|ppo_ep: 1|act_loss: -0.38134765625|cri_loss: 0.15234375|unsuper_loss: 0.0 average reward score: 0.12054443359375 ------------------------------------------------------------------------------------- |E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.90%) |Training time=0.79s (32.05%) |Others=0.17 (7.05%)|CurSamplesPerSec=13.05 |AvgSamplesPerSec=12.81 [2023-07-01 08:32:28,949] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 598|ppo_ep: 1|act_loss: -0.5107421875|cri_loss: 0.173828125|unsuper_loss: 0.0 average reward score: 0.1925048828125 ------------------------------------------------------------------------------------- |E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.78%) |Training time=0.79s (32.16%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.03 |AvgSamplesPerSec=12.81 [2023-07-01 08:32:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=13, lr=[3.7686483297255346e-06, 3.7686483297255346e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:31,255] [INFO] [timer.py:215:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=51.73574276791265, CurrSamplesPerSec=50.58682692507015, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:31,416] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=11, lr=[1.9347353301195425e-06, 1.9347353301195425e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 
599|ppo_ep: 1|act_loss: -0.2230224609375|cri_loss: 0.0523681640625|unsuper_loss: 0.0 average reward score: 0.7177734375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.79%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 600|ppo_ep: 1|act_loss: 0.050506591796875|cri_loss: 0.035491943359375|unsuper_loss: 0.0 average reward score: 1.2216796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.79s (31.75%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 601|ppo_ep: 1|act_loss: 0.11126708984375|cri_loss: 0.03668212890625|unsuper_loss: 0.0 average reward score: 0.359619140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 602|ppo_ep: 1|act_loss: 0.14208984375|cri_loss: 0.0264892578125|unsuper_loss: 0.0 average reward score: 0.19873046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.08056640625|cri_loss: 0.0195159912109375|unsuper_loss: 0.0 average reward score: -1.771484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 
604|ppo_ep: 1|act_loss: 0.0020618438720703125|cri_loss: 0.038360595703125|unsuper_loss: 0.0 average reward score: 0.88525390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.88%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 605|ppo_ep: 1|act_loss: 0.029327392578125|cri_loss: 0.018585205078125|unsuper_loss: 0.0 average reward score: -0.71630859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.75%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 606|ppo_ep: 1|act_loss: -0.181396484375|cri_loss: 0.08489990234375|unsuper_loss: 0.0 average reward score: 1.130859375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.67%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 607|ppo_ep: 1|act_loss: -0.1104736328125|cri_loss: 0.025054931640625|unsuper_loss: 0.0 average reward score: 0.64111328125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.57%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 608|ppo_ep: 1|act_loss: -0.138916015625|cri_loss: 0.032257080078125|unsuper_loss: 0.0 average reward score: -0.103759765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 
[2023-07-01 08:32:56,060] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=13, lr=[3.596211844836072e-06, 3.596211844836072e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:32:56,236] [INFO] [timer.py:215:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=51.72506665688101, CurrSamplesPerSec=51.55987512019124, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:32:56,397] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=11, lr=[1.8455526643329995e-06, 1.8455526643329995e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 609|ppo_ep: 1|act_loss: -0.1690673828125|cri_loss: 0.05035400390625|unsuper_loss: 0.0 average reward score: 0.223876953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.48%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 610|ppo_ep: 1|act_loss: -0.2095947265625|cri_loss: 0.0433349609375|unsuper_loss: 0.0 average reward score: 1.4287109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 611|ppo_ep: 1|act_loss: -0.07135009765625|cri_loss: 0.0251007080078125|unsuper_loss: 0.0 average reward score: 0.0029296875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.35%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 612|ppo_ep: 1|act_loss: 0.01320648193359375|cri_loss: 0.02313232421875|unsuper_loss: 0.0 average reward score: 1.3203125 ------------------------------------------------------------------------------------- |E2E 
latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.79s (31.74%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.10357666015625|cri_loss: 0.057403564453125|unsuper_loss: 0.0 average reward score: 1.4208984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.62%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.1279296875|cri_loss: 0.058258056640625|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.70%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 615|ppo_ep: 1|act_loss: -0.004261016845703125|cri_loss: 0.016510009765625|unsuper_loss: 0.0 average reward score: 1.955078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 616|ppo_ep: 1|act_loss: -0.0277099609375|cri_loss: 0.05364990234375|unsuper_loss: 0.0 average reward score: 2.4296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.70%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 617|ppo_ep: 1|act_loss: 0.038665771484375|cri_loss: 0.0241241455078125|unsuper_loss: 0.0 average reward score: 1.7587890625 ------------------------------------------------------------------------------------- |E2E 
latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.83%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 618|ppo_ep: 1|act_loss: -0.037811279296875|cri_loss: 0.0217742919921875|unsuper_loss: 0.0 average reward score: 2.966796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.90%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 [2023-07-01 08:33:21,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=13, lr=[3.4254380528508618e-06, 3.4254380528508618e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:33:21,253] [INFO] [timer.py:215:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=51.71367358422307, CurrSamplesPerSec=50.988746347777095, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:33:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=11, lr=[1.7572555417026524e-06, 1.7572555417026524e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 619|ppo_ep: 1|act_loss: -0.12939453125|cri_loss: 0.0193328857421875|unsuper_loss: 0.0 average reward score: 2.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 620|ppo_ep: 1|act_loss: -0.05157470703125|cri_loss: 0.01513671875|unsuper_loss: 0.0 average reward score: 3.564453125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.17%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 621|ppo_ep: 1|act_loss: 
0.005176544189453125|cri_loss: 0.01505279541015625|unsuper_loss: 0.0 average reward score: 2.03515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.48%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 622|ppo_ep: 1|act_loss: -0.038787841796875|cri_loss: 0.0270538330078125|unsuper_loss: 0.0 average reward score: 2.75 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 623|ppo_ep: 1|act_loss: -0.01690673828125|cri_loss: 0.0223388671875|unsuper_loss: 0.0 average reward score: 3.76171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.75%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 624|ppo_ep: 1|act_loss: 0.04852294921875|cri_loss: 0.0247039794921875|unsuper_loss: 0.0 average reward score: 2.546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 625|ppo_ep: 1|act_loss: 0.048858642578125|cri_loss: 0.04608154296875|unsuper_loss: 0.0 average reward score: 3.525390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.72%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 626|ppo_ep: 1|act_loss: 
0.019012451171875|cri_loss: 0.038665771484375|unsuper_loss: 0.0 average reward score: 3.388671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.42%) |Training time=0.79s (31.78%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 627|ppo_ep: 1|act_loss: 0.0340576171875|cri_loss: 0.07501220703125|unsuper_loss: 0.0 average reward score: 1.6201171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.56%) |Training time=0.79s (31.61%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.0972900390625|cri_loss: 0.046966552734375|unsuper_loss: 0.0 average reward score: 3.111328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.34%) |Training time=0.80s (31.82%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:33:46,043] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=13, lr=[3.256558030518954e-06, 3.256558030518954e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:33:46,223] [INFO] [timer.py:215:stop] epoch=0/micro_step=630/global_step=630, RunningAvgSamplesPerSec=51.706242726652974, CurrSamplesPerSec=51.18045668138959, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:33:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=11, lr=[1.6699634384772317e-06, 1.6699634384772317e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 629|ppo_ep: 1|act_loss: -0.159423828125|cri_loss: 0.06622314453125|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate 
time=1.49s (59.52%) |Training time=0.79s (31.68%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 630|ppo_ep: 1|act_loss: -0.1016845703125|cri_loss: 0.04388427734375|unsuper_loss: 0.0 average reward score: 1.5244140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 631|ppo_ep: 1|act_loss: -0.0015020370483398438|cri_loss: 0.0682373046875|unsuper_loss: 0.0 average reward score: 0.505859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.61%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 632|ppo_ep: 1|act_loss: 0.04833984375|cri_loss: 0.06500244140625|unsuper_loss: 0.0 average reward score: 1.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.57%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 633|ppo_ep: 1|act_loss: 0.046966552734375|cri_loss: 0.0240020751953125|unsuper_loss: 0.0 average reward score: 0.021728515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 634|ppo_ep: 1|act_loss: 0.1358642578125|cri_loss: 0.053955078125|unsuper_loss: 0.0 average reward score: 0.98583984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate 
time=1.49s (59.83%) |Training time=0.78s (31.30%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.0301666259765625|cri_loss: 0.035614013671875|unsuper_loss: 0.0 average reward score: 1.46484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.30%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 636|ppo_ep: 1|act_loss: 0.164794921875|cri_loss: 0.11541748046875|unsuper_loss: 0.0 average reward score: 1.4052734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 637|ppo_ep: 1|act_loss: 0.07781982421875|cri_loss: 0.08154296875|unsuper_loss: 0.0 average reward score: 2.16796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.10162353515625|cri_loss: 0.0595703125|unsuper_loss: 0.0 average reward score: 1.634765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.78%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:34:11,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=13, lr=[3.0898002920993932e-06, 3.0898002920993932e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:34:11,221] [INFO] [timer.py:215:stop] epoch=0/micro_step=640/global_step=640, 
RunningAvgSamplesPerSec=51.69824265860959, CurrSamplesPerSec=50.96394492368189, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:34:11,380] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=11, lr=[1.5837944709976382e-06, 1.5837944709976382e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 639|ppo_ep: 1|act_loss: 0.2626953125|cri_loss: 0.1829833984375|unsuper_loss: 0.0 average reward score: 1.0966796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.75%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 640|ppo_ep: 1|act_loss: -0.033477783203125|cri_loss: 0.1353759765625|unsuper_loss: 0.0 average reward score: -1.376953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 641|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: 0.2049560546875|unsuper_loss: 0.0 average reward score: 1.287109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.63%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 642|ppo_ep: 1|act_loss: -0.2491455078125|cri_loss: 0.13037109375|unsuper_loss: 0.0 average reward score: 1.34375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.48%) |Training time=0.79s (31.70%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 643|ppo_ep: 1|act_loss: -0.038726806640625|cri_loss: 0.0667724609375|unsuper_loss: 0.0 average reward 
score: 2.1171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.48%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.0672607421875|cri_loss: 0.076416015625|unsuper_loss: 0.0 average reward score: 2.423828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 645|ppo_ep: 1|act_loss: 0.1790771484375|cri_loss: 0.06671142578125|unsuper_loss: 0.0 average reward score: 0.13232421875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 646|ppo_ep: 1|act_loss: 0.0909423828125|cri_loss: 0.0675048828125|unsuper_loss: 0.0 average reward score: 2.146484375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.82%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.81 epoch: 0|step: 647|ppo_ep: 1|act_loss: -0.008148193359375|cri_loss: 0.0309906005859375|unsuper_loss: 0.0 average reward score: 1.6806640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 648|ppo_ep: 1|act_loss: -0.0278167724609375|cri_loss: 0.061737060546875|unsuper_loss: 0.0 average reward score: 
2.08203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.41%) |Training time=0.79s (31.73%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:34:36,046] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=13, lr=[2.9253904801549233e-06, 2.9253904801549233e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:34:36,226] [INFO] [timer.py:215:stop] epoch=0/micro_step=650/global_step=650, RunningAvgSamplesPerSec=51.68767145837992, CurrSamplesPerSec=50.93491476359568, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:34:36,386] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=11, lr=[1.4988652358718336e-06, 1.4988652358718336e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 649|ppo_ep: 1|act_loss: -0.035491943359375|cri_loss: 0.043975830078125|unsuper_loss: 0.0 average reward score: -0.086669921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 650|ppo_ep: 1|act_loss: -0.12213134765625|cri_loss: 0.0364990234375|unsuper_loss: 0.0 average reward score: 1.162109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.79s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 651|ppo_ep: 1|act_loss: -0.0167694091796875|cri_loss: 0.028594970703125|unsuper_loss: 0.0 average reward score: 2.908203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.85%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 652|ppo_ep: 1|act_loss: -0.0229644775390625|cri_loss: 0.07305908203125|unsuper_loss: 0.0 average reward score: 1.5966796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.75%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 653|ppo_ep: 1|act_loss: 0.06427001953125|cri_loss: 0.035614013671875|unsuper_loss: 0.0 average reward score: 1.6513671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.46%) |Training time=0.79s (31.79%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 654|ppo_ep: 1|act_loss: 0.06268310546875|cri_loss: 0.045928955078125|unsuper_loss: 0.0 average reward score: 2.173828125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.47%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81 epoch: 0|step: 655|ppo_ep: 1|act_loss: -0.005828857421875|cri_loss: 0.01739501953125|unsuper_loss: 0.0 average reward score: 1.564453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.64%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 656|ppo_ep: 1|act_loss: 0.08050537109375|cri_loss: 0.0489501953125|unsuper_loss: 0.0 average reward score: 2.177734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.44%) |Others=0.22 
(8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 657|ppo_ep: 1|act_loss: 0.0211639404296875|cri_loss: 0.065185546875|unsuper_loss: 0.0 average reward score: 2.453125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.39%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 658|ppo_ep: 1|act_loss: -0.07470703125|cri_loss: 0.0252685546875|unsuper_loss: 0.0 average reward score: 1.8779296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.58%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:35:01,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=13, lr=[2.763551060231423e-06, 2.763551060231423e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:01,200] [INFO] [timer.py:215:stop] epoch=0/micro_step=660/global_step=660, RunningAvgSamplesPerSec=51.68010101183225, CurrSamplesPerSec=51.22921588048604, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:01,359] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=11, lr=[1.415290652206105e-06, 1.415290652206105e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 659|ppo_ep: 1|act_loss: -0.10577392578125|cri_loss: 0.041168212890625|unsuper_loss: 0.0 average reward score: 2.841796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.59%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 660|ppo_ep: 1|act_loss: -0.1065673828125|cri_loss: 0.20361328125|unsuper_loss: 0.0 average reward score: 1.41015625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 661|ppo_ep: 1|act_loss: -0.040008544921875|cri_loss: 0.035797119140625|unsuper_loss: 0.0 average reward score: 2.140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.66%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 662|ppo_ep: 1|act_loss: 0.06243896484375|cri_loss: 0.037078857421875|unsuper_loss: 0.0 average reward score: 2.044921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.85%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 663|ppo_ep: 1|act_loss: -0.16162109375|cri_loss: 0.040679931640625|unsuper_loss: 0.0 average reward score: 2.998046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.66%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 664|ppo_ep: 1|act_loss: -0.10235595703125|cri_loss: 0.041961669921875|unsuper_loss: 0.0 average reward score: 2.595703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 665|ppo_ep: 1|act_loss: -0.05584716796875|cri_loss: 0.036163330078125|unsuper_loss: 0.0 average reward score: 1.3466796875 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.92%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 666|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.078369140625|unsuper_loss: 0.0 average reward score: 2.7578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.24%) |Training time=0.80s (31.96%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 667|ppo_ep: 1|act_loss: -0.0160369873046875|cri_loss: 0.03741455078125|unsuper_loss: 0.0 average reward score: 3.44921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.50%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 668|ppo_ep: 1|act_loss: -0.00972747802734375|cri_loss: 0.017547607421875|unsuper_loss: 0.0 average reward score: 1.66015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 [2023-07-01 08:35:26,043] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=13, lr=[2.604501019836226e-06, 2.604501019836226e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:26,219] [INFO] [timer.py:215:stop] epoch=0/micro_step=670/global_step=670, RunningAvgSamplesPerSec=51.66980695082694, CurrSamplesPerSec=51.67015695720682, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=11, lr=[1.3331838061061835e-06, 1.3331838061061835e-06], 
mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 669|ppo_ep: 1|act_loss: -0.05511474609375|cri_loss: 0.019744873046875|unsuper_loss: 0.0 average reward score: 3.0 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 670|ppo_ep: 1|act_loss: -0.00949859619140625|cri_loss: 0.0129547119140625|unsuper_loss: 0.0 average reward score: 2.0 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.46%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 671|ppo_ep: 1|act_loss: -0.038421630859375|cri_loss: 0.030487060546875|unsuper_loss: 0.0 average reward score: 2.3203125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.52%) |Training time=0.79s (31.72%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 672|ppo_ep: 1|act_loss: -0.03607177734375|cri_loss: 0.0133514404296875|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.66%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 673|ppo_ep: 1|act_loss: -0.012359619140625|cri_loss: 0.035400390625|unsuper_loss: 0.0 average reward score: 3.076171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 epoch: 0|step: 674|ppo_ep: 1|act_loss: -0.0247650146484375|cri_loss: 0.0128631591796875|unsuper_loss: 0.0 average reward score: 2.43359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.60%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 675|ppo_ep: 1|act_loss: 0.0275726318359375|cri_loss: 0.01461029052734375|unsuper_loss: 0.0 average reward score: 0.9697265625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.66%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81 epoch: 0|step: 676|ppo_ep: 1|act_loss: -0.0261077880859375|cri_loss: 0.01190185546875|unsuper_loss: 0.0 average reward score: 2.79296875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 677|ppo_ep: 1|act_loss: 0.0261383056640625|cri_loss: 0.0257110595703125|unsuper_loss: 0.0 average reward score: 3.203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.054046630859375|cri_loss: 0.0160675048828125|unsuper_loss: 0.0 average reward score: 2.572265625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.28%) |Training time=0.80s (31.90%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 [2023-07-01 08:35:51,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=13, lr=[2.4484555721226048e-06, 2.4484555721226048e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:35:51,205] [INFO] [timer.py:215:stop] epoch=0/micro_step=680/global_step=680, RunningAvgSamplesPerSec=51.66037751112657, CurrSamplesPerSec=50.75463074832772, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:35:51,365] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=11, lr=[1.2526557976586267e-06, 1.2526557976586267e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 679|ppo_ep: 1|act_loss: 0.01154327392578125|cri_loss: 0.0034656524658203125|unsuper_loss: 0.0 average reward score: 1.4794921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.76%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 680|ppo_ep: 1|act_loss: 0.08856201171875|cri_loss: 0.0238037109375|unsuper_loss: 0.0 average reward score: 3.126953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 681|ppo_ep: 1|act_loss: -0.01373291015625|cri_loss: 0.03765869140625|unsuper_loss: 0.0 average reward score: 3.00390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 682|ppo_ep: 1|act_loss: -0.006565093994140625|cri_loss: 0.021728515625|unsuper_loss: 0.0 average reward score: 1.580078125 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: 0.0238189697265625|unsuper_loss: 0.0 average reward score: 2.21484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.74%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 684|ppo_ep: 1|act_loss: -0.07891845703125|cri_loss: 0.0185089111328125|unsuper_loss: 0.0 average reward score: 2.953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.37%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 685|ppo_ep: 1|act_loss: -0.0187530517578125|cri_loss: 0.048583984375|unsuper_loss: 0.0 average reward score: 1.943359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 686|ppo_ep: 1|act_loss: 0.031829833984375|cri_loss: 0.0274200439453125|unsuper_loss: 0.0 average reward score: 2.056640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 687|ppo_ep: 1|act_loss: 0.01371002197265625|cri_loss: 0.021240234375|unsuper_loss: 0.0 average reward score: 3.365234375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 688|ppo_ep: 1|act_loss: -0.006954193115234375|cri_loss: 0.0207061767578125|unsuper_loss: 0.0 average reward score: 1.9736328125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.50%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:36:16,005] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=13, lr=[2.295625864681438e-06, 2.295625864681438e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:36:16,181] [INFO] [timer.py:215:stop] epoch=0/micro_step=690/global_step=690, RunningAvgSamplesPerSec=51.65406902884359, CurrSamplesPerSec=51.01762460321171, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:36:16,340] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=11, lr=[1.1738155905995186e-06, 1.1738155905995186e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 689|ppo_ep: 1|act_loss: -0.017364501953125|cri_loss: 0.02587890625|unsuper_loss: 0.0 average reward score: 1.4228515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.74%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 690|ppo_ep: 1|act_loss: -0.09100341796875|cri_loss: 0.024322509765625|unsuper_loss: 0.0 average reward score: 1.134765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.65%) |Others=0.22 
(8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 691|ppo_ep: 1|act_loss: -0.12139892578125|cri_loss: 0.0238189697265625|unsuper_loss: 0.0 average reward score: 3.15625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 692|ppo_ep: 1|act_loss: 0.0005116462707519531|cri_loss: 0.032379150390625|unsuper_loss: 0.0 average reward score: 3.25 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.58%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 693|ppo_ep: 1|act_loss: -0.035400390625|cri_loss: 0.01525115966796875|unsuper_loss: 0.0 average reward score: 2.705078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 694|ppo_ep: 1|act_loss: 0.0008740425109863281|cri_loss: 0.04547119140625|unsuper_loss: 0.0 average reward score: 1.80078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.72%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 695|ppo_ep: 1|act_loss: 0.0579833984375|cri_loss: 0.0850830078125|unsuper_loss: 0.0 average reward score: 1.5263671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.77%) |Others=0.22 
(8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 696|ppo_ep: 1|act_loss: 0.11370849609375|cri_loss: 0.12408447265625|unsuper_loss: 0.0 average reward score: 2.4375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.20%) |Training time=0.80s (31.97%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 697|ppo_ep: 1|act_loss: -0.0389404296875|cri_loss: 0.097900390625|unsuper_loss: 0.0 average reward score: 1.2939453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.85%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 698|ppo_ep: 1|act_loss: -0.08319091796875|cri_loss: 0.166259765625|unsuper_loss: 0.0 average reward score: 1.3935546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.83%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:36:41,010] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=13, lr=[2.146218693834001e-06, 2.146218693834001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:36:41,189] [INFO] [timer.py:215:stop] epoch=0/micro_step=700/global_step=700, RunningAvgSamplesPerSec=51.64200175761014, CurrSamplesPerSec=50.62376611439234, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:36:41,348] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=11, lr=[1.0967698648738866e-06, 1.0967698648738866e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.171875|cri_loss: 0.11260986328125|unsuper_loss: 0.0 average reward score: 2.306640625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.85%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 700|ppo_ep: 1|act_loss: -0.1732177734375|cri_loss: 0.07244873046875|unsuper_loss: 0.0 average reward score: 0.767578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.59%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 701|ppo_ep: 1|act_loss: -0.1396484375|cri_loss: 0.0650634765625|unsuper_loss: 0.0 average reward score: 2.50390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.66%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 702|ppo_ep: 1|act_loss: -0.036285400390625|cri_loss: 0.0423583984375|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.54%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 703|ppo_ep: 1|act_loss: 0.05657958984375|cri_loss: 0.10009765625|unsuper_loss: 0.0 average reward score: 2.67578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.40%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 704|ppo_ep: 1|act_loss: 0.015777587890625|cri_loss: 0.035736083984375|unsuper_loss: 0.0 average reward score: 1.482421875 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 705|ppo_ep: 1|act_loss: 0.036834716796875|cri_loss: 0.046722412109375|unsuper_loss: 0.0 average reward score: 1.908203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.58%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 706|ppo_ep: 1|act_loss: 0.10400390625|cri_loss: 0.1390380859375|unsuper_loss: 0.0 average reward score: 1.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.68%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 707|ppo_ep: 1|act_loss: -0.0270843505859375|cri_loss: 0.06463623046875|unsuper_loss: 0.0 average reward score: 2.8828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 708|ppo_ep: 1|act_loss: -0.0789794921875|cri_loss: 0.06964111328125|unsuper_loss: 0.0 average reward score: 1.78515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:37:05,976] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=13, lr=[2.0004362248125774e-06, 2.0004362248125774e-06], 
mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:37:06,156] [INFO] [timer.py:215:stop] epoch=0/micro_step=710/global_step=710, RunningAvgSamplesPerSec=51.63689936822974, CurrSamplesPerSec=51.00378221753546, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:37:06,315] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=11, lr=[1.0216228722853735e-06, 1.0216228722853735e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 709|ppo_ep: 1|act_loss: -0.06561279296875|cri_loss: 0.08526611328125|unsuper_loss: 0.0 average reward score: 0.443115234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.76%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.196044921875|cri_loss: 0.1932373046875|unsuper_loss: 0.0 average reward score: 1.380859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.63%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 711|ppo_ep: 1|act_loss: 0.0289306640625|cri_loss: 0.1053466796875|unsuper_loss: 0.0 average reward score: 1.048828125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.77%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 712|ppo_ep: 1|act_loss: 0.0615234375|cri_loss: 0.115966796875|unsuper_loss: 0.0 average reward score: 1.66796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 
|AvgSamplesPerSec=12.81 epoch: 0|step: 713|ppo_ep: 1|act_loss: 0.0733642578125|cri_loss: 0.12017822265625|unsuper_loss: 0.0 average reward score: 1.048828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.29%) |Training time=0.80s (31.93%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 714|ppo_ep: 1|act_loss: -0.06158447265625|cri_loss: 0.1793212890625|unsuper_loss: 0.0 average reward score: 0.439453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 715|ppo_ep: 1|act_loss: -0.1697998046875|cri_loss: 0.1650390625|unsuper_loss: 0.0 average reward score: -0.44970703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 716|ppo_ep: 1|act_loss: -0.1690673828125|cri_loss: 0.208984375|unsuper_loss: 0.0 average reward score: 0.8740234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.49%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 717|ppo_ep: 1|act_loss: -0.1729736328125|cri_loss: 0.2359619140625|unsuper_loss: 0.0 average reward score: -0.2587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.72%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 
|AvgSamplesPerSec=12.81 epoch: 0|step: 718|ppo_ep: 1|act_loss: 0.0153656005859375|cri_loss: 0.148193359375|unsuper_loss: 0.0 average reward score: -0.103271484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 [2023-07-01 08:37:30,985] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=13, lr=[1.8584757182074397e-06, 1.8584757182074397e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:37:31,162] [INFO] [timer.py:215:stop] epoch=0/micro_step=720/global_step=720, RunningAvgSamplesPerSec=51.62766889502093, CurrSamplesPerSec=50.76904873941442, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:37:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=11, lr=[9.48476295431443e-07, 9.48476295431443e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 719|ppo_ep: 1|act_loss: 0.033660888671875|cri_loss: 0.10284423828125|unsuper_loss: 0.0 average reward score: -0.1505126953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.80s (31.82%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.29541015625|cri_loss: 0.163818359375|unsuper_loss: 0.0 average reward score: -0.421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.80s (31.81%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 721|ppo_ep: 1|act_loss: 0.0809326171875|cri_loss: 0.1768798828125|unsuper_loss: 0.0 average reward score: 1.0595703125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.86%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 722|ppo_ep: 1|act_loss: -0.0162200927734375|cri_loss: 0.301513671875|unsuper_loss: 0.0 average reward score: 0.89404296875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 723|ppo_ep: 1|act_loss: 0.06671142578125|cri_loss: 0.1895751953125|unsuper_loss: 0.0 average reward score: 0.38720703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81 epoch: 0|step: 724|ppo_ep: 1|act_loss: 0.2216796875|cri_loss: 0.252197265625|unsuper_loss: 0.0 average reward score: 0.059783935546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.66%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 725|ppo_ep: 1|act_loss: 0.10699462890625|cri_loss: 0.1409912109375|unsuper_loss: 0.0 average reward score: 0.26123046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 726|ppo_ep: 1|act_loss: 0.1195068359375|cri_loss: 0.1507568359375|unsuper_loss: 0.0 average reward score: -1.546875 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 727|ppo_ep: 1|act_loss: -0.060546875|cri_loss: 0.366943359375|unsuper_loss: 0.0 average reward score: -1.1201171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.51%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 728|ppo_ep: 1|act_loss: 0.079833984375|cri_loss: 0.1524658203125|unsuper_loss: 0.0 average reward score: 0.2132568359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.66%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 [2023-07-01 08:37:55,964] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=13, lr=[1.7205292630503881e-06, 1.7205292630503881e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:37:56,144] [INFO] [timer.py:215:stop] epoch=0/micro_step=730/global_step=730, RunningAvgSamplesPerSec=51.62026118095024, CurrSamplesPerSec=51.04830235468179, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:37:56,304] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=11, lr=[8.774291101150409e-07, 8.774291101150409e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 729|ppo_ep: 1|act_loss: 0.03680419921875|cri_loss: 0.1708984375|unsuper_loss: 0.0 average reward score: 0.17724609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.68%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 
|AvgSamplesPerSec=12.81 epoch: 0|step: 730|ppo_ep: 1|act_loss: -0.13818359375|cri_loss: 0.16748046875|unsuper_loss: 0.0 average reward score: -0.94677734375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.78%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 epoch: 0|step: 731|ppo_ep: 1|act_loss: -0.0872802734375|cri_loss: 0.1845703125|unsuper_loss: 0.0 average reward score: 0.32568359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.55%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 732|ppo_ep: 1|act_loss: 0.098876953125|cri_loss: 0.29296875|unsuper_loss: 0.0 average reward score: -0.9912109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.62%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:38:06,298] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. 
Reducing hysteresis to 1 epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.157470703125|cri_loss: 0.200439453125|unsuper_loss: 0.0 average reward score: 0.067626953125 ------------------------------------------------------------------------------------- |E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.45%) |Training time=0.80s (32.47%) |Others=0.17 (7.08%)|CurSamplesPerSec=13.01 |AvgSamplesPerSec=12.81 epoch: 0|step: 734|ppo_ep: 1|act_loss: 0.155029296875|cri_loss: 0.2490234375|unsuper_loss: 0.0 average reward score: -0.85009765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.71%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 735|ppo_ep: 1|act_loss: 0.24755859375|cri_loss: 0.218017578125|unsuper_loss: 0.0 average reward score: 0.4453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.66%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 736|ppo_ep: 1|act_loss: -0.05853271484375|cri_loss: 0.150146484375|unsuper_loss: 0.0 average reward score: -0.090087890625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 [2023-07-01 08:38:16,247] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 737|ppo_ep: 1|act_loss: 0.14111328125|cri_loss: 0.1685791015625|unsuper_loss: 0.0 average reward score: 0.525390625 ------------------------------------------------------------------------------------- |E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.48s (60.52%) |Training time=0.79s (32.43%) |Others=0.17 (7.05%)|CurSamplesPerSec=13.06 |AvgSamplesPerSec=12.81 epoch: 0|step: 738|ppo_ep: 1|act_loss: 0.07269287109375|cri_loss: 0.1416015625|unsuper_loss: 0.0 average reward score: -1.103515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 [2023-07-01 08:38:20,868] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=13, lr=[1.5867835168960191e-06, 1.5867835168960191e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:38:21,044] [INFO] [timer.py:215:stop] epoch=0/micro_step=740/global_step=740, RunningAvgSamplesPerSec=51.61111994665265, CurrSamplesPerSec=50.71313119023716, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:38:21,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=13, lr=[8.221676253347249e-07, 8.221676253347249e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.0261383056640625|cri_loss: 0.169677734375|unsuper_loss: 0.0 average reward score: -0.247314453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 740|ppo_ep: 1|act_loss: -0.0654296875|cri_loss: 0.1591796875|unsuper_loss: 0.0 average reward score: -0.61767578125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.51%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 741|ppo_ep: 1|act_loss: -0.16064453125|cri_loss: 0.169677734375|unsuper_loss: 0.0 average reward score: -0.465576171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.80s (31.84%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 742|ppo_ep: 1|act_loss: -0.2086181640625|cri_loss: 0.144775390625|unsuper_loss: 0.0 average reward score: -0.5927734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.69%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 743|ppo_ep: 1|act_loss: 0.037445068359375|cri_loss: 0.10546875|unsuper_loss: 0.0 average reward score: -0.2474365234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81 epoch: 0|step: 744|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.07501220703125|unsuper_loss: 0.0 average reward score: 0.61865234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.78%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 745|ppo_ep: 1|act_loss: 0.1702880859375|cri_loss: 0.077880859375|unsuper_loss: 0.0 average reward score: -0.82177734375 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.47%) |Training time=0.79s (31.75%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 746|ppo_ep: 1|act_loss: 0.1856689453125|cri_loss: 0.08074951171875|unsuper_loss: 0.0 average reward score: 0.21044921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 747|ppo_ep: 1|act_loss: 0.1865234375|cri_loss: 0.1158447265625|unsuper_loss: 0.0 average reward score: -1.3857421875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.93%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81 epoch: 0|step: 748|ppo_ep: 1|act_loss: 0.0037364959716796875|cri_loss: 0.07928466796875|unsuper_loss: 0.0 average reward score: 0.62353515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 [2023-07-01 08:38:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=13, lr=[1.4574194532523914e-06, 1.4574194532523914e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:38:46,043] [INFO] [timer.py:215:stop] epoch=0/micro_step=750/global_step=750, RunningAvgSamplesPerSec=51.601553142551516, CurrSamplesPerSec=51.397851455320605, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:38:46,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=13, lr=[7.551396130841406e-07, 7.551396130841406e-07], 
mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 749|ppo_ep: 1|act_loss: 0.04638671875|cri_loss: 0.10772705078125|unsuper_loss: 0.0 average reward score: -0.44140625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 750|ppo_ep: 1|act_loss: 0.0013256072998046875|cri_loss: 0.052490234375|unsuper_loss: 0.0 average reward score: -0.0712890625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.36%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 751|ppo_ep: 1|act_loss: 0.004772186279296875|cri_loss: 0.09405517578125|unsuper_loss: 0.0 average reward score: 0.298583984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.38%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81 epoch: 0|step: 752|ppo_ep: 1|act_loss: 0.09661865234375|cri_loss: 0.0604248046875|unsuper_loss: 0.0 average reward score: 0.10296630859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.62%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 753|ppo_ep: 1|act_loss: 0.033203125|cri_loss: 0.0548095703125|unsuper_loss: 0.0 average reward score: -0.53076171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 
(8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 754|ppo_ep: 1|act_loss: 0.0653076171875|cri_loss: 0.06781005859375|unsuper_loss: 0.0 average reward score: -0.69140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.64%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 755|ppo_ep: 1|act_loss: -0.1654052734375|cri_loss: 0.11083984375|unsuper_loss: 0.0 average reward score: -0.413818359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.62%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 756|ppo_ep: 1|act_loss: 0.027496337890625|cri_loss: 0.08026123046875|unsuper_loss: 0.0 average reward score: -0.28857421875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.31%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 757|ppo_ep: 1|act_loss: -0.052581787109375|cri_loss: 0.059814453125|unsuper_loss: 0.0 average reward score: -0.463623046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.70%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 758|ppo_ep: 1|act_loss: -0.08880615234375|cri_loss: 0.069580078125|unsuper_loss: 0.0 average reward score: -0.54248046875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.84%) |Others=0.22 
(8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81 [2023-07-01 08:39:10,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=13, lr=[1.3326121167028917e-06, 1.3326121167028917e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:39:11,028] [INFO] [timer.py:215:stop] epoch=0/micro_step=760/global_step=760, RunningAvgSamplesPerSec=51.59554618091631, CurrSamplesPerSec=50.50729342630164, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:39:11,188] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=13, lr=[6.904725993279232e-07, 6.904725993279232e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 759|ppo_ep: 1|act_loss: 0.0384521484375|cri_loss: 0.02911376953125|unsuper_loss: 0.0 average reward score: -1.189453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.85%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81 epoch: 0|step: 760|ppo_ep: 1|act_loss: 0.06256103515625|cri_loss: 0.05865478515625|unsuper_loss: 0.0 average reward score: -0.84326171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.83%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81 epoch: 0|step: 761|ppo_ep: 1|act_loss: -0.0215911865234375|cri_loss: 0.037811279296875|unsuper_loss: 0.0 average reward score: -1.1328125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.84%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 762|ppo_ep: 1|act_loss: -0.09063720703125|cri_loss: 0.06414794921875|unsuper_loss: 0.0 average reward score: -0.6279296875 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.78%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81 epoch: 0|step: 763|ppo_ep: 1|act_loss: 0.09454345703125|cri_loss: 0.12274169921875|unsuper_loss: 0.0 average reward score: 0.24560546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81 epoch: 0|step: 764|ppo_ep: 1|act_loss: 0.0235748291015625|cri_loss: 0.04290771484375|unsuper_loss: 0.0 average reward score: 0.29736328125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 765|ppo_ep: 1|act_loss: -0.1361083984375|cri_loss: 0.08929443359375|unsuper_loss: 0.0 average reward score: -0.6435546875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.44%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81 epoch: 0|step: 766|ppo_ep: 1|act_loss: 0.1015625|cri_loss: 0.0264434814453125|unsuper_loss: 0.0 average reward score: 0.68603515625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81 epoch: 0|step: 767|ppo_ep: 1|act_loss: -0.0099945068359375|cri_loss: 0.073486328125|unsuper_loss: 0.0 average reward score: -0.603515625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.76%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.231689453125|cri_loss: 0.1632080078125|unsuper_loss: 0.0 average reward score: -1.33984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.71%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81 [2023-07-01 08:39:35,827] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1 [2023-07-01 08:39:35,827] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=14, lr=[1.2243212249131722e-06, 1.2243212249131722e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:39:35,828] [INFO] [timer.py:215:stop] epoch=0/micro_step=770/global_step=770, RunningAvgSamplesPerSec=51.60825437477312, CurrSamplesPerSec=73.02824111153672, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:39:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=13, lr=[6.282540860365757e-07, 6.282540860365757e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 769|ppo_ep: 1|act_loss: -0.097900390625|cri_loss: 0.07208251953125|unsuper_loss: 0.0 average reward score: -0.7880859375 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.46%) |Training time=0.60s (26.09%) |Others=0.22 (9.45%)|CurSamplesPerSec=13.88 |AvgSamplesPerSec=12.81 [2023-07-01 08:39:38,139] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 4096, reducing to 2048 [2023-07-01 08:39:38,294] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096 epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.00699615478515625|cri_loss: 0.0513916015625|unsuper_loss: 0.0 average reward score: -0.94140625 ------------------------------------------------------------------------------------- |E2E latency=2.27s |Gather latency=0.00s (0.00%) |Generate time=1.49s (65.58%) |Training time=0.61s (26.75%) |Others=0.17 (7.67%)|CurSamplesPerSec=14.11 |AvgSamplesPerSec=12.82 epoch: 0|step: 771|ppo_ep: 1|act_loss: -0.2410888671875|cri_loss: 0.2340087890625|unsuper_loss: 0.0 average reward score: 0.391357421875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.49%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 772|ppo_ep: 1|act_loss: -0.033966064453125|cri_loss: 0.11492919921875|unsuper_loss: 0.0 average reward score: -2.4765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.1956787109375|cri_loss: 0.25927734375|unsuper_loss: 0.0 average reward score: -2.830078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.46%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: 0.041412353515625|unsuper_loss: 0.0 average reward score: 0.247314453125 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.78s (31.48%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 775|ppo_ep: 1|act_loss: -0.303466796875|cri_loss: 0.209228515625|unsuper_loss: 0.0 average reward score: -1.1796875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.87%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 776|ppo_ep: 1|act_loss: -0.072509765625|cri_loss: 0.1566162109375|unsuper_loss: 0.0 average reward score: -1.9306640625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.79%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.0977783203125|cri_loss: 0.155029296875|unsuper_loss: 0.0 average reward score: -1.7763671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.76%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 [2023-07-01 08:39:57,895] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 2048, reducing to 1024 epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.20068359375|unsuper_loss: 0.0 average reward score: -1.125 ------------------------------------------------------------------------------------- |E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.43%) |Training time=0.60s (26.02%) |Others=0.22 (9.55%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.82 [2023-07-01 08:40:00,214] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=16, lr=[1.1313721839601206e-06, 1.1313721839601206e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:40:00,394] [INFO] [timer.py:215:stop] epoch=0/micro_step=780/global_step=780, RunningAvgSamplesPerSec=51.64180602872668, CurrSamplesPerSec=51.00333643923653, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:40:00,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=14, lr=[5.744205443756365e-07, 5.744205443756365e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 779|ppo_ep: 1|act_loss: 0.09027099609375|cri_loss: 0.1339111328125|unsuper_loss: 0.0 average reward score: -0.81591796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.69%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 780|ppo_ep: 1|act_loss: -0.0261688232421875|cri_loss: 0.0543212890625|unsuper_loss: 0.0 average reward score: -2.0 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.65%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 781|ppo_ep: 1|act_loss: 0.0146026611328125|cri_loss: 0.1873779296875|unsuper_loss: 0.0 average reward score: -0.78955078125 
------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.86%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 782|ppo_ep: 1|act_loss: 0.1781005859375|cri_loss: 0.1383056640625|unsuper_loss: 0.0 average reward score: -0.488037109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 783|ppo_ep: 1|act_loss: -0.0207061767578125|cri_loss: 0.1483154296875|unsuper_loss: 0.0 average reward score: -3.36328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.80s (31.78%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 784|ppo_ep: 1|act_loss: 0.0156402587890625|cri_loss: 0.1343994140625|unsuper_loss: 0.0 average reward score: -2.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.95%) |Training time=0.78s (31.21%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82 epoch: 0|step: 785|ppo_ep: 1|act_loss: -0.10455322265625|cri_loss: 0.095458984375|unsuper_loss: 0.0 average reward score: -1.41796875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 786|ppo_ep: 1|act_loss: 0.0179901123046875|cri_loss: 0.1077880859375|unsuper_loss: 0.0 average reward score: -3.609375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.87%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 787|ppo_ep: 1|act_loss: 0.054901123046875|cri_loss: 0.1431884765625|unsuper_loss: 0.0 average reward score: -1.2724609375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.78s (31.53%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82 epoch: 0|step: 788|ppo_ep: 1|act_loss: -0.1170654296875|cri_loss: 0.05938720703125|unsuper_loss: 0.0 average reward score: -2.380859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 [2023-07-01 08:40:25,465] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=16, lr=[1.0196933519708125e-06, 1.0196933519708125e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:40:25,641] [INFO] [timer.py:215:stop] epoch=0/micro_step=790/global_step=790, RunningAvgSamplesPerSec=51.63518407852763, CurrSamplesPerSec=50.67183335321682, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:40:25,800] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=14, lr=[5.170832921371164e-07, 5.170832921371164e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 789|ppo_ep: 1|act_loss: 0.0171356201171875|cri_loss: 0.066650390625|unsuper_loss: 0.0 average reward score: -0.09027099609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.37%) |Training time=0.80s (31.89%) |Others=0.22 
(8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 790|ppo_ep: 1|act_loss: -0.006694793701171875|cri_loss: 0.07513427734375|unsuper_loss: 0.0 average reward score: -0.7490234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 791|ppo_ep: 1|act_loss: -0.0699462890625|cri_loss: 0.048736572265625|unsuper_loss: 0.0 average reward score: -0.6640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.76%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 792|ppo_ep: 1|act_loss: -0.0310211181640625|cri_loss: 0.037322998046875|unsuper_loss: 0.0 average reward score: -0.28076171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 793|ppo_ep: 1|act_loss: 0.049957275390625|cri_loss: 0.06402587890625|unsuper_loss: 0.0 average reward score: -1.1796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.79s (31.78%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 794|ppo_ep: 1|act_loss: 0.09295654296875|cri_loss: 0.049041748046875|unsuper_loss: 0.0 average reward score: 1.130859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.48%) 
|Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 795|ppo_ep: 1|act_loss: -0.065185546875|cri_loss: 0.060638427734375|unsuper_loss: 0.0 average reward score: -0.9345703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.63%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 796|ppo_ep: 1|act_loss: 0.08929443359375|cri_loss: 0.040374755859375|unsuper_loss: 0.0 average reward score: -0.03204345703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.77%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 797|ppo_ep: 1|act_loss: 0.0360107421875|cri_loss: 0.0285797119140625|unsuper_loss: 0.0 average reward score: 1.12109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 798|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.194580078125|unsuper_loss: 0.0 average reward score: 0.4609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.84%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 [2023-07-01 08:40:50,465] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=16, lr=[9.131635412636474e-07, 9.131635412636474e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:40:50,642] [INFO] [timer.py:215:stop] epoch=0/micro_step=800/global_step=800, RunningAvgSamplesPerSec=51.62718218371995, 
CurrSamplesPerSec=50.86661858599624, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:40:50,802] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=14, lr=[4.624291562079719e-07, 4.624291562079719e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 799|ppo_ep: 1|act_loss: 0.1156005859375|cri_loss: 0.060760498046875|unsuper_loss: 0.0 average reward score: 0.00958251953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.76%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.0296478271484375|cri_loss: 0.00811004638671875|unsuper_loss: 0.0 average reward score: 1.1142578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.52%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 801|ppo_ep: 1|act_loss: 0.0025157928466796875|cri_loss: 0.027435302734375|unsuper_loss: 0.0 average reward score: 1.2216796875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.49%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.06158447265625|cri_loss: 0.0352783203125|unsuper_loss: 0.0 average reward score: -0.1055908203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 803|ppo_ep: 1|act_loss: 0.1011962890625|cri_loss: 0.0164642333984375|unsuper_loss: 0.0 average reward score: 
-0.1495361328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.79%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 804|ppo_ep: 1|act_loss: -0.0045623779296875|cri_loss: 0.01430511474609375|unsuper_loss: 0.0 average reward score: 0.54833984375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.90%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 805|ppo_ep: 1|act_loss: -0.00469970703125|cri_loss: 0.05517578125|unsuper_loss: 0.0 average reward score: 0.65185546875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.47%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 806|ppo_ep: 1|act_loss: -0.0382080078125|cri_loss: 0.053131103515625|unsuper_loss: 0.0 average reward score: 1.369140625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.35%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82 epoch: 0|step: 807|ppo_ep: 1|act_loss: -0.08514404296875|cri_loss: 0.051727294921875|unsuper_loss: 0.0 average reward score: 0.53857421875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.27%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.0787353515625|cri_loss: 0.0428466796875|unsuper_loss: 0.0 average reward 
score: 0.389892578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 [2023-07-01 08:41:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=16, lr=[8.119268990291768e-07, 8.119268990291768e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:41:15,610] [INFO] [timer.py:215:stop] epoch=0/micro_step=810/global_step=810, RunningAvgSamplesPerSec=51.6220586884916, CurrSamplesPerSec=50.11381577129612, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:41:15,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=14, lr=[4.1053208997358816e-07, 4.1053208997358816e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 809|ppo_ep: 1|act_loss: -0.05810546875|cri_loss: 0.048583984375|unsuper_loss: 0.0 average reward score: 0.169921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.19%) |Training time=0.80s (32.03%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 810|ppo_ep: 1|act_loss: -0.1319580078125|cri_loss: 0.0921630859375|unsuper_loss: 0.0 average reward score: -1.751953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.31%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 811|ppo_ep: 1|act_loss: -0.115234375|cri_loss: 0.049957275390625|unsuper_loss: 0.0 average reward score: -0.065185546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 812|ppo_ep: 1|act_loss: -0.05712890625|cri_loss: 0.05364990234375|unsuper_loss: 0.0 average reward score: -0.892578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.72%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 813|ppo_ep: 1|act_loss: 0.05908203125|cri_loss: 0.05279541015625|unsuper_loss: 0.0 average reward score: 0.740234375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.71%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 814|ppo_ep: 1|act_loss: -0.01099395751953125|cri_loss: 0.045440673828125|unsuper_loss: 0.0 average reward score: 1.83984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.59%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 815|ppo_ep: 1|act_loss: -0.034515380859375|cri_loss: 0.0224456787109375|unsuper_loss: 0.0 average reward score: 0.0625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 816|ppo_ep: 1|act_loss: 0.049652099609375|cri_loss: 0.0233001708984375|unsuper_loss: 0.0 average reward score: -0.461181640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.50%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 817|ppo_ep: 1|act_loss: 0.055389404296875|cri_loss: 0.03802490234375|unsuper_loss: 0.0 average reward score: 0.720703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 818|ppo_ep: 1|act_loss: 0.09637451171875|cri_loss: 0.0672607421875|unsuper_loss: 0.0 average reward score: 0.5283203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 [2023-07-01 08:41:40,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=16, lr=[7.161204101870459e-07, 7.161204101870459e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:41:40,588] [INFO] [timer.py:215:stop] epoch=0/micro_step=820/global_step=820, RunningAvgSamplesPerSec=51.616404119052, CurrSamplesPerSec=51.77697640793942, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:41:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=14, lr=[3.614623161842565e-07, 3.614623161842565e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 819|ppo_ep: 1|act_loss: 0.05255126953125|cri_loss: 0.01995849609375|unsuper_loss: 0.0 average reward score: 0.64306640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.41%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.04034423828125|cri_loss: 0.130126953125|unsuper_loss: 0.0 average reward score: 0.168212890625 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.93%) |Training time=0.78s (31.30%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 821|ppo_ep: 1|act_loss: 0.01351165771484375|cri_loss: 0.07733154296875|unsuper_loss: 0.0 average reward score: 1.2470703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 822|ppo_ep: 1|act_loss: -0.1065673828125|cri_loss: 0.0628662109375|unsuper_loss: 0.0 average reward score: -0.47412109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.0014324188232421875|cri_loss: 0.035858154296875|unsuper_loss: 0.0 average reward score: 0.188720703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.80s (31.78%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 824|ppo_ep: 1|act_loss: -0.1934814453125|cri_loss: 0.1246337890625|unsuper_loss: 0.0 average reward score: -0.21630859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.69%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 825|ppo_ep: 1|act_loss: 0.01076507568359375|cri_loss: 0.0192718505859375|unsuper_loss: 0.0 average reward 
score: -0.4912109375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 826|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.01800537109375|unsuper_loss: 0.0 average reward score: 0.06787109375 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.73%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 827|ppo_ep: 1|act_loss: -0.0654296875|cri_loss: 0.0181121826171875|unsuper_loss: 0.0 average reward score: 0.607421875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.13%) |Training time=0.80s (32.01%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.82 epoch: 0|step: 828|ppo_ep: 1|act_loss: -0.040008544921875|cri_loss: 0.04144287109375|unsuper_loss: 0.0 average reward score: 0.72509765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.91%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 [2023-07-01 08:42:05,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=16, lr=[6.258737120295009e-07, 6.258737120295009e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:42:05,592] [INFO] [timer.py:215:stop] epoch=0/micro_step=830/global_step=830, RunningAvgSamplesPerSec=51.60849889648195, CurrSamplesPerSec=51.36816773050813, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:42:05,751] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=14, lr=[3.1528623193564286e-07, 
3.1528623193564286e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.06610107421875|cri_loss: 0.022186279296875|unsuper_loss: 0.0 average reward score: 0.09521484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.51%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 830|ppo_ep: 1|act_loss: -0.0310821533203125|cri_loss: 0.0281524658203125|unsuper_loss: 0.0 average reward score: 0.420166015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.48%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 831|ppo_ep: 1|act_loss: 0.047149658203125|cri_loss: 0.01361083984375|unsuper_loss: 0.0 average reward score: 0.59765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.72%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 832|ppo_ep: 1|act_loss: 0.0447998046875|cri_loss: 0.021392822265625|unsuper_loss: 0.0 average reward score: -1.0263671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.49%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 833|ppo_ep: 1|act_loss: 0.03436279296875|cri_loss: 0.00853729248046875|unsuper_loss: 0.0 average reward score: 0.73583984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.50%) 
|Others=0.22 (8.79%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 834|ppo_ep: 1|act_loss: 0.07049560546875|cri_loss: 0.0280914306640625|unsuper_loss: 0.0 average reward score: 0.76171875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.57%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 835|ppo_ep: 1|act_loss: 0.030242919921875|cri_loss: 0.024688720703125|unsuper_loss: 0.0 average reward score: 0.5947265625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 836|ppo_ep: 1|act_loss: 0.0147857666015625|cri_loss: 0.0347900390625|unsuper_loss: 0.0 average reward score: 1.072265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.42%) |Training time=0.80s (31.82%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 837|ppo_ep: 1|act_loss: -0.00695037841796875|cri_loss: 0.01007843017578125|unsuper_loss: 0.0 average reward score: 1.263671875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.79%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.02899169921875|cri_loss: 0.0163726806640625|unsuper_loss: 0.0 average reward score: -0.4306640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s 
(31.55%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 [2023-07-01 08:42:30,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=16, lr=[5.413089188070959e-07, 5.413089188070959e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:42:30,551] [INFO] [timer.py:215:stop] epoch=0/micro_step=840/global_step=840, RunningAvgSamplesPerSec=51.60365103384502, CurrSamplesPerSec=51.48618054838966, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:42:30,710] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=14, lr=[2.720663188258199e-07, 2.720663188258199e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 839|ppo_ep: 1|act_loss: -0.01837158203125|cri_loss: 0.0267486572265625|unsuper_loss: 0.0 average reward score: 0.833984375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 840|ppo_ep: 1|act_loss: 0.0253448486328125|cri_loss: 0.050506591796875|unsuper_loss: 0.0 average reward score: 1.953125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.67%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 841|ppo_ep: 1|act_loss: -0.10357666015625|cri_loss: 0.0260772705078125|unsuper_loss: 0.0 average reward score: 1.794921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.48%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.1046142578125|cri_loss: 0.1341552734375|unsuper_loss: 0.0 average reward score: 0.62646484375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.54%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 843|ppo_ep: 1|act_loss: 0.09100341796875|cri_loss: 0.03955078125|unsuper_loss: 0.0 average reward score: -0.39794921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 844|ppo_ep: 1|act_loss: -0.08953857421875|cri_loss: 0.035308837890625|unsuper_loss: 0.0 average reward score: 1.5791015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.57%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 845|ppo_ep: 1|act_loss: -0.157958984375|cri_loss: 0.06439208984375|unsuper_loss: 0.0 average reward score: 0.45654296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.71%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 846|ppo_ep: 1|act_loss: -0.076171875|cri_loss: 0.04150390625|unsuper_loss: 0.0 average reward score: 1.568359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 847|ppo_ep: 1|act_loss: -0.0772705078125|cri_loss: 0.03521728515625|unsuper_loss: 0.0 average reward score: 1.546875 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 848|ppo_ep: 1|act_loss: -0.0516357421875|cri_loss: 0.033599853515625|unsuper_loss: 0.0 average reward score: -0.73828125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.59%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 [2023-07-01 08:42:55,362] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=16, lr=[4.6254045649395126e-07, 4.6254045649395126e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:42:55,539] [INFO] [timer.py:215:stop] epoch=0/micro_step=850/global_step=850, RunningAvgSamplesPerSec=51.598060108699976, CurrSamplesPerSec=51.04028493601603, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:42:55,698] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=14, lr=[2.3186105841041418e-07, 2.3186105841041418e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 849|ppo_ep: 1|act_loss: -0.11651611328125|cri_loss: 0.045501708984375|unsuper_loss: 0.0 average reward score: 1.197265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 850|ppo_ep: 1|act_loss: -0.0482177734375|cri_loss: 0.055572509765625|unsuper_loss: 0.0 average reward score: 1.2587890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 
(8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 851|ppo_ep: 1|act_loss: -0.025909423828125|cri_loss: 0.0386962890625|unsuper_loss: 0.0 average reward score: 0.958984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.72%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 852|ppo_ep: 1|act_loss: -0.15966796875|cri_loss: 0.06219482421875|unsuper_loss: 0.0 average reward score: 1.181640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.53%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 853|ppo_ep: 1|act_loss: -0.2315673828125|cri_loss: 0.11053466796875|unsuper_loss: 0.0 average reward score: 0.57666015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82 epoch: 0|step: 854|ppo_ep: 1|act_loss: -0.4189453125|cri_loss: 0.18017578125|unsuper_loss: 0.0 average reward score: -2.33984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.42%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 855|ppo_ep: 1|act_loss: -0.0657958984375|cri_loss: 0.08935546875|unsuper_loss: 0.0 average reward score: -0.64306640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 
(8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 856|ppo_ep: 1|act_loss: -0.0465087890625|cri_loss: 0.051605224609375|unsuper_loss: 0.0 average reward score: -0.497314453125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.44%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 857|ppo_ep: 1|act_loss: -0.1798095703125|cri_loss: 0.091796875|unsuper_loss: 0.0 average reward score: 1.6103515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.81%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 858|ppo_ep: 1|act_loss: -0.178466796875|cri_loss: 0.058807373046875|unsuper_loss: 0.0 average reward score: -0.4541015625 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 [2023-07-01 08:43:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=16, lr=[3.8967490795613135e-07, 3.8967490795613135e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:43:20,521] [INFO] [timer.py:215:stop] epoch=0/micro_step=860/global_step=860, RunningAvgSamplesPerSec=51.59237982285124, CurrSamplesPerSec=50.91605612772703, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:43:20,681] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=14, lr=[1.9472485307027945e-07, 1.9472485307027945e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 859|ppo_ep: 1|act_loss: -0.278564453125|cri_loss: 0.06634521484375|unsuper_loss: 0.0 average reward score: -0.243408203125 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 860|ppo_ep: 1|act_loss: -0.2164306640625|cri_loss: 0.0438232421875|unsuper_loss: 0.0 average reward score: 0.44384765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 861|ppo_ep: 1|act_loss: -0.11895751953125|cri_loss: 0.0361328125|unsuper_loss: 0.0 average reward score: -0.544921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.48%) |Training time=0.79s (31.76%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.025390625|cri_loss: 0.01168060302734375|unsuper_loss: 0.0 average reward score: 1.1591796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 863|ppo_ep: 1|act_loss: -0.043853759765625|cri_loss: 0.018585205078125|unsuper_loss: 0.0 average reward score: 1.791015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.84%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.0258026123046875|cri_loss: 0.01141357421875|unsuper_loss: 0.0 average reward score: 0.355224609375 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 865|ppo_ep: 1|act_loss: 0.043121337890625|cri_loss: 0.0229034423828125|unsuper_loss: 0.0 average reward score: -0.026123046875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.50%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 866|ppo_ep: 1|act_loss: -0.0021762847900390625|cri_loss: 0.01274871826171875|unsuper_loss: 0.0 average reward score: 0.68505859375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.0406494140625|cri_loss: 0.0157928466796875|unsuper_loss: 0.0 average reward score: 0.3671875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.56%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 868|ppo_ep: 1|act_loss: 0.0023136138916015625|cri_loss: 0.0426025390625|unsuper_loss: 0.0 average reward score: 0.7802734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 [2023-07-01 08:43:45,314] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=16, lr=[3.2281086873267354e-07, 
3.2281086873267354e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:43:45,490] [INFO] [timer.py:215:stop] epoch=0/micro_step=870/global_step=870, RunningAvgSamplesPerSec=51.5877933467114, CurrSamplesPerSec=51.501807697237, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:43:45,650] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=14, lr=[1.607079523987662e-07, 1.607079523987662e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 869|ppo_ep: 1|act_loss: 0.033447265625|cri_loss: 0.0211181640625|unsuper_loss: 0.0 average reward score: 0.86181640625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.56%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 870|ppo_ep: 1|act_loss: 0.01003265380859375|cri_loss: 0.034332275390625|unsuper_loss: 0.0 average reward score: -1.119140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.70%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 871|ppo_ep: 1|act_loss: -0.0030384063720703125|cri_loss: 0.0274658203125|unsuper_loss: 0.0 average reward score: 1.5908203125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 872|ppo_ep: 1|act_loss: 0.005985260009765625|cri_loss: 0.0111846923828125|unsuper_loss: 0.0 average reward score: 1.9921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.78%) |Others=0.22 
(8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 873|ppo_ep: 1|act_loss: -0.0399169921875|cri_loss: 0.01165771484375|unsuper_loss: 0.0 average reward score: 1.345703125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.95%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 874|ppo_ep: 1|act_loss: -0.0528564453125|cri_loss: 0.0241241455078125|unsuper_loss: 0.0 average reward score: 2.185546875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.86%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.0750732421875|cri_loss: 0.0255889892578125|unsuper_loss: 0.0 average reward score: 0.236328125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.43%) |Training time=0.79s (31.76%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 876|ppo_ep: 1|act_loss: -0.0244140625|cri_loss: 0.01404571533203125|unsuper_loss: 0.0 average reward score: 1.009765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 877|ppo_ep: 1|act_loss: -0.1212158203125|cri_loss: 0.0931396484375|unsuper_loss: 0.0 average reward score: 1.9169921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.80s (31.73%) |Others=0.22 
(8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 878|ppo_ep: 1|act_loss: -0.0438232421875|cri_loss: 0.015869140625|unsuper_loss: 0.0 average reward score: -1.5380859375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 [2023-07-01 08:44:10,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=16, lr=[2.6203881362437934e-07, 2.6203881362437934e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:44:10,513] [INFO] [timer.py:215:stop] epoch=0/micro_step=880/global_step=880, RunningAvgSamplesPerSec=51.57930822120318, CurrSamplesPerSec=50.97633293859113, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:44:10,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=14, lr=[1.298563852081905e-07, 1.298563852081905e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 879|ppo_ep: 1|act_loss: -0.07330322265625|cri_loss: 0.0267181396484375|unsuper_loss: 0.0 average reward score: 0.132568359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.71%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 880|ppo_ep: 1|act_loss: -0.05224609375|cri_loss: 0.0175323486328125|unsuper_loss: 0.0 average reward score: 0.99609375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.95%) |Training time=0.78s (31.29%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82 epoch: 0|step: 881|ppo_ep: 1|act_loss: 0.037750244140625|cri_loss: 0.0272369384765625|unsuper_loss: 0.0 average reward score: 2.283203125 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.33%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 882|ppo_ep: 1|act_loss: -0.028076171875|cri_loss: 0.018157958984375|unsuper_loss: 0.0 average reward score: 1.2802734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 883|ppo_ep: 1|act_loss: 0.0125579833984375|cri_loss: 0.03961181640625|unsuper_loss: 0.0 average reward score: 0.75537109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 884|ppo_ep: 1|act_loss: -0.0096435546875|cri_loss: 0.01611328125|unsuper_loss: 0.0 average reward score: -1.69921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.67%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 885|ppo_ep: 1|act_loss: -0.038299560546875|cri_loss: 0.0307769775390625|unsuper_loss: 0.0 average reward score: 0.420166015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 886|ppo_ep: 1|act_loss: 0.074951171875|cri_loss: 0.037750244140625|unsuper_loss: 0.0 average reward score: 
-0.012451171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.52%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 887|ppo_ep: 1|act_loss: -0.0635986328125|cri_loss: 0.09283447265625|unsuper_loss: 0.0 average reward score: 1.556640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 888|ppo_ep: 1|act_loss: 0.02508544921875|cri_loss: 0.0252838134765625|unsuper_loss: 0.0 average reward score: -0.7734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 [2023-07-01 08:44:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=16, lr=[2.0744097427091748e-07, 2.0744097427091748e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:44:35,468] [INFO] [timer.py:215:stop] epoch=0/micro_step=890/global_step=890, RunningAvgSamplesPerSec=51.57652001427069, CurrSamplesPerSec=50.996631323297976, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:44:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=14, lr=[1.0221189724751502e-07, 1.0221189724751502e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 889|ppo_ep: 1|act_loss: -0.0243377685546875|cri_loss: 0.0635986328125|unsuper_loss: 0.0 average reward score: -0.226318359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.71%) |Others=0.22 
(8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 890|ppo_ep: 1|act_loss: -0.000568389892578125|cri_loss: 0.021392822265625|unsuper_loss: 0.0 average reward score: 0.23974609375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 891|ppo_ep: 1|act_loss: -0.0159454345703125|cri_loss: 0.022857666015625|unsuper_loss: 0.0 average reward score: 0.48779296875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 892|ppo_ep: 1|act_loss: -0.0914306640625|cri_loss: 0.0703125|unsuper_loss: 0.0 average reward score: -0.37548828125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.86%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 893|ppo_ep: 1|act_loss: -0.045745849609375|cri_loss: 0.02825927734375|unsuper_loss: 0.0 average reward score: 0.36669921875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.71%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 894|ppo_ep: 1|act_loss: -0.035736083984375|cri_loss: 0.018524169921875|unsuper_loss: 0.0 average reward score: 1.7275390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) 
|Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 895|ppo_ep: 1|act_loss: -0.005207061767578125|cri_loss: 0.0181121826171875|unsuper_loss: 0.0 average reward score: 0.04931640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.66%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 896|ppo_ep: 1|act_loss: 0.031280517578125|cri_loss: 0.0194244384765625|unsuper_loss: 0.0 average reward score: 1.267578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 897|ppo_ep: 1|act_loss: 0.02276611328125|cri_loss: 0.0172882080078125|unsuper_loss: 0.0 average reward score: -1.1484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 898|ppo_ep: 1|act_loss: -0.0015802383422851562|cri_loss: 0.00897216796875|unsuper_loss: 0.0 average reward score: 0.467041015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.74%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 [2023-07-01 08:45:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=16, lr=[1.590912278818792e-07, 1.590912278818792e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:45:00,465] [INFO] [timer.py:215:stop] epoch=0/micro_step=900/global_step=900, RunningAvgSamplesPerSec=51.570055453634914, 
CurrSamplesPerSec=50.599642682512595, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:45:00,626] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=14, lr=[7.781189471550543e-08, 7.781189471550543e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 899|ppo_ep: 1|act_loss: -0.060302734375|cri_loss: 0.026123046875|unsuper_loss: 0.0 average reward score: 0.7802734375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.87%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 900|ppo_ep: 1|act_loss: -0.0238494873046875|cri_loss: 0.0219268798828125|unsuper_loss: 0.0 average reward score: -0.389892578125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.47%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.0010786056518554688|cri_loss: 0.00994873046875|unsuper_loss: 0.0 average reward score: 0.892578125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 902|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.01763916015625|unsuper_loss: 0.0 average reward score: 0.5302734375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.52%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 903|ppo_ep: 1|act_loss: 0.0302581787109375|cri_loss: 0.0230712890625|unsuper_loss: 0.0 average reward score: 1.37890625 
------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.55%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 904|ppo_ep: 1|act_loss: -0.052398681640625|cri_loss: 0.052215576171875|unsuper_loss: 0.0 average reward score: 0.705078125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.59%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 905|ppo_ep: 1|act_loss: 0.002933502197265625|cri_loss: 0.016021728515625|unsuper_loss: 0.0 average reward score: -0.1961669921875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.94%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82 epoch: 0|step: 906|ppo_ep: 1|act_loss: -0.050933837890625|cri_loss: 0.0238189697265625|unsuper_loss: 0.0 average reward score: 1.1953125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.11%) |Training time=0.81s (32.11%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.82 epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.0265960693359375|cri_loss: 0.0195159912109375|unsuper_loss: 0.0 average reward score: -0.09381103515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 908|ppo_ep: 1|act_loss: 0.0308837890625|cri_loss: 0.0213623046875|unsuper_loss: 0.0 average reward 
score: 1.70703125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.63%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 [2023-07-01 08:45:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=16, lr=[1.1705499727233991e-07, 1.1705499727233991e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:45:25,449] [INFO] [timer.py:215:stop] epoch=0/micro_step=910/global_step=910, RunningAvgSamplesPerSec=51.564973884615625, CurrSamplesPerSec=51.461879529159155, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:45:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=14, lr=[5.6689393645807666e-08, 5.6689393645807666e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 909|ppo_ep: 1|act_loss: -0.0219268798828125|cri_loss: 0.01120758056640625|unsuper_loss: 0.0 average reward score: 0.9296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.50%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 910|ppo_ep: 1|act_loss: -0.00641632080078125|cri_loss: 0.00908660888671875|unsuper_loss: 0.0 average reward score: 1.1748046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.60%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 911|ppo_ep: 1|act_loss: 0.07330322265625|cri_loss: 0.026702880859375|unsuper_loss: 0.0 average reward score: 0.471923828125 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.80s (31.76%) 
|Others=0.22 (8.75%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 912|ppo_ep: 1|act_loss: -0.0416259765625|cri_loss: 0.034393310546875|unsuper_loss: 0.0 average reward score: 0.239501953125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.58%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 913|ppo_ep: 1|act_loss: -0.057342529296875|cri_loss: 0.0198822021484375|unsuper_loss: 0.0 average reward score: -0.6708984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.56%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 914|ppo_ep: 1|act_loss: 0.0325927734375|cri_loss: 0.0117340087890625|unsuper_loss: 0.0 average reward score: -0.008056640625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.48%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.057861328125|cri_loss: 0.02679443359375|unsuper_loss: 0.0 average reward score: 1.568359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.66%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 916|ppo_ep: 1|act_loss: -0.0123748779296875|cri_loss: 0.0109405517578125|unsuper_loss: 0.0 average reward score: 1.1484375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s 
(31.67%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 917|ppo_ep: 1|act_loss: 0.01230621337890625|cri_loss: 0.01358795166015625|unsuper_loss: 0.0 average reward score: 0.7021484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.33%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 918|ppo_ep: 1|act_loss: -0.0010576248168945312|cri_loss: 0.01271820068359375|unsuper_loss: 0.0 average reward score: 0.3583984375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 [2023-07-01 08:45:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=16, lr=[8.13891623382061e-08, 8.13891623382061e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:45:50,434] [INFO] [timer.py:215:stop] epoch=0/micro_step=920/global_step=920, RunningAvgSamplesPerSec=51.560951397224706, CurrSamplesPerSec=50.731532967172114, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:45:50,594] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=14, lr=[3.887297523242184e-08, 3.887297523242184e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 919|ppo_ep: 1|act_loss: 0.009857177734375|cri_loss: 0.019012451171875|unsuper_loss: 0.0 average reward score: 0.151123046875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.77%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 920|ppo_ep: 1|act_loss: -0.019256591796875|cri_loss: 0.01953125|unsuper_loss: 0.0 average reward score: 1.4775390625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 921|ppo_ep: 1|act_loss: 0.057342529296875|cri_loss: 0.0191802978515625|unsuper_loss: 0.0 average reward score: 0.003662109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 922|ppo_ep: 1|act_loss: 0.06622314453125|cri_loss: 0.0276641845703125|unsuper_loss: 0.0 average reward score: 1.396484375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.61%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 923|ppo_ep: 1|act_loss: -0.037200927734375|cri_loss: 0.0157318115234375|unsuper_loss: 0.0 average reward score: 1.4443359375 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.57%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 924|ppo_ep: 1|act_loss: -0.0157012939453125|cri_loss: 0.0157928466796875|unsuper_loss: 0.0 average reward score: 0.2183837890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.51%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 925|ppo_ep: 1|act_loss: -0.0152130126953125|cri_loss: 0.0086669921875|unsuper_loss: 0.0 average reward 
score: 0.62841796875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 926|ppo_ep: 1|act_loss: 0.04119873046875|cri_loss: 0.0268096923828125|unsuper_loss: 0.0 average reward score: 1.16015625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 927|ppo_ep: 1|act_loss: 0.02349853515625|cri_loss: 0.0118255615234375|unsuper_loss: 0.0 average reward score: 0.32275390625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 928|ppo_ep: 1|act_loss: -0.035491943359375|cri_loss: 0.026763916015625|unsuper_loss: 0.0 average reward score: -0.451416015625 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 [2023-07-01 08:46:15,231] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=16, lr=[5.2141983091115555e-08, 5.2141983091115555e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:46:15,407] [INFO] [timer.py:215:stop] epoch=0/micro_step=930/global_step=930, RunningAvgSamplesPerSec=51.55690491493285, CurrSamplesPerSec=51.13309535329369, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:46:15,568] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=14, lr=[2.4386747156034395e-08, 
2.4386747156034395e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.0701904296875|cri_loss: 0.0291595458984375|unsuper_loss: 0.0 average reward score: -0.09228515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.64%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 930|ppo_ep: 1|act_loss: -0.0335693359375|cri_loss: 0.01067352294921875|unsuper_loss: 0.0 average reward score: 1.45703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82 epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.0938720703125|cri_loss: 0.0655517578125|unsuper_loss: 0.0 average reward score: 0.4677734375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.47%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 932|ppo_ep: 1|act_loss: 0.0243988037109375|cri_loss: 0.0057373046875|unsuper_loss: 0.0 average reward score: 0.1580810546875 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.43%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 933|ppo_ep: 1|act_loss: -0.06317138671875|cri_loss: 0.03302001953125|unsuper_loss: 0.0 average reward score: 1.09765625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.80%) 
|Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82 epoch: 0|step: 934|ppo_ep: 1|act_loss: 0.0249786376953125|cri_loss: 0.00763702392578125|unsuper_loss: 0.0 average reward score: 0.52197265625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.0228118896484375|cri_loss: 0.0472412109375|unsuper_loss: 0.0 average reward score: -1.62890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.52%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 936|ppo_ep: 1|act_loss: -0.049560546875|cri_loss: 0.0103912353515625|unsuper_loss: 0.0 average reward score: 1.54296875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 937|ppo_ep: 1|act_loss: -0.04327392578125|cri_loss: 0.0217437744140625|unsuper_loss: 0.0 average reward score: -0.12451171875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.45%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82 epoch: 0|step: 938|ppo_ep: 1|act_loss: 0.04107666015625|cri_loss: 0.016998291015625|unsuper_loss: 0.0 average reward score: 0.767578125 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s 
(31.75%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 [2023-07-01 08:46:40,218] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=16, lr=[2.935303435704569e-08, 2.935303435704569e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:46:40,398] [INFO] [timer.py:215:stop] epoch=0/micro_step=940/global_step=940, RunningAvgSamplesPerSec=51.55280382459751, CurrSamplesPerSec=50.4776799280318, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:46:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=14, lr=[1.3250310963527358e-08, 1.3250310963527358e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 939|ppo_ep: 1|act_loss: 0.01678466796875|cri_loss: 0.01320648193359375|unsuper_loss: 0.0 average reward score: -0.998046875 ------------------------------------------------------------------------------------- |E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.89%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82 epoch: 0|step: 940|ppo_ep: 1|act_loss: -0.016937255859375|cri_loss: 0.037261962890625|unsuper_loss: 0.0 average reward score: 0.9150390625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.63%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 941|ppo_ep: 1|act_loss: -0.07684326171875|cri_loss: 0.05615234375|unsuper_loss: 0.0 average reward score: -1.2978515625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.54%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82 epoch: 0|step: 942|ppo_ep: 1|act_loss: 0.015228271484375|cri_loss: 0.00942230224609375|unsuper_loss: 0.0 average reward score: 1.7900390625 
------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.56%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82 epoch: 0|step: 943|ppo_ep: 1|act_loss: -0.0157012939453125|cri_loss: 0.00725555419921875|unsuper_loss: 0.0 average reward score: 0.22119140625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.62%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 944|ppo_ep: 1|act_loss: -0.030517578125|cri_loss: 0.0161285400390625|unsuper_loss: 0.0 average reward score: 0.79736328125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.64%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 945|ppo_ep: 1|act_loss: -0.0014429092407226562|cri_loss: 0.016357421875|unsuper_loss: 0.0 average reward score: 0.77783203125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.58%) |Training time=0.79s (31.63%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 946|ppo_ep: 1|act_loss: -0.004383087158203125|cri_loss: 0.0068206787109375|unsuper_loss: 0.0 average reward score: -0.0230712890625 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.79s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 epoch: 0|step: 947|ppo_ep: 1|act_loss: -0.0028743743896484375|cri_loss: 0.0187835693359375|unsuper_loss: 0.0 
average reward score: 0.63818359375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82 epoch: 0|step: 948|ppo_ep: 1|act_loss: -0.017242431640625|cri_loss: 0.0217742919921875|unsuper_loss: 0.0 average reward score: 0.158935546875 ------------------------------------------------------------------------------------- |E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.48%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82 [2023-07-01 08:47:05,179] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=16, lr=[1.3053152226982066e-08, 1.3053152226982066e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-07-01 08:47:05,355] [INFO] [timer.py:215:stop] epoch=0/micro_step=950/global_step=950, RunningAvgSamplesPerSec=51.55028020879753, CurrSamplesPerSec=51.49623536272809, MemAllocated=12.09GB, MaxMemAllocated=21.86GB [2023-07-01 08:47:05,513] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=14, lr=[5.478735544813263e-09, 5.478735544813263e-09], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 949|ppo_ep: 1|act_loss: 0.01763916015625|cri_loss: 0.0254058837890625|unsuper_loss: 0.0 average reward score: 0.6220703125 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.56%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82 epoch: 0|step: 950|ppo_ep: 1|act_loss: 0.014923095703125|cri_loss: 0.01549530029296875|unsuper_loss: 0.0 average reward score: 0.87109375 ------------------------------------------------------------------------------------- |E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s 
(31.20%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.82
epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.09979248046875|cri_loss: 0.053955078125|unsuper_loss: 0.0
average reward score: 1.1416015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.72%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.037689208984375|cri_loss: 0.0236358642578125|unsuper_loss: 0.0
average reward score: 1.1552734375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.81%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 953|ppo_ep: 1|act_loss: -0.0743408203125|cri_loss: 0.01526641845703125|unsuper_loss: 0.0
average reward score: 1.4033203125
-------------------------------------------------------------------------------------
saving model ...saving model ...saving model ...saving model ...saving model ...saving model ...saving model ... saving model ...
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8928 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8930 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8929 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8931 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8925 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8926 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8927 exits successfully.
[2023-07-01 08:47:27,082] [INFO] [launch.py:346:main] Process 8924 exits successfully.