[2023-06-30 05:29:08,875] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:09,890] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-30 05:29:09,947] [INFO] [runner.py:555:main] cmd = /home/mxfeng/miniconda3/envs/safe-rlhf/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static --data_split 2,4,4 --actor_model_name_or_path ../step1_supervised_finetuning/output --critic_model_name_or_path ../step2_reward_model_finetuning/output --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 1 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output
[2023-06-30 05:29:11,211] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:12,241] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-06-30 05:29:12,241] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-06-30 05:29:12,241] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-06-30 05:29:12,241] [INFO] [launch.py:163:main] dist_world_size=8
[2023-06-30 05:29:12,241] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-06-30 05:29:14,003] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,062] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,079] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,080] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,082] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,086] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,087] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:14,088] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 05:29:16,473] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,473] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,545] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,545] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,550] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,550] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,550] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-06-30 05:29:16,556] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,556] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,579] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,579] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,583] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,584] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,585] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,585] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 05:29:16,591] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 05:29:16,591] [INFO] [comm.py:594:init_distributed] cdb=None
************************[start] Initializing Actor Model [start] *************************
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.19387316703796387 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.1082601547241211 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.10686612129211426 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.09968805313110352 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11735749244689941 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.1028139591217041 seconds
[2023-06-30 05:29:48,215] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.15515780448913574 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.16137456893920898 seconds
[2023-06-30 05:29:52,046] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-06-30 05:29:52,048] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-06-30 05:29:52,048] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-06-30 05:29:52,071] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-06-30 05:29:52,071] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-06-30 05:29:52,071] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-06-30 05:29:52,071] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-06-30 05:29:52,071] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-06-30 05:29:52,071] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-06-30 05:29:52,071] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Rank: 4 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 3 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 5 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 0 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 6 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 2 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 7 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 1 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
[2023-06-30 05:29:58,131] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-06-30 05:29:58,132] [INFO] [utils.py:786:see_memory_usage] MA 3.06 GB         Max_MA 3.06 GB         CA 3.07 GB         Max_CA 3 GB 
[2023-06-30 05:29:58,133] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 43.48 GB, percent = 8.6%
[2023-06-30 05:29:58,401] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-06-30 05:29:58,402] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.91 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-06-30 05:29:58,402] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 43.49 GB, percent = 8.6%
[2023-06-30 05:29:58,402] [INFO] [stage_1_and_2.py:488:__init__] optimizer state initialized
[2023-06-30 05:29:58,627] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-06-30 05:29:58,628] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.29 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-06-30 05:29:58,628] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 43.49 GB, percent = 8.6%
[2023-06-30 05:29:58,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-06-30 05:29:58,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-06-30 05:29:58,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7fb8adf08790>
[2023-06-30 05:29:58,631] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:29:58,631] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-06-30 05:29:58,631] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-06-30 05:29:58,631] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-06-30 05:29:58,631] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   amp_params ................... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fb8adf08970>
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   dump_state ................... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-06-30 05:29:58,632] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-06-30 05:29:58,633] [INFO] [config.py:964:print]   pld_params ................... False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   world_size ................... 8
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-06-30 05:29:58,634] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-06-30 05:29:58,634] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": true, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/mxfeng/.cache/torch_extensions/py310_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pointwise_ops.cu -o pointwise_ops.cuda.o 
[2/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o 
[3/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o 
[4/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o 
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(63): warning: variable "lane" was declared but never referenced

/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(103): warning: variable "half_dim" was declared but never referenced
          detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" 
(265): here

/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(120): warning: variable "vals_half" was declared but never referenced
          detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" 
(265): here

/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(121): warning: variable "output_half" was declared but never referenced
          detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" 
(265): here

/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu(138): warning: variable "lane" was declared but never referenced
          detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int) [with T=__half]" 
(265): here

[5/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o 
[6/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o 
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 68; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 75; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 203; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 210; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 338; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 345; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 470; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 477; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2477; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2485; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2712; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2720; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2947; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 2955; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 3179; warning : ld
ptxas /tmp/tmpxft_0014957e_00000000-6_apply_rotary_pos_emb.ptx, line 3187; warning : ld
[7/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/rms_norm.cu -o rms_norm.cuda.o 
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 1988; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 2019; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 2425; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 2462; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 3329; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 3365; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 4090; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 4127; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 5459; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 5495; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 6540; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 6577; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 8354; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 8389; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 9735; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 9771; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 13693; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 13724; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 14192; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 14229; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 15107; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 15143; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 15964; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 16001; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 17357; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 17393; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 18543; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 18580; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 20404; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 20440; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 21931; warning : ld
ptxas /tmp/tmpxft_00149579_00000000-6_rms_norm.ptx, line 21968; warning : ld
[8/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o 
[9/11] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/layer_norm.cu -o layer_norm.cuda.o 
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 2417; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 2450; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 2623; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 2656; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 3465; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 3501; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 3805; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 3841; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 5210; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 5245; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 5681; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 5717; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 7611; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 7646; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 8191; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 8227; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11094; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11130; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11270; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11306; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11958; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 11993; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 12213; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 12249; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 13268; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 13303; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 13569; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 13605; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 14994; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 15029; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 15355; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 15390; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 20022; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 20058; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 20245; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 20281; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 21704; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 21740; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 22069; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 22105; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 24361; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 24396; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 24798; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 24834; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 27891; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 27926; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 28440; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 28476; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 31738; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 31774; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 31928; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 31964; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 32959; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 32994; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 33203; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 33239; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 34735; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 34770; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 35016; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 35052; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 37056; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 37091; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 37397; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 37432; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 42085; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 42116; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 42284; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 42315; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 43778; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 43814; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 44134; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 44170; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 46596; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 46631; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 47081; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 47117; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 50437; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 50472; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 51038; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 51074; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 54591; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 54627; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 54748; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 54784; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 55921; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 55956; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 56133; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 56169; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 57995; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 58030; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 58263; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 58299; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 60798; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 60833; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 61134; warning : ld
ptxas /tmp/tmpxft_00149576_00000000-6_layer_norm.ptx, line 61169; warning : ld
[10/11] c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/TH -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/mxfeng/miniconda3/envs/safe-rlhf/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -c /home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o 
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = float]’:
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1978:5:   required from here
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:536:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  536 |                                      {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:536:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:537:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  537 |                                       k * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:537:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:545:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  545 |                          {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:545:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:546:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  546 |                           k * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:546:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = float]’:
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1978:5:   required from here
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1571:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
 1571 |         at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
      |                                                                        ^~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1571:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_softmax_context(at::Tensor&, at::Tensor&, int, bool, bool, int, float, bool, bool, int, bool, unsigned int, unsigned int, at::Tensor&) [with T = __half]’:
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1979:5:   required from here
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:536:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  536 |                                      {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:536:50: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:537:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  537 |                                       k * InferenceContext::Instance().GetMaxTokenLength(),
      |                                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:537:41: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:545:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  545 |                          {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:545:38: warning: narrowing conversion of ‘(((size_t)hidden_dim) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:546:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  546 |                           k * InferenceContext::Instance().GetMaxTokenLength(),
      |                           ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:546:29: warning: narrowing conversion of ‘(((size_t)k) * (& InferenceContext::Instance())->InferenceContext::GetMaxTokenLength())’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp: In instantiation of ‘std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, float, at::Tensor&, at::Tensor&, bool, int, bool) [with T = __half]’:
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1979:5:   required from here
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1571:72: warning: narrowing conversion of ‘(size_t)mlp_1_out_neurons’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
 1571 |         at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
      |                                                                        ^~~~~~~~~~~~~~~~~
/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:1571:72: warning: narrowing conversion of ‘mlp_1_out_neurons’ from ‘const size_t’ {aka ‘const long unsigned int’} to ‘long int’ [-Wnarrowing]
[11/11] c++ pt_binding.o gelu.cuda.o relu.cuda.o layer_norm.cuda.o rms_norm.cuda.o softmax.cuda.o dequantize.cuda.o apply_rotary_pos_emb.cuda.o transform.cuda.o pointwise_ops.cuda.o -shared -lcurand -L/home/mxfeng/miniconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o transformer_inference.so
Loading extension module transformer_inference...
Time to load transformer_inference op: 34.986114740371704 seconds
[2023-06-30 05:30:33,650] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.ReLU: 2>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True}
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0026161670684814453 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.002323150634765625 seconds
******************[end] Initialized Actor Model [end] (duration: 63.60s)******************
*************************[start] Initializing Ref Model [start] **************************
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Time to load transformer_inference op: 35.06632614135742 seconds
Time to load transformer_inference op: 35.06630563735962 seconds
Time to load transformer_inference op: 35.064476013183594 seconds
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Time to load transformer_inference op: 35.070327281951904 seconds
Time to load transformer_inference op: 35.07048058509827 seconds
Time to load transformer_inference op: 35.069727659225464 seconds
Time to load transformer_inference op: 35.069236516952515 seconds
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0032761096954345703 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0032486915588378906 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0033745765686035156 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0033402442932128906 seconds
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.003381490707397461 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0030279159545898438 seconds
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.005390167236328125 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0033254623413085938 seconds
Time to load transformer_inference op: 0.0033121109008789062 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.00341033935546875 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Time to load transformer_inference op: 0.003536701202392578 seconds
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0036079883575439453 seconds
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0030517578125 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.002991199493408203 seconds
[2023-06-30 05:30:50,001] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-06-30 05:30:53,129] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-06-30 05:30:53,131] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-06-30 05:30:53,131] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-06-30 05:30:53,131] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-06-30 05:30:53,131] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-06-30 05:30:53,131] [INFO] [config.py:964:print]   amp_params ................... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fb8adecb010>
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   dump_state ................... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-06-30 05:30:53,132] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-06-30 05:30:53,133] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   pld_params ................... False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   world_size ................... 8
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-06-30 05:30:53,134] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-06-30 05:30:53,134] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
*******************[end] Initialized Ref Model [end] (duration: 19.44s)*******************
*************************[start] Initializing EMA Model [start] **************************
[2023-06-30 05:31:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-06-30 05:31:11,418] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-06-30 05:31:11,419] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   amp_params ................... False
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-06-30 05:31:11,420] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fb8adef5ba0>
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   dump_state ................... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-06-30 05:31:11,421] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   pld_params ................... False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-06-30 05:31:11,422] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   world_size ................... 8
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-06-30 05:31:11,423] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-06-30 05:31:11,423] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
*******************[end] Initialized EMA Model [end] (duration: 18.29s)*******************
************************[start] Initializing Critic Model [start] ************************
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.000804901123046875 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0010590553283691406 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0012905597686767578 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0024480819702148438 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.001373291015625 seconds
[2023-06-30 05:31:17,398] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-06-30 05:31:17,698] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-06-30 05:31:17,700] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-06-30 05:31:17,700] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-06-30 05:31:17,717] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-06-30 05:31:17,718] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-06-30 05:31:17,718] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-06-30 05:31:17,718] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-06-30 05:31:17,718] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-06-30 05:31:17,718] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-06-30 05:31:17,718] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0011928081512451172 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0015544891357421875 seconds
Installed CUDA version 11.1 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/mxfeng/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.001184701919555664 seconds
Rank: 4 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 3 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 5 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 6 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 2 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 1 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 0 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 7 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
[2023-06-30 05:31:21,246] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-06-30 05:31:21,247] [INFO] [utils.py:786:see_memory_usage] MA 10.58 GB         Max_MA 10.58 GB         CA 10.97 GB         Max_CA 11 GB 
[2023-06-30 05:31:21,247] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 68.39 GB, percent = 13.6%
[2023-06-30 05:31:21,495] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-06-30 05:31:21,496] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 11.05 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-06-30 05:31:21,497] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 68.62 GB, percent = 13.6%
[2023-06-30 05:31:21,497] [INFO] [stage_1_and_2.py:488:__init__] optimizer state initialized
[2023-06-30 05:31:21,737] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-06-30 05:31:21,738] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 10.89 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-06-30 05:31:21,738] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 69.06 GB, percent = 13.7%
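The critic's per-rank partition sizes and the MA readings above can be cross-checked with a little arithmetic. The sketch below rests on several assumptions:

- The two entries in `sizes` are the two optimizer param groups, probably weight-decay and no-decay, judging by the two-element `lr` list.
- Partition sizes may include alignment padding.
- `see_memory_usage` reports GiB (bytes / 1024**3).
- The fp32 master partition already exists at the "Before initializing optimizer states" snapshot. If so, the 0.31 GB rise in MA would be Adam's two fp32 moment buffers.

The helper names are illustrative only.

```python
# Hypothetical cross-check of the critic's ZeRO-2 partitioning (values copied from the log).
WORLD_SIZE = 8
PARTITION_SIZES = [41_365_824, 33_792]  # per-rank elements for the two param groups

def total_partitioned_params(sizes, world_size):
    """Elements across all ranks; an upper bound on the param count if padding was added."""
    return sum(sizes) * world_size

def adam_moment_gib(sizes, bytes_per_elem=4, num_moments=2):
    """Per-rank memory for Adam's exp_avg and exp_avg_sq fp32 buffers, in GiB."""
    return sum(sizes) * bytes_per_elem * num_moments / 1024**3

per_rank = sum(PARTITION_SIZES)
total = total_partitioned_params(PARTITION_SIZES, WORLD_SIZE)
print(f"per-rank elements: {per_rank:,}")
print(f"total elements:    {total:,}")
print(f"Adam moments/rank: {adam_moment_gib(PARTITION_SIZES):.2f} GiB (log: 10.89 - 10.58 = 0.31 GB)")
```

Under these assumptions, the critic has roughly 331M parameters. That number fits an OPT-350m-class reward model, but the log itself does not name the model.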
[2023-06-30 05:31:21,740] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-06-30 05:31:21,740] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-06-30 05:31:21,741] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7fb8adf09120>
[2023-06-30 05:31:21,741] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:31:21,741] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-06-30 05:31:21,741] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   amp_params ................... False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fb99428d720>
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-06-30 05:31:21,742] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   dump_state ................... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-06-30 05:31:21,743] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   pld_params ................... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   world_size ................... 8
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-06-30 05:31:21,744] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-06-30 05:31:21,745] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-06-30 05:31:21,745] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-06-30 05:31:21,745] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": false, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
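The batch and token settings printed for the critic are consistent with the launch command. DeepSpeed requires `train_batch_size` to equal `train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size`. The 512-token `max_out_tokens` appears to be `--max_prompt_seq_len 256` plus `--max_answer_seq_len 256`; that second relation is an assumption about how DeepSpeed-Chat fills the field. The helper names are hypothetical.

```python
# Sketch: consistency checks on the batch and token settings printed above.
# Values come from the log and the launch command; helper names are illustrative.
def expected_train_batch_size(micro_batch_per_gpu, grad_accum_steps, world_size):
    # DeepSpeed requires train_batch_size == micro_batch * grad_accum * world_size
    return micro_batch_per_gpu * grad_accum_steps * world_size

def max_total_tokens(max_prompt_seq_len, max_answer_seq_len):
    # Assumed: the hybrid engine's max_out_tokens covers prompt + generated answer
    return max_prompt_seq_len + max_answer_seq_len

batch = expected_train_batch_size(micro_batch_per_gpu=4, grad_accum_steps=1, world_size=8)
tokens = max_total_tokens(max_prompt_seq_len=256, max_answer_seq_len=256)
assert batch == 32, batch     # matches "train_batch_size": 32
assert tokens == 512, tokens  # matches "max_out_tokens": 512
print(batch, tokens)
```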
*****************[end] Initialized Critic Model [end] (duration: 10.32s)******************
************************[start] Initializing Reward Model [start] ************************
[2023-06-30 05:31:29,125] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.5, git-hash=unknown, git-branch=unknown
[2023-06-30 05:31:29,953] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-06-30 05:31:29,955] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-06-30 05:31:29,955] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-06-30 05:31:29,955] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-06-30 05:31:29,955] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-06-30 05:31:29,955] [INFO] [config.py:964:print]   amp_params ................... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fb94c1c4f40>
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   dump_state ................... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-06-30 05:31:29,956] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   pld_params ................... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-06-30 05:31:29,957] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   world_size ................... 8
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-06-30 05:31:29,958] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-06-30 05:31:29,958] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
******************[end] Initialized Reward Model [end] (duration: 8.21s)******************
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 954
------------------------------------------------------
Free memory : 18.263428 (GigaBytes)  
Total memory: 31.748535 (GigaBytes)  
Requested memory: 1.031250 (GigaBytes) 
Setting maximum total tokens (input + output) to 512 
WorkSpace: 0x7fb262000000 
------------------------------------------------------
[2023-06-30 05:31:34,455] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-06-30 05:31:34,618] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.051422119140625|cri_loss: 0.0933837890625|unsuper_loss: 0.0
average reward score: 0.0440673828125
-------------------------------------------------------------------------------------
|E2E latency=4.62s |Gather latency=0.00s (0.00%) |Generate time=3.07s (66.28%) |Training time=1.37s (29.65%) |Others=0.19 (4.07%)|CurSamplesPerSec=6.92 |AvgSamplesPerSec=6.92
[2023-06-30 05:31:37,929] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
[2023-06-30 05:31:38,085] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 1|ppo_ep: 1|act_loss: 0.161376953125|cri_loss: 0.074951171875|unsuper_loss: 0.0
average reward score: -0.300537109375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.14%) |Training time=0.99s (28.43%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=7.91
[2023-06-30 05:31:41,379] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-06-30 05:31:41,539] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 2|ppo_ep: 1|act_loss: 0.10162353515625|cri_loss: 0.0782470703125|unsuper_loss: 0.0
average reward score: -0.132080078125
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.29s (66.33%) |Training time=0.97s (28.21%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.31
epoch: 0|step: 3|ppo_ep: 1|act_loss: -0.045989990234375|cri_loss: 0.08123779296875|unsuper_loss: 0.0
average reward score: 0.5107421875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.33%) |Training time=1.02s (28.96%) |Others=0.20 (5.70%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.49
epoch: 0|step: 4|ppo_ep: 1|act_loss: -0.00732421875|cri_loss: 0.125|unsuper_loss: 0.0
average reward score: 0.39306640625
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.36s (65.84%) |Training time=1.02s (28.45%) |Others=0.20 (5.71%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.57
epoch: 0|step: 5|ppo_ep: 1|act_loss: -0.0008306503295898438|cri_loss: 0.03265380859375|unsuper_loss: 0.0
average reward score: 0.034912109375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.05%) |Training time=1.03s (29.16%) |Others=0.20 (5.79%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.65
epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.01329803466796875|cri_loss: 0.08013916015625|unsuper_loss: 0.0
average reward score: -0.27783203125
-------------------------------------------------------------------------------------
|E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.40s (65.68%) |Training time=1.05s (28.84%) |Others=0.20 (5.48%)|CurSamplesPerSec=8.77 |AvgSamplesPerSec=8.67
epoch: 0|step: 7|ppo_ep: 1|act_loss: -0.0205230712890625|cri_loss: 0.060333251953125|unsuper_loss: 0.0
average reward score: -0.225341796875
-------------------------------------------------------------------------------------
|E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.34s (61.75%) |Training time=1.24s (32.74%) |Others=0.21 (5.51%)|CurSamplesPerSec=8.45 |AvgSamplesPerSec=8.64
epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.037628173828125|cri_loss: 0.0955810546875|unsuper_loss: 0.0
average reward score: 0.56591796875
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.36s (65.39%) |Training time=1.05s (28.98%) |Others=0.20 (5.64%)|CurSamplesPerSec=8.86 |AvgSamplesPerSec=8.67
[2023-06-30 05:32:06,534] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
[2023-06-30 05:32:06,535] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=4, lr=[5.79e-07, 5.79e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:32:06,535] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=44.71694754190473, CurrSamplesPerSec=49.91182095699783, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:32:06,708] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=3, lr=[3.5000000000000004e-07, 3.5000000000000004e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 9|ppo_ep: 1|act_loss: -0.059326171875|cri_loss: 0.0955810546875|unsuper_loss: 0.0
average reward score: -0.437255859375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.37%) |Training time=0.97s (27.75%) |Others=0.20 (5.88%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.72
epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.01279449462890625|cri_loss: 0.0960693359375|unsuper_loss: 0.0
average reward score: 0.4921875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.55%) |Training time=1.01s (28.78%) |Others=0.20 (5.67%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.75
[2023-06-30 05:32:13,542] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 11|ppo_ep: 1|act_loss: -0.020294189453125|cri_loss: 0.0838623046875|unsuper_loss: 0.0
average reward score: 0.08349609375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.06%) |Training time=0.98s (28.17%) |Others=0.20 (5.77%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.78
epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.045440673828125|cri_loss: 0.07708740234375|unsuper_loss: 0.0
average reward score: 0.09765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.59%) |Training time=1.01s (28.76%) |Others=0.20 (5.65%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.81
epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.14306640625|cri_loss: 0.0765380859375|unsuper_loss: 0.0
average reward score: 0.1248779296875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.66%) |Training time=1.01s (28.69%) |Others=0.20 (5.65%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.82
epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.1328125|cri_loss: 0.043853759765625|unsuper_loss: 0.0
average reward score: 0.027587890625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.36%) |Training time=1.03s (28.88%) |Others=0.21 (5.77%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.83
epoch: 0|step: 15|ppo_ep: 1|act_loss: -0.2279052734375|cri_loss: 0.098388671875|unsuper_loss: 0.0
average reward score: 0.41259765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.27%) |Training time=1.02s (28.98%) |Others=0.20 (5.74%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.85
epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.027313232421875|cri_loss: 0.0814208984375|unsuper_loss: 0.0
average reward score: 0.391845703125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.56%) |Training time=1.01s (28.60%) |Others=0.21 (5.83%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.86
epoch: 0|step: 17|ppo_ep: 1|act_loss: -0.10821533203125|cri_loss: 0.05255126953125|unsuper_loss: 0.0
average reward score: 0.81884765625
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.78%) |Training time=1.02s (28.77%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.87
epoch: 0|step: 18|ppo_ep: 1|act_loss: 0.037445068359375|cri_loss: 0.06512451171875|unsuper_loss: 0.0
average reward score: 0.065673828125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.88%) |Training time=1.01s (28.71%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.88
[2023-06-30 05:32:41,761] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=5, lr=[1.4475000000000001e-06, 1.4475000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:32:41,795] [INFO] [timer.py:215:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=45.91102203010401, CurrSamplesPerSec=46.56460627574679, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:32:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=3, lr=[8.500000000000001e-07, 8.500000000000001e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 19|ppo_ep: 1|act_loss: -0.06683349609375|cri_loss: 0.08544921875|unsuper_loss: 0.0
average reward score: 0.708984375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.68%) |Training time=1.02s (28.95%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.89
epoch: 0|step: 20|ppo_ep: 1|act_loss: -0.10491943359375|cri_loss: 0.0445556640625|unsuper_loss: 0.0
average reward score: 0.28662109375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.85%) |Training time=1.01s (28.76%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.90
epoch: 0|step: 21|ppo_ep: 1|act_loss: -0.033477783203125|cri_loss: 0.0439453125|unsuper_loss: 0.0
average reward score: 0.4755859375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.92%) |Training time=1.01s (28.72%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.91
epoch: 0|step: 22|ppo_ep: 1|act_loss: -0.056915283203125|cri_loss: 0.044281005859375|unsuper_loss: 0.0
average reward score: 0.494140625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.97%) |Training time=1.00s (28.65%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.92
epoch: 0|step: 23|ppo_ep: 1|act_loss: -0.12384033203125|cri_loss: 0.049224853515625|unsuper_loss: 0.0
average reward score: 0.8427734375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.92%) |Training time=1.00s (28.52%) |Others=0.20 (5.56%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 24|ppo_ep: 1|act_loss: 0.0141448974609375|cri_loss: 0.03533935546875|unsuper_loss: 0.0
average reward score: 0.99853515625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.02%) |Training time=1.00s (28.61%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.01070404052734375|cri_loss: 0.048583984375|unsuper_loss: 0.0
average reward score: 0.98095703125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.91%) |Training time=1.01s (28.73%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.95
epoch: 0|step: 26|ppo_ep: 1|act_loss: -0.0119171142578125|cri_loss: 0.045806884765625|unsuper_loss: 0.0
average reward score: 1.23046875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.08%) |Training time=1.02s (28.50%) |Others=0.19 (5.42%)|CurSamplesPerSec=8.95 |AvgSamplesPerSec=8.95
epoch: 0|step: 27|ppo_ep: 1|act_loss: -0.00119781494140625|cri_loss: 0.026336669921875|unsuper_loss: 0.0
average reward score: 1.6298828125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.94%) |Training time=1.01s (28.69%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.95
epoch: 0|step: 28|ppo_ep: 1|act_loss: -0.10198974609375|cri_loss: 0.08404541015625|unsuper_loss: 0.0
average reward score: 0.62451171875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.62%) |Training time=1.02s (28.99%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.96
[2023-06-30 05:33:16,882] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=5, lr=[2.4125e-06, 2.4125e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:33:16,915] [INFO] [timer.py:215:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=46.3760282707336, CurrSamplesPerSec=48.03584675797283, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:33:17,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=3, lr=[1.3500000000000002e-06, 1.3500000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 29|ppo_ep: 1|act_loss: -0.120361328125|cri_loss: 0.056976318359375|unsuper_loss: 0.0
average reward score: 1.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.06%) |Training time=0.99s (28.52%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.96
epoch: 0|step: 30|ppo_ep: 1|act_loss: -0.006252288818359375|cri_loss: 0.0372314453125|unsuper_loss: 0.0
average reward score: 1.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.19%) |Training time=0.99s (28.36%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.97
epoch: 0|step: 31|ppo_ep: 1|act_loss: -0.042083740234375|cri_loss: 0.066650390625|unsuper_loss: 0.0
average reward score: 1.943359375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.23%) |Training time=0.99s (28.37%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.98
epoch: 0|step: 32|ppo_ep: 1|act_loss: 0.09765625|cri_loss: 0.114990234375|unsuper_loss: 0.0
average reward score: 1.890625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.86%) |Training time=1.01s (28.74%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.98
epoch: 0|step: 33|ppo_ep: 1|act_loss: -0.063232421875|cri_loss: 0.0782470703125|unsuper_loss: 0.0
average reward score: 1.09765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.39s (67.92%) |Training time=0.94s (26.69%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.99
epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.022430419921875|cri_loss: 0.06280517578125|unsuper_loss: 0.0
average reward score: 1.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.07%) |Training time=0.99s (28.37%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.99
epoch: 0|step: 35|ppo_ep: 1|act_loss: -0.055633544921875|cri_loss: 0.03863525390625|unsuper_loss: 0.0
average reward score: 1.06640625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.14%) |Training time=0.99s (28.13%) |Others=0.20 (5.72%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.99
[2023-06-30 05:33:41,421] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 36|ppo_ep: 1|act_loss: 0.10711669921875|cri_loss: 0.06573486328125|unsuper_loss: 0.0
average reward score: 1.427734375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.44%) |Training time=0.98s (28.14%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=9.00
epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.0986328125|cri_loss: 0.05401611328125|unsuper_loss: 0.0
average reward score: 0.873046875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.61%) |Training time=1.02s (29.01%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=9.00
epoch: 0|step: 38|ppo_ep: 1|act_loss: 0.1334228515625|cri_loss: 0.060577392578125|unsuper_loss: 0.0
average reward score: 1.5537109375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.54%) |Training time=1.02s (28.99%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=9.00
[2023-06-30 05:33:51,892] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[3.2810000000000004e-06, 3.2810000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:33:51,926] [INFO] [timer.py:215:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=46.83286867672776, CurrSamplesPerSec=46.58771912773762, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:33:52,086] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=3, lr=[1.85e-06, 1.85e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 39|ppo_ep: 1|act_loss: 0.05963134765625|cri_loss: 0.05450439453125|unsuper_loss: 0.0
average reward score: 1.9013671875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.48%) |Training time=1.02s (29.05%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=9.01
epoch: 0|step: 40|ppo_ep: 1|act_loss: 0.05438232421875|cri_loss: 0.08197021484375|unsuper_loss: 0.0
average reward score: 0.9580078125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.99%) |Training time=1.01s (28.62%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=9.01
[2023-06-30 05:33:59,095] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.09130859375|cri_loss: 0.07305908203125|unsuper_loss: 0.0
average reward score: 1.55859375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.78%) |Training time=1.02s (29.17%) |Others=0.18 (5.06%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=9.01
epoch: 0|step: 42|ppo_ep: 1|act_loss: 0.073974609375|cri_loss: 0.07342529296875|unsuper_loss: 0.0
average reward score: 1.357421875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.22%) |Training time=1.04s (29.36%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=9.02
epoch: 0|step: 43|ppo_ep: 1|act_loss: -0.03509521484375|cri_loss: 0.06256103515625|unsuper_loss: 0.0
average reward score: 1.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.51%) |Training time=1.02s (29.14%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=9.02
epoch: 0|step: 44|ppo_ep: 1|act_loss: -0.1112060546875|cri_loss: 0.17333984375|unsuper_loss: 0.0
average reward score: 1.462890625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.18%) |Training time=1.00s (28.31%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=9.02
epoch: 0|step: 45|ppo_ep: 1|act_loss: 0.293701171875|cri_loss: 0.1051025390625|unsuper_loss: 0.0
average reward score: 1.4658203125
-------------------------------------------------------------------------------------
|E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.42s (63.34%) |Training time=1.21s (31.71%) |Others=0.19 (4.95%)|CurSamplesPerSec=8.36 |AvgSamplesPerSec=9.00
epoch: 0|step: 46|ppo_ep: 1|act_loss: -0.0198516845703125|cri_loss: 0.281005859375|unsuper_loss: 0.0
average reward score: 1.46484375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.10%) |Training time=1.00s (28.50%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=9.01
epoch: 0|step: 47|ppo_ep: 1|act_loss: -0.07269287109375|cri_loss: 0.07769775390625|unsuper_loss: 0.0
average reward score: 0.77685546875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.37%) |Training time=0.99s (28.22%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=9.01
epoch: 0|step: 48|ppo_ep: 1|act_loss: 0.30078125|cri_loss: 0.1143798828125|unsuper_loss: 0.0
average reward score: 1.0634765625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.23%) |Training time=0.99s (28.39%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=9.01
[2023-06-30 05:34:27,323] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[4.2460000000000005e-06, 4.2460000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:34:27,357] [INFO] [timer.py:215:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=46.63982424981362, CurrSamplesPerSec=49.4927398725675, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:34:27,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=4, lr=[2.3000000000000004e-06, 2.3000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 49|ppo_ep: 1|act_loss: 0.1898193359375|cri_loss: 0.08831787109375|unsuper_loss: 0.0
average reward score: 0.83056640625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.79%) |Training time=0.98s (27.83%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=9.01
epoch: 0|step: 50|ppo_ep: 1|act_loss: 0.146484375|cri_loss: 0.074462890625|unsuper_loss: 0.0
average reward score: 0.75634765625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.23%) |Training time=0.99s (28.36%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=9.02
epoch: 0|step: 51|ppo_ep: 1|act_loss: -0.12493896484375|cri_loss: 0.1767578125|unsuper_loss: 0.0
average reward score: 1.1865234375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.48%) |Training time=0.98s (28.13%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=9.02
epoch: 0|step: 52|ppo_ep: 1|act_loss: -0.058990478515625|cri_loss: 0.061676025390625|unsuper_loss: 0.0
average reward score: 1.5546875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.13%) |Training time=1.01s (28.52%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=9.02
epoch: 0|step: 53|ppo_ep: 1|act_loss: -0.072265625|cri_loss: 0.07537841796875|unsuper_loss: 0.0
average reward score: 1.8662109375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.16%) |Training time=1.00s (28.22%) |Others=0.20 (5.61%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=9.02
epoch: 0|step: 54|ppo_ep: 1|act_loss: 0.2159423828125|cri_loss: 0.09716796875|unsuper_loss: 0.0
average reward score: 1.4384765625
-------------------------------------------------------------------------------------
|E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=2.31s (59.53%) |Training time=1.38s (35.46%) |Others=0.19 (5.02%)|CurSamplesPerSec=8.24 |AvgSamplesPerSec=9.00
epoch: 0|step: 55|ppo_ep: 1|act_loss: -0.0186614990234375|cri_loss: 0.029510498046875|unsuper_loss: 0.0
average reward score: 2.5390625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.66%) |Training time=1.50s (37.46%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.98
epoch: 0|step: 56|ppo_ep: 1|act_loss: 0.263916015625|cri_loss: 0.1387939453125|unsuper_loss: 0.0
average reward score: 1.171875
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.45%) |Training time=1.50s (37.65%) |Others=0.20 (4.90%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.97
epoch: 0|step: 57|ppo_ep: 1|act_loss: 0.0281982421875|cri_loss: 0.06683349609375|unsuper_loss: 0.0
average reward score: 1.2216796875
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.44%) |Training time=1.50s (37.69%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.95
epoch: 0|step: 58|ppo_ep: 1|act_loss: -0.20458984375|cri_loss: 0.0836181640625|unsuper_loss: 0.0
average reward score: 0.9765625
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.65%) |Training time=1.49s (37.46%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.93
[2023-06-30 05:35:05,178] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[5.211000000000001e-06, 5.211000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:35:05,212] [INFO] [timer.py:215:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=43.682469093934685, CurrSamplesPerSec=27.384398422605834, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:35:05,374] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=4, lr=[2.8000000000000003e-06, 2.8000000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 59|ppo_ep: 1|act_loss: -0.274169921875|cri_loss: 0.1549072265625|unsuper_loss: 0.0
average reward score: 2.02734375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.48%) |Training time=1.50s (37.65%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.91
epoch: 0|step: 60|ppo_ep: 1|act_loss: -0.06280517578125|cri_loss: 0.0614013671875|unsuper_loss: 0.0
average reward score: 2.15625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.59%) |Training time=1.49s (37.46%) |Others=0.20 (4.95%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.90
epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.04803466796875|cri_loss: 0.053863525390625|unsuper_loss: 0.0
average reward score: 1.80859375
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.34s (57.60%) |Training time=1.51s (37.26%) |Others=0.21 (5.14%)|CurSamplesPerSec=7.88 |AvgSamplesPerSec=8.88
epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.163330078125|cri_loss: 0.1417236328125|unsuper_loss: 0.0
average reward score: 2.005859375
-------------------------------------------------------------------------------------
|E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.39s (58.46%) |Training time=1.50s (36.71%) |Others=0.20 (4.83%)|CurSamplesPerSec=7.84 |AvgSamplesPerSec=8.86
epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.0305328369140625|cri_loss: 0.042388916015625|unsuper_loss: 0.0
average reward score: 1.8232421875
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.49%) |Training time=1.51s (37.66%) |Others=0.19 (4.85%)|CurSamplesPerSec=7.98 |AvgSamplesPerSec=8.85
epoch: 0|step: 64|ppo_ep: 1|act_loss: -0.0985107421875|cri_loss: 0.0579833984375|unsuper_loss: 0.0
average reward score: 1.74609375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.71%) |Training time=1.49s (37.41%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.83
epoch: 0|step: 65|ppo_ep: 1|act_loss: -0.08245849609375|cri_loss: 0.04962158203125|unsuper_loss: 0.0
average reward score: 1.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.63%) |Training time=1.49s (37.47%) |Others=0.20 (4.90%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.82
epoch: 0|step: 66|ppo_ep: 1|act_loss: -0.117919921875|cri_loss: 0.06787109375|unsuper_loss: 0.0
average reward score: 1.5087890625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.49%) |Training time=1.50s (37.52%) |Others=0.20 (4.99%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.81
epoch: 0|step: 67|ppo_ep: 1|act_loss: -0.06329345703125|cri_loss: 0.05621337890625|unsuper_loss: 0.0
average reward score: 2.12890625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.65%) |Training time=1.49s (37.47%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.79
epoch: 0|step: 68|ppo_ep: 1|act_loss: 0.1907958984375|cri_loss: 0.11407470703125|unsuper_loss: 0.0
average reward score: 1.0634765625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.63%) |Training time=1.50s (37.51%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.78
[2023-06-30 05:35:45,292] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[6.176000000000001e-06, 6.176000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:35:45,325] [INFO] [timer.py:215:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=40.15195643664052, CurrSamplesPerSec=27.19589011312766, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:35:45,488] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=4, lr=[3.3000000000000006e-06, 3.3000000000000006e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 69|ppo_ep: 1|act_loss: 0.10711669921875|cri_loss: 0.1483154296875|unsuper_loss: 0.0
average reward score: 2.2109375
-------------------------------------------------------------------------------------
|E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.35s (57.98%) |Training time=1.51s (37.13%) |Others=0.20 (4.89%)|CurSamplesPerSec=7.89 |AvgSamplesPerSec=8.77
epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.0811767578125|cri_loss: 0.0277862548828125|unsuper_loss: 0.0
average reward score: 1.8515625
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.44s (60.06%) |Training time=1.43s (35.09%) |Others=0.20 (4.85%)|CurSamplesPerSec=7.88 |AvgSamplesPerSec=8.75
epoch: 0|step: 71|ppo_ep: 1|act_loss: 0.00701141357421875|cri_loss: 0.03857421875|unsuper_loss: 0.0
average reward score: 1.7568359375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.84%) |Training time=1.49s (37.29%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.74
epoch: 0|step: 72|ppo_ep: 1|act_loss: -0.149658203125|cri_loss: 0.056182861328125|unsuper_loss: 0.0
average reward score: 1.974609375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.90%) |Training time=1.48s (37.22%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.73
epoch: 0|step: 73|ppo_ep: 1|act_loss: -0.2216796875|cri_loss: 0.1129150390625|unsuper_loss: 0.0
average reward score: 2.09765625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.74%) |Training time=1.49s (37.40%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.72
epoch: 0|step: 74|ppo_ep: 1|act_loss: -0.26513671875|cri_loss: 0.10577392578125|unsuper_loss: 0.0
average reward score: 1.921875
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.91%) |Training time=1.48s (37.17%) |Others=0.20 (4.92%)|CurSamplesPerSec=8.06 |AvgSamplesPerSec=8.71
epoch: 0|step: 75|ppo_ep: 1|act_loss: -0.1070556640625|cri_loss: 0.06085205078125|unsuper_loss: 0.0
average reward score: 1.5078125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.76%) |Training time=1.49s (37.35%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.70
epoch: 0|step: 76|ppo_ep: 1|act_loss: 0.0670166015625|cri_loss: 0.0479736328125|unsuper_loss: 0.0
average reward score: 1.861328125
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.66%) |Training time=1.49s (37.34%) |Others=0.20 (5.00%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.69
epoch: 0|step: 77|ppo_ep: 1|act_loss: 0.0548095703125|cri_loss: 0.04315185546875|unsuper_loss: 0.0
average reward score: 1.646484375
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.59%) |Training time=1.51s (37.51%) |Others=0.20 (4.91%)|CurSamplesPerSec=7.97 |AvgSamplesPerSec=8.68
epoch: 0|step: 78|ppo_ep: 1|act_loss: 0.1278076171875|cri_loss: 0.0733642578125|unsuper_loss: 0.0
average reward score: 1.1669921875
-------------------------------------------------------------------------------------
|E2E latency=4.04s |Gather latency=0.00s (0.00%) |Generate time=2.33s (57.66%) |Training time=1.51s (37.46%) |Others=0.20 (4.88%)|CurSamplesPerSec=7.92 |AvgSamplesPerSec=8.67
[2023-06-30 05:36:25,316] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[7.141000000000001e-06, 7.141000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:36:25,349] [INFO] [timer.py:215:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=37.95412529645704, CurrSamplesPerSec=27.326392016372495, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:36:25,511] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=4, lr=[3.8000000000000005e-06, 3.8000000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 79|ppo_ep: 1|act_loss: -0.0168914794921875|cri_loss: 0.0294342041015625|unsuper_loss: 0.0
average reward score: 1.7353515625
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.72%) |Training time=1.50s (37.41%) |Others=0.20 (4.87%)|CurSamplesPerSec=7.98 |AvgSamplesPerSec=8.66
epoch: 0|step: 80|ppo_ep: 1|act_loss: -0.1839599609375|cri_loss: 0.066650390625|unsuper_loss: 0.0
average reward score: 1.732421875
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.62%) |Training time=1.49s (37.49%) |Others=0.19 (4.90%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.65
epoch: 0|step: 81|ppo_ep: 1|act_loss: -0.2330322265625|cri_loss: 0.11383056640625|unsuper_loss: 0.0
average reward score: 2.095703125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.63%) |Training time=1.49s (37.48%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.65
epoch: 0|step: 82|ppo_ep: 1|act_loss: -0.1510009765625|cri_loss: 0.05621337890625|unsuper_loss: 0.0
average reward score: 1.81640625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.54%) |Training time=1.49s (37.40%) |Others=0.20 (5.05%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.64
epoch: 0|step: 83|ppo_ep: 1|act_loss: -0.06646728515625|cri_loss: 0.0248260498046875|unsuper_loss: 0.0
average reward score: 2.14453125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.57%) |Training time=1.50s (37.55%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.63
epoch: 0|step: 84|ppo_ep: 1|act_loss: 0.1416015625|cri_loss: 0.056121826171875|unsuper_loss: 0.0
average reward score: 1.6689453125
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.44%) |Training time=1.51s (37.70%) |Others=0.19 (4.86%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.62
epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.093505859375|cri_loss: 0.037384033203125|unsuper_loss: 0.0
average reward score: 2.06640625
-------------------------------------------------------------------------------------
|E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.00%) |Training time=1.55s (38.16%) |Others=0.20 (4.84%)|CurSamplesPerSec=7.90 |AvgSamplesPerSec=8.61
epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.08856201171875|cri_loss: 0.03985595703125|unsuper_loss: 0.0
average reward score: 1.513671875
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.59%) |Training time=1.50s (37.55%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.61
epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.06573486328125|cri_loss: 0.038818359375|unsuper_loss: 0.0
average reward score: 1.7431640625
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.58%) |Training time=1.49s (37.49%) |Others=0.20 (4.93%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.60
epoch: 0|step: 88|ppo_ep: 1|act_loss: -0.034942626953125|cri_loss: 0.0240631103515625|unsuper_loss: 0.0
average reward score: 2.130859375
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.52%) |Training time=1.49s (37.62%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.59
[2023-06-30 05:37:05,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[8.106e-06, 8.106e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:37:05,240] [INFO] [timer.py:215:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=36.348432428390474, CurrSamplesPerSec=27.542485752442584, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:37:05,401] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=4, lr=[4.3e-06, 4.3e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 89|ppo_ep: 1|act_loss: -0.094970703125|cri_loss: 0.027069091796875|unsuper_loss: 0.0
average reward score: 1.9921875
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.59%) |Training time=1.49s (37.55%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.06 |AvgSamplesPerSec=8.59
epoch: 0|step: 90|ppo_ep: 1|act_loss: -0.10107421875|cri_loss: 0.0299224853515625|unsuper_loss: 0.0
average reward score: 2.103515625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.61%) |Training time=1.49s (37.51%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.58
epoch: 0|step: 91|ppo_ep: 1|act_loss: -0.0175018310546875|cri_loss: 0.0294342041015625|unsuper_loss: 0.0
average reward score: 2.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.60%) |Training time=1.49s (37.51%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.06 |AvgSamplesPerSec=8.57
epoch: 0|step: 92|ppo_ep: 1|act_loss: 0.1485595703125|cri_loss: 0.03546142578125|unsuper_loss: 0.0
average reward score: 1.7001953125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.62%) |Training time=1.49s (37.49%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.57
epoch: 0|step: 93|ppo_ep: 1|act_loss: 0.065673828125|cri_loss: 0.0396728515625|unsuper_loss: 0.0
average reward score: 2.62890625
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.32s (57.17%) |Training time=1.54s (38.01%) |Others=0.20 (4.82%)|CurSamplesPerSec=7.89 |AvgSamplesPerSec=8.56
epoch: 0|step: 94|ppo_ep: 1|act_loss: 0.12469482421875|cri_loss: 0.0418701171875|unsuper_loss: 0.0
average reward score: 1.9931640625
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.11%) |Training time=1.48s (37.01%) |Others=0.20 (4.88%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.55
epoch: 0|step: 95|ppo_ep: 1|act_loss: 0.076904296875|cri_loss: 0.03350830078125|unsuper_loss: 0.0
average reward score: 2.1796875
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.65%) |Training time=1.49s (37.48%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.55
epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.0194244384765625|cri_loss: 0.006145477294921875|unsuper_loss: 0.0
average reward score: 2.353515625
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.06%) |Training time=1.49s (37.11%) |Others=0.19 (4.82%)|CurSamplesPerSec=7.96 |AvgSamplesPerSec=8.54
epoch: 0|step: 97|ppo_ep: 1|act_loss: -0.0831298828125|cri_loss: 0.06939697265625|unsuper_loss: 0.0
average reward score: 1.9892578125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.71%) |Training time=1.49s (37.42%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.54
epoch: 0|step: 98|ppo_ep: 1|act_loss: -0.0599365234375|cri_loss: 0.055023193359375|unsuper_loss: 0.0
average reward score: 2.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.67%) |Training time=1.49s (37.46%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.53
[2023-06-30 05:37:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=6, lr=[9.071e-06, 9.071e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:37:45,173] [INFO] [timer.py:215:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=35.1760415009628, CurrSamplesPerSec=27.337451333237603, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:37:45,334] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=4, lr=[4.800000000000001e-06, 4.800000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 99|ppo_ep: 1|act_loss: -0.03814697265625|cri_loss: 0.048370361328125|unsuper_loss: 0.0
average reward score: 1.7548828125
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.29s (57.49%) |Training time=1.50s (37.65%) |Others=0.19 (4.85%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.53
epoch: 0|step: 100|ppo_ep: 1|act_loss: 0.0528564453125|cri_loss: 0.0445556640625|unsuper_loss: 0.0
average reward score: 1.6484375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.64%) |Training time=1.50s (37.50%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.52
epoch: 0|step: 101|ppo_ep: 1|act_loss: 0.0157318115234375|cri_loss: 0.0226898193359375|unsuper_loss: 0.0
average reward score: 2.140625
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.30s (57.40%) |Training time=1.51s (37.63%) |Others=0.20 (4.97%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.51
epoch: 0|step: 102|ppo_ep: 1|act_loss: -0.032196044921875|cri_loss: 0.02496337890625|unsuper_loss: 0.0
average reward score: 2.458984375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.32s (57.94%) |Training time=1.49s (37.19%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.51
epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.0919189453125|cri_loss: 0.0289459228515625|unsuper_loss: 0.0
average reward score: 1.705078125
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.28s (57.48%) |Training time=1.50s (37.64%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.50
epoch: 0|step: 104|ppo_ep: 1|act_loss: -0.037628173828125|cri_loss: 0.0233612060546875|unsuper_loss: 0.0
average reward score: 2.0859375
-------------------------------------------------------------------------------------
|E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.33s (57.68%) |Training time=1.52s (37.49%) |Others=0.20 (4.82%)|CurSamplesPerSec=7.91 |AvgSamplesPerSec=8.50
epoch: 0|step: 105|ppo_ep: 1|act_loss: 0.0193328857421875|cri_loss: 0.03729248046875|unsuper_loss: 0.0
average reward score: 2.333984375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.45%) |Training time=1.02s (29.16%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.50
epoch: 0|step: 106|ppo_ep: 1|act_loss: 0.11151123046875|cri_loss: 0.090087890625|unsuper_loss: 0.0
average reward score: 2.224609375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.16%) |Training time=1.04s (29.44%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.51
epoch: 0|step: 107|ppo_ep: 1|act_loss: 0.207275390625|cri_loss: 0.06658935546875|unsuper_loss: 0.0
average reward score: 1.939453125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.93%) |Training time=1.04s (29.72%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.51
epoch: 0|step: 108|ppo_ep: 1|act_loss: 0.0750732421875|cri_loss: 0.02655029296875|unsuper_loss: 0.0
average reward score: 1.50390625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.21%) |Training time=1.03s (29.36%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.52
[2023-06-30 05:38:22,773] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=6, lr=[9.649477647746756e-06, 9.649477647746756e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:38:22,806] [INFO] [timer.py:215:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=35.0704005590471, CurrSamplesPerSec=44.997045740730556, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:38:22,973] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=4, lr=[4.999391053853971e-06, 4.999391053853971e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 109|ppo_ep: 1|act_loss: -0.0445556640625|cri_loss: 0.038482666015625|unsuper_loss: 0.0
average reward score: 2.154296875
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.18%) |Training time=1.04s (29.27%) |Others=0.20 (5.55%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.52
epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.01416778564453125|cri_loss: 0.056884765625|unsuper_loss: 0.0
average reward score: 1.8291015625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.80%) |Training time=1.04s (28.94%) |Others=0.19 (5.26%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.53
epoch: 0|step: 111|ppo_ep: 1|act_loss: -0.005035400390625|cri_loss: 0.037445068359375|unsuper_loss: 0.0
average reward score: 2.33984375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.16%) |Training time=1.03s (29.45%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.53
epoch: 0|step: 112|ppo_ep: 1|act_loss: 0.06298828125|cri_loss: 0.027679443359375|unsuper_loss: 0.0
average reward score: 2.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.08%) |Training time=1.04s (29.51%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.54
epoch: 0|step: 113|ppo_ep: 1|act_loss: 0.09454345703125|cri_loss: 0.037322998046875|unsuper_loss: 0.0
average reward score: 1.978515625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.44%) |Training time=1.04s (29.23%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.54
epoch: 0|step: 114|ppo_ep: 1|act_loss: 0.028717041015625|cri_loss: 0.0178985595703125|unsuper_loss: 0.0
average reward score: 1.6875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.18%) |Training time=1.05s (29.52%) |Others=0.19 (5.31%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.54
epoch: 0|step: 115|ppo_ep: 1|act_loss: -0.0897216796875|cri_loss: 0.04522705078125|unsuper_loss: 0.0
average reward score: 2.138671875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.87%) |Training time=1.04s (29.73%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.55
epoch: 0|step: 116|ppo_ep: 1|act_loss: -0.1007080078125|cri_loss: 0.04638671875|unsuper_loss: 0.0
average reward score: 2.375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.14%) |Training time=1.04s (29.49%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.55
epoch: 0|step: 117|ppo_ep: 1|act_loss: -0.10595703125|cri_loss: 0.05657958984375|unsuper_loss: 0.0
average reward score: 2.44140625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.08%) |Training time=1.04s (29.53%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.56
epoch: 0|step: 118|ppo_ep: 1|act_loss: -0.0283966064453125|cri_loss: 0.025054931640625|unsuper_loss: 0.0
average reward score: 1.2783203125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.24%) |Training time=1.04s (29.41%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.56
[2023-06-30 05:38:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=6, lr=[9.643602483694308e-06, 9.643602483694308e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:38:58,505] [INFO] [timer.py:215:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=35.65223263270058, CurrSamplesPerSec=33.514499473876874, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:38:58,663] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=4, lr=[4.995670790537125e-06, 4.995670790537125e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 119|ppo_ep: 1|act_loss: 0.0298004150390625|cri_loss: 0.0279998779296875|unsuper_loss: 0.0
average reward score: 2.560546875
-------------------------------------------------------------------------------------
|E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.38s (61.76%) |Training time=1.28s (33.33%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.30 |AvgSamplesPerSec=8.56
epoch: 0|step: 120|ppo_ep: 1|act_loss: 0.0797119140625|cri_loss: 0.022369384765625|unsuper_loss: 0.0
average reward score: 2.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.14%) |Training time=1.03s (29.47%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.56
epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.013519287109375|cri_loss: 0.0292816162109375|unsuper_loss: 0.0
average reward score: 2.546875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.29s (64.84%) |Training time=1.05s (29.72%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.57
epoch: 0|step: 122|ppo_ep: 1|act_loss: -0.015625|cri_loss: 0.0291290283203125|unsuper_loss: 0.0
average reward score: 1.6689453125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.97%) |Training time=1.04s (29.66%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.57
epoch: 0|step: 123|ppo_ep: 1|act_loss: -0.10791015625|cri_loss: 0.03717041015625|unsuper_loss: 0.0
average reward score: 2.611328125
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.06%) |Training time=1.06s (29.65%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.57
epoch: 0|step: 124|ppo_ep: 1|act_loss: -0.01207733154296875|cri_loss: 0.032928466796875|unsuper_loss: 0.0
average reward score: 2.115234375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.04%) |Training time=1.04s (29.57%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.58
epoch: 0|step: 125|ppo_ep: 1|act_loss: 0.06317138671875|cri_loss: 0.030364990234375|unsuper_loss: 0.0
average reward score: 1.845703125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.15%) |Training time=1.04s (29.48%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.58
epoch: 0|step: 126|ppo_ep: 1|act_loss: 0.0721435546875|cri_loss: 0.05828857421875|unsuper_loss: 0.0
average reward score: 1.96875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.15%) |Training time=1.04s (29.45%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.59
epoch: 0|step: 127|ppo_ep: 1|act_loss: 0.020355224609375|cri_loss: 0.026519775390625|unsuper_loss: 0.0
average reward score: 2.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (64.96%) |Training time=1.05s (29.67%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.59
epoch: 0|step: 128|ppo_ep: 1|act_loss: 0.064208984375|cri_loss: 0.03253173828125|unsuper_loss: 0.0
average reward score: 2.099609375
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.36%) |Training time=1.05s (29.29%) |Others=0.19 (5.35%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.59
[2023-06-30 05:39:33,742] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=6, lr=[9.63120719155926e-06, 9.63120719155926e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:39:33,776] [INFO] [timer.py:215:stop] epoch=0/micro_step=130/global_step=130, RunningAvgSamplesPerSec=36.23395956532239, CurrSamplesPerSec=46.256852916088675, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:39:33,936] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=4, lr=[4.988573595161374e-06, 4.988573595161374e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 129|ppo_ep: 1|act_loss: -0.0347900390625|cri_loss: 0.018280029296875|unsuper_loss: 0.0
average reward score: 1.970703125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.32%) |Training time=1.02s (29.23%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.60
epoch: 0|step: 130|ppo_ep: 1|act_loss: -0.012176513671875|cri_loss: 0.02044677734375|unsuper_loss: 0.0
average reward score: 1.8388671875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.82%) |Training time=1.00s (28.80%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.60
epoch: 0|step: 131|ppo_ep: 1|act_loss: 0.08917236328125|cri_loss: 0.055572509765625|unsuper_loss: 0.0
average reward score: 2.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.57%) |Training time=1.01s (29.01%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.60
epoch: 0|step: 132|ppo_ep: 1|act_loss: 0.031219482421875|cri_loss: 0.045166015625|unsuper_loss: 0.0
average reward score: 2.083984375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.64%) |Training time=1.01s (28.98%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.61
epoch: 0|step: 133|ppo_ep: 1|act_loss: -0.0711669921875|cri_loss: 0.0221710205078125|unsuper_loss: 0.0
average reward score: 1.876953125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.68%) |Training time=1.01s (28.92%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.61
epoch: 0|step: 134|ppo_ep: 1|act_loss: -0.040740966796875|cri_loss: 0.0364990234375|unsuper_loss: 0.0
average reward score: 2.7578125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.14%) |Training time=0.99s (28.46%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.62
epoch: 0|step: 135|ppo_ep: 1|act_loss: -0.0204315185546875|cri_loss: 0.043853759765625|unsuper_loss: 0.0
average reward score: 2.283203125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.37%) |Training time=1.02s (29.26%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.62
epoch: 0|step: 136|ppo_ep: 1|act_loss: -0.1275634765625|cri_loss: 0.05987548828125|unsuper_loss: 0.0
average reward score: 2.681640625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.90%) |Training time=1.00s (28.49%) |Others=0.20 (5.61%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.62
epoch: 0|step: 137|ppo_ep: 1|act_loss: -0.25390625|cri_loss: 0.1402587890625|unsuper_loss: 0.0
average reward score: 2.369140625
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.32s (63.92%) |Training time=1.11s (30.68%) |Others=0.20 (5.40%)|CurSamplesPerSec=8.83 |AvgSamplesPerSec=8.62
epoch: 0|step: 138|ppo_ep: 1|act_loss: -0.1541748046875|cri_loss: 0.08843994140625|unsuper_loss: 0.0
average reward score: 2.666015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.82%) |Training time=1.00s (28.68%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.63
[2023-06-30 05:40:08,817] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=6, lr=[9.612308543609631e-06, 9.612308543609631e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:40:08,850] [INFO] [timer.py:215:stop] epoch=0/micro_step=140/global_step=140, RunningAvgSamplesPerSec=36.81751404391725, CurrSamplesPerSec=46.70005337438127, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:40:09,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=4, lr=[4.9781090710552835e-06, 4.9781090710552835e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 139|ppo_ep: 1|act_loss: -0.1177978515625|cri_loss: 0.054962158203125|unsuper_loss: 0.0
average reward score: 2.291015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.50%) |Training time=1.01s (29.09%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.63
epoch: 0|step: 140|ppo_ep: 1|act_loss: 0.008636474609375|cri_loss: 0.0391845703125|unsuper_loss: 0.0
average reward score: 2.447265625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.41%) |Training time=1.02s (29.14%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.64
epoch: 0|step: 141|ppo_ep: 1|act_loss: 0.1112060546875|cri_loss: 0.03857421875|unsuper_loss: 0.0
average reward score: 2.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.63%) |Training time=1.01s (28.99%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.64
epoch: 0|step: 142|ppo_ep: 1|act_loss: 0.10406494140625|cri_loss: 0.0290374755859375|unsuper_loss: 0.0
average reward score: 2.310546875
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.29%) |Training time=1.01s (28.40%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.64
epoch: 0|step: 143|ppo_ep: 1|act_loss: 0.1787109375|cri_loss: 0.055755615234375|unsuper_loss: 0.0
average reward score: 2.359375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.75%) |Training time=1.01s (28.86%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.64
epoch: 0|step: 144|ppo_ep: 1|act_loss: 0.2347412109375|cri_loss: 0.067626953125|unsuper_loss: 0.0
average reward score: 2.244140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.56%) |Training time=1.01s (28.98%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.65
epoch: 0|step: 145|ppo_ep: 1|act_loss: 0.1595458984375|cri_loss: 0.046356201171875|unsuper_loss: 0.0
average reward score: 2.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.68%) |Training time=1.02s (28.94%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.65
epoch: 0|step: 146|ppo_ep: 1|act_loss: 0.1253662109375|cri_loss: 0.04425048828125|unsuper_loss: 0.0
average reward score: 2.41015625
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.27%) |Training time=1.01s (28.44%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.65
epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.0599365234375|cri_loss: 0.034210205078125|unsuper_loss: 0.0
average reward score: 2.279296875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.23%) |Training time=1.03s (29.40%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.66
epoch: 0|step: 148|ppo_ep: 1|act_loss: 0.0660400390625|cri_loss: 0.05462646484375|unsuper_loss: 0.0
average reward score: 2.203125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.47%) |Training time=1.02s (29.13%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.66
[2023-06-30 05:40:43,935] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=6, lr=[9.586932111908205e-06, 9.586932111908205e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:40:43,969] [INFO] [timer.py:215:stop] epoch=0/micro_step=150/global_step=150, RunningAvgSamplesPerSec=37.34123320357087, CurrSamplesPerSec=46.05990290948949, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:40:44,126] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=4, lr=[4.964291377933776e-06, 4.964291377933776e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 149|ppo_ep: 1|act_loss: 0.00893402099609375|cri_loss: 0.0185699462890625|unsuper_loss: 0.0
average reward score: 2.103515625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.30%) |Training time=1.02s (29.32%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.66
epoch: 0|step: 150|ppo_ep: 1|act_loss: -0.0938720703125|cri_loss: 0.041534423828125|unsuper_loss: 0.0
average reward score: 2.521484375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.58%) |Training time=1.01s (29.04%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.67
epoch: 0|step: 151|ppo_ep: 1|act_loss: -0.09698486328125|cri_loss: 0.07244873046875|unsuper_loss: 0.0
average reward score: 2.681640625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.80%) |Training time=1.03s (29.24%) |Others=0.21 (5.96%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.67
epoch: 0|step: 152|ppo_ep: 1|act_loss: -0.20166015625|cri_loss: 0.1053466796875|unsuper_loss: 0.0
average reward score: 2.6171875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.04%) |Training time=1.01s (28.58%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.67
epoch: 0|step: 153|ppo_ep: 1|act_loss: -0.18017578125|cri_loss: 0.12451171875|unsuper_loss: 0.0
average reward score: 2.330078125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.45%) |Training time=1.02s (29.07%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.67
epoch: 0|step: 154|ppo_ep: 1|act_loss: -0.170654296875|cri_loss: 0.07171630859375|unsuper_loss: 0.0
average reward score: 2.369140625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.70%) |Training time=1.01s (28.65%) |Others=0.20 (5.65%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.68
epoch: 0|step: 155|ppo_ep: 1|act_loss: -0.04522705078125|cri_loss: 0.0775146484375|unsuper_loss: 0.0
average reward score: 2.9609375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.46%) |Training time=1.04s (29.19%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.68
epoch: 0|step: 156|ppo_ep: 1|act_loss: -0.049652099609375|cri_loss: 0.031463623046875|unsuper_loss: 0.0
average reward score: 2.345703125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.40%) |Training time=1.02s (28.95%) |Others=0.20 (5.65%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.68
epoch: 0|step: 157|ppo_ep: 1|act_loss: -0.00774383544921875|cri_loss: 0.0240631103515625|unsuper_loss: 0.0
average reward score: 3.109375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.61%) |Training time=1.02s (29.01%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.68
epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.08807373046875|cri_loss: 0.033843994140625|unsuper_loss: 0.0
average reward score: 2.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.41%) |Training time=1.02s (29.21%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.69
[2023-06-30 05:41:19,085] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=6, lr=[9.555112233710543e-06, 9.555112233710543e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:41:19,119] [INFO] [timer.py:215:stop] epoch=0/micro_step=160/global_step=160, RunningAvgSamplesPerSec=37.80937525334218, CurrSamplesPerSec=47.262892050600534, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:41:19,277] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=4, lr=[4.947139212738395e-06, 4.947139212738395e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 159|ppo_ep: 1|act_loss: 0.10955810546875|cri_loss: 0.0467529296875|unsuper_loss: 0.0
average reward score: 3.056640625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.89%) |Training time=1.01s (28.70%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.69
epoch: 0|step: 160|ppo_ep: 1|act_loss: 0.03643798828125|cri_loss: 0.02471923828125|unsuper_loss: 0.0
average reward score: 2.38671875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.58%) |Training time=1.02s (29.04%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.69
epoch: 0|step: 161|ppo_ep: 1|act_loss: -0.0034236907958984375|cri_loss: 0.037994384765625|unsuper_loss: 0.0
average reward score: 2.74609375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.79%) |Training time=1.01s (28.82%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.69
epoch: 0|step: 162|ppo_ep: 1|act_loss: 0.156982421875|cri_loss: 0.050537109375|unsuper_loss: 0.0
average reward score: 3.26953125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.03%) |Training time=1.01s (28.61%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.70
epoch: 0|step: 163|ppo_ep: 1|act_loss: 0.133544921875|cri_loss: 0.0455322265625|unsuper_loss: 0.0
average reward score: 2.560546875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.75%) |Training time=1.01s (28.79%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.70
epoch: 0|step: 164|ppo_ep: 1|act_loss: 0.03631591796875|cri_loss: 0.0307159423828125|unsuper_loss: 0.0
average reward score: 2.884765625
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.82%) |Training time=1.02s (28.85%) |Others=0.19 (5.33%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.70
epoch: 0|step: 165|ppo_ep: 1|act_loss: -0.05828857421875|cri_loss: 0.0213470458984375|unsuper_loss: 0.0
average reward score: 2.240234375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.97%) |Training time=1.00s (28.63%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.70
epoch: 0|step: 166|ppo_ep: 1|act_loss: 0.0111236572265625|cri_loss: 0.0164337158203125|unsuper_loss: 0.0
average reward score: 3.220703125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.55%) |Training time=1.02s (28.98%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.71
epoch: 0|step: 167|ppo_ep: 1|act_loss: -0.0021152496337890625|cri_loss: 0.0251617431640625|unsuper_loss: 0.0
average reward score: 2.5078125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.58%) |Training time=1.01s (29.02%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.71
epoch: 0|step: 168|ppo_ep: 1|act_loss: -0.10284423828125|cri_loss: 0.03558349609375|unsuper_loss: 0.0
average reward score: 3.8046875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.60%) |Training time=0.99s (28.03%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.71
[2023-06-30 05:41:54,176] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=6, lr=[9.516891965002726e-06, 9.516891965002726e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:41:54,209] [INFO] [timer.py:215:stop] epoch=0/micro_step=170/global_step=170, RunningAvgSamplesPerSec=38.259588508724946, CurrSamplesPerSec=48.19437794554611, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:41:54,367] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=4, lr=[4.926675784338174e-06, 4.926675784338174e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 169|ppo_ep: 1|act_loss: -0.25146484375|cri_loss: 0.10772705078125|unsuper_loss: 0.0
average reward score: 3.33984375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.11%) |Training time=0.99s (28.49%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.71
epoch: 0|step: 170|ppo_ep: 1|act_loss: -0.07916259765625|cri_loss: 0.033782958984375|unsuper_loss: 0.0
average reward score: 3.26953125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.11%) |Training time=1.00s (28.49%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.72
epoch: 0|step: 171|ppo_ep: 1|act_loss: -0.043487548828125|cri_loss: 0.05731201171875|unsuper_loss: 0.0
average reward score: 3.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.82%) |Training time=1.01s (28.76%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.72
epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.051544189453125|cri_loss: 0.0386962890625|unsuper_loss: 0.0
average reward score: 3.078125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.35s (67.00%) |Training time=0.97s (27.58%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.72
epoch: 0|step: 173|ppo_ep: 1|act_loss: -0.01483917236328125|cri_loss: 0.023590087890625|unsuper_loss: 0.0
average reward score: 2.83203125
-------------------------------------------------------------------------------------
|E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=2.39s (61.48%) |Training time=1.30s (33.47%) |Others=0.20 (5.04%)|CurSamplesPerSec=8.24 |AvgSamplesPerSec=8.72
epoch: 0|step: 174|ppo_ep: 1|act_loss: -0.0209197998046875|cri_loss: 0.01678466796875|unsuper_loss: 0.0
average reward score: 3.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.89%) |Training time=1.42s (36.31%) |Others=0.19 (4.80%)|CurSamplesPerSec=8.16 |AvgSamplesPerSec=8.71
epoch: 0|step: 175|ppo_ep: 1|act_loss: 0.07305908203125|cri_loss: 0.0302276611328125|unsuper_loss: 0.0
average reward score: 3.234375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.77%) |Training time=0.96s (27.79%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.72
epoch: 0|step: 176|ppo_ep: 1|act_loss: 0.056060791015625|cri_loss: 0.0318603515625|unsuper_loss: 0.0
average reward score: 3.392578125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.87%) |Training time=0.96s (27.56%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.72
epoch: 0|step: 177|ppo_ep: 1|act_loss: 0.08209228515625|cri_loss: 0.0234832763671875|unsuper_loss: 0.0
average reward score: 3.125
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.14%) |Training time=0.94s (27.37%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.72
epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.0352783203125|cri_loss: 0.035888671875|unsuper_loss: 0.0
average reward score: 3.478515625
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.47%) |Training time=0.93s (27.06%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.31 |AvgSamplesPerSec=8.72
[2023-06-30 05:42:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=6, lr=[9.472323022241576e-06, 9.472323022241576e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:42:29,818] [INFO] [timer.py:215:stop] epoch=0/micro_step=180/global_step=180, RunningAvgSamplesPerSec=38.57575241320838, CurrSamplesPerSec=52.78445589049026, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:42:29,978] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=4, lr=[4.9029287821253445e-06, 4.9029287821253445e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 179|ppo_ep: 1|act_loss: -0.0019197463989257812|cri_loss: 0.01409149169921875|unsuper_loss: 0.0
average reward score: 3.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.29%) |Training time=0.94s (27.16%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.73
epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.04132080078125|cri_loss: 0.0157928466796875|unsuper_loss: 0.0
average reward score: 3.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.66%) |Training time=0.98s (27.94%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.73
epoch: 0|step: 181|ppo_ep: 1|act_loss: -0.061370849609375|cri_loss: 0.020599365234375|unsuper_loss: 0.0
average reward score: 3.89453125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.21%) |Training time=0.99s (28.01%) |Others=0.20 (5.78%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.73
epoch: 0|step: 182|ppo_ep: 1|act_loss: -0.02935791015625|cri_loss: 0.020721435546875|unsuper_loss: 0.0
average reward score: 3.576171875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.63%) |Training time=0.90s (25.84%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.73
epoch: 0|step: 183|ppo_ep: 1|act_loss: -0.056884765625|cri_loss: 0.03485107421875|unsuper_loss: 0.0
average reward score: 2.900390625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.12%) |Training time=0.94s (27.41%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.74
epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.11041259765625|cri_loss: 0.028350830078125|unsuper_loss: 0.0
average reward score: 2.912109375
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.39%) |Training time=0.93s (27.10%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.31 |AvgSamplesPerSec=8.74
epoch: 0|step: 185|ppo_ep: 1|act_loss: -0.0894775390625|cri_loss: 0.0199432373046875|unsuper_loss: 0.0
average reward score: 3.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.33s (67.29%) |Training time=0.94s (27.21%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.74
epoch: 0|step: 186|ppo_ep: 1|act_loss: -0.1104736328125|cri_loss: 0.047760009765625|unsuper_loss: 0.0
average reward score: 3.568359375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.04%) |Training time=0.95s (27.50%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.75
epoch: 0|step: 187|ppo_ep: 1|act_loss: -0.0188751220703125|cri_loss: 0.0208740234375|unsuper_loss: 0.0
average reward score: 3.5
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.12%) |Training time=0.94s (27.34%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.75
epoch: 0|step: 188|ppo_ep: 1|act_loss: 0.0980224609375|cri_loss: 0.031524658203125|unsuper_loss: 0.0
average reward score: 3.693359375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.11%) |Training time=0.95s (27.44%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.75
[2023-06-30 05:43:04,459] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=6, lr=[9.421465712376322e-06, 9.421465712376322e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:43:04,492] [INFO] [timer.py:215:stop] epoch=0/micro_step=190/global_step=190, RunningAvgSamplesPerSec=39.102572155692016, CurrSamplesPerSec=50.17571442350037, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:43:04,653] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=4, lr=[4.875930338548377e-06, 4.875930338548377e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 189|ppo_ep: 1|act_loss: 0.024810791015625|cri_loss: 0.0091705322265625|unsuper_loss: 0.0
average reward score: 3.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.74%) |Training time=0.97s (27.78%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.75
epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.00024127960205078125|cri_loss: 0.00811004638671875|unsuper_loss: 0.0
average reward score: 3.080078125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.33s (67.52%) |Training time=0.93s (26.98%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.76
epoch: 0|step: 191|ppo_ep: 1|act_loss: 0.022705078125|cri_loss: 0.0246429443359375|unsuper_loss: 0.0
average reward score: 3.201171875
-------------------------------------------------------------------------------------
|E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.39s (65.68%) |Training time=1.06s (29.09%) |Others=0.19 (5.23%)|CurSamplesPerSec=8.78 |AvgSamplesPerSec=8.76
epoch: 0|step: 192|ppo_ep: 1|act_loss: -0.006687164306640625|cri_loss: 0.01345062255859375|unsuper_loss: 0.0
average reward score: 2.609375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.96%) |Training time=0.96s (27.60%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.76
epoch: 0|step: 193|ppo_ep: 1|act_loss: -0.050079345703125|cri_loss: 0.018157958984375|unsuper_loss: 0.0
average reward score: 3.65625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.96%) |Training time=0.95s (27.55%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.76
epoch: 0|step: 194|ppo_ep: 1|act_loss: -0.02288818359375|cri_loss: 0.00928497314453125|unsuper_loss: 0.0
average reward score: 3.125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.33s (67.26%) |Training time=0.94s (27.27%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.76
epoch: 0|step: 195|ppo_ep: 1|act_loss: -0.006526947021484375|cri_loss: 0.007740020751953125|unsuper_loss: 0.0
average reward score: 3.529296875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.26%) |Training time=0.94s (27.28%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.76
epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.023284912109375|cri_loss: 0.0196075439453125|unsuper_loss: 0.0
average reward score: 2.9140625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.87%) |Training time=0.96s (27.69%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.77
epoch: 0|step: 197|ppo_ep: 1|act_loss: 0.05718994140625|cri_loss: 0.0144195556640625|unsuper_loss: 0.0
average reward score: 2.84375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.08%) |Training time=0.94s (27.19%) |Others=0.20 (5.73%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.77
epoch: 0|step: 198|ppo_ep: 1|act_loss: 0.0626220703125|cri_loss: 0.011260986328125|unsuper_loss: 0.0
average reward score: 3.25
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.66%) |Training time=0.97s (27.90%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.77
[2023-06-30 05:43:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=6, lr=[9.364388851246277e-06, 9.364388851246277e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:43:39,278] [INFO] [timer.py:215:stop] epoch=0/micro_step=200/global_step=200, RunningAvgSamplesPerSec=39.56311006202435, CurrSamplesPerSec=52.65608505463034, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:43:39,436] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=4, lr=[4.845716985633049e-06, 4.845716985633049e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.026336669921875|cri_loss: 0.0187530517578125|unsuper_loss: 0.0
average reward score: 3.130859375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.33s (67.41%) |Training time=0.94s (27.09%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.77
epoch: 0|step: 200|ppo_ep: 1|act_loss: -0.00177001953125|cri_loss: 0.021728515625|unsuper_loss: 0.0
average reward score: 3.400390625
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.46%) |Training time=1.01s (28.23%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.95 |AvgSamplesPerSec=8.77
epoch: 0|step: 201|ppo_ep: 1|act_loss: -0.10235595703125|cri_loss: 0.03204345703125|unsuper_loss: 0.0
average reward score: 3.14453125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.16%) |Training time=1.00s (28.43%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.78
epoch: 0|step: 202|ppo_ep: 1|act_loss: 0.0039520263671875|cri_loss: 0.01375579833984375|unsuper_loss: 0.0
average reward score: 2.6875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.76%) |Training time=0.95s (27.53%) |Others=0.20 (5.71%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.78
epoch: 0|step: 203|ppo_ep: 1|act_loss: 0.107177734375|cri_loss: 0.047393798828125|unsuper_loss: 0.0
average reward score: 2.7890625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.07%) |Training time=0.95s (27.42%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.78
epoch: 0|step: 204|ppo_ep: 1|act_loss: 0.003231048583984375|cri_loss: 0.049957275390625|unsuper_loss: 0.0
average reward score: 3.177734375
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.07%) |Training time=0.94s (27.44%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.78
epoch: 0|step: 205|ppo_ep: 1|act_loss: 0.1552734375|cri_loss: 0.1396484375|unsuper_loss: 0.0
average reward score: 2.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.97%) |Training time=0.95s (27.55%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.79
epoch: 0|step: 206|ppo_ep: 1|act_loss: 0.1719970703125|cri_loss: 0.146484375|unsuper_loss: 0.0
average reward score: 3.130859375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.54%) |Training time=0.97s (28.01%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.79
epoch: 0|step: 207|ppo_ep: 1|act_loss: 0.08245849609375|cri_loss: 0.125732421875|unsuper_loss: 0.0
average reward score: 3.048828125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.92%) |Training time=0.95s (27.57%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.79
epoch: 0|step: 208|ppo_ep: 1|act_loss: -0.055877685546875|cri_loss: 0.091796875|unsuper_loss: 0.0
average reward score: 3.373046875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.33s (67.34%) |Training time=0.94s (27.19%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.79
[2023-06-30 05:44:14,101] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=6, lr=[9.301169670465047e-06, 9.301169670465047e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:44:14,135] [INFO] [timer.py:215:stop] epoch=0/micro_step=210/global_step=210, RunningAvgSamplesPerSec=39.964507702458626, CurrSamplesPerSec=46.075461775119656, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:44:14,293] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=4, lr=[4.812329605550381e-06, 4.812329605550381e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 209|ppo_ep: 1|act_loss: 0.03741455078125|cri_loss: 0.0712890625|unsuper_loss: 0.0
average reward score: 3.439453125
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.07%) |Training time=1.03s (28.63%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.93 |AvgSamplesPerSec=8.79
epoch: 0|step: 210|ppo_ep: 1|act_loss: -0.17138671875|cri_loss: 0.036651611328125|unsuper_loss: 0.0
average reward score: 3.0
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.34s (67.38%) |Training time=0.94s (27.14%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.79
epoch: 0|step: 211|ppo_ep: 1|act_loss: -0.13916015625|cri_loss: 0.0560302734375|unsuper_loss: 0.0
average reward score: 3.494140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.52%) |Training time=0.98s (28.03%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.80
epoch: 0|step: 212|ppo_ep: 1|act_loss: -0.140625|cri_loss: 0.0523681640625|unsuper_loss: 0.0
average reward score: 3.009765625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.06%) |Training time=0.95s (27.47%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.80
epoch: 0|step: 213|ppo_ep: 1|act_loss: -0.0213623046875|cri_loss: 0.0389404296875|unsuper_loss: 0.0
average reward score: 3.490234375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.64%) |Training time=0.97s (27.86%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.80
epoch: 0|step: 214|ppo_ep: 1|act_loss: -0.0618896484375|cri_loss: 0.033355712890625|unsuper_loss: 0.0
average reward score: 2.71875
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.28%) |Training time=0.94s (27.21%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.80
epoch: 0|step: 215|ppo_ep: 1|act_loss: -0.045989990234375|cri_loss: 0.0310821533203125|unsuper_loss: 0.0
average reward score: 3.619140625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.91%) |Training time=0.94s (27.33%) |Others=0.20 (5.76%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.80
epoch: 0|step: 216|ppo_ep: 1|act_loss: -0.07818603515625|cri_loss: 0.0201416015625|unsuper_loss: 0.0
average reward score: 2.94140625
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.26%) |Training time=0.94s (27.24%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.31 |AvgSamplesPerSec=8.81
epoch: 0|step: 217|ppo_ep: 1|act_loss: 0.055389404296875|cri_loss: 0.0210113525390625|unsuper_loss: 0.0
average reward score: 3.6015625
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.33s (68.13%) |Training time=0.90s (26.32%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.34 |AvgSamplesPerSec=8.81
epoch: 0|step: 218|ppo_ep: 1|act_loss: 0.01177215576171875|cri_loss: 0.016815185546875|unsuper_loss: 0.0
average reward score: 3.43359375
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.71%) |Training time=1.00s (27.99%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.93 |AvgSamplesPerSec=8.81
[2023-06-30 05:44:48,792] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=6, lr=[9.23189371291718e-06, 9.23189371291718e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:44:48,825] [INFO] [timer.py:215:stop] epoch=0/micro_step=220/global_step=220, RunningAvgSamplesPerSec=40.396368444174186, CurrSamplesPerSec=58.644626423648404, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:44:48,983] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=4, lr=[4.775813375298314e-06, 4.775813375298314e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 219|ppo_ep: 1|act_loss: 0.067138671875|cri_loss: 0.0240631103515625|unsuper_loss: 0.0
average reward score: 3.154296875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.27%) |Training time=0.88s (25.25%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.81
epoch: 0|step: 220|ppo_ep: 1|act_loss: -0.0274505615234375|cri_loss: 0.0203094482421875|unsuper_loss: 0.0
average reward score: 3.65625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.45s (68.74%) |Training time=0.91s (25.61%) |Others=0.20 (5.64%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.81
epoch: 0|step: 221|ppo_ep: 1|act_loss: 0.00609588623046875|cri_loss: 0.021331787109375|unsuper_loss: 0.0
average reward score: 3.21484375
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.00%) |Training time=0.81s (23.44%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.81
epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.075439453125|cri_loss: 0.0194244384765625|unsuper_loss: 0.0
average reward score: 3.15234375
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.93%) |Training time=0.81s (23.49%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.34 |AvgSamplesPerSec=8.82
epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.0214385986328125|cri_loss: 0.01166534423828125|unsuper_loss: 0.0
average reward score: 3.140625
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.75%) |Training time=0.82s (23.73%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.31 |AvgSamplesPerSec=8.82
epoch: 0|step: 224|ppo_ep: 1|act_loss: 0.01430511474609375|cri_loss: 0.01178741455078125|unsuper_loss: 0.0
average reward score: 3.36328125
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.70%) |Training time=0.82s (23.80%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.31 |AvgSamplesPerSec=8.82
epoch: 0|step: 225|ppo_ep: 1|act_loss: 0.0279541015625|cri_loss: 0.01390838623046875|unsuper_loss: 0.0
average reward score: 3.455078125
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.72%) |Training time=0.81s (23.70%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.33 |AvgSamplesPerSec=8.82
epoch: 0|step: 226|ppo_ep: 1|act_loss: 0.04278564453125|cri_loss: 0.01398468017578125|unsuper_loss: 0.0
average reward score: 3.310546875
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.45s (71.21%) |Training time=0.80s (23.30%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.82
epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.019012451171875|cri_loss: 0.014801025390625|unsuper_loss: 0.0
average reward score: 3.921875
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.43s (67.07%) |Training time=1.00s (27.68%) |Others=0.19 (5.25%)|CurSamplesPerSec=8.85 |AvgSamplesPerSec=8.82
epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.0035266876220703125|cri_loss: 0.0335693359375|unsuper_loss: 0.0
average reward score: 3.9140625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.98%) |Training time=0.85s (24.55%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.83
[2023-06-30 05:45:23,557] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=6, lr=[9.156654717008744e-06, 9.156654717008744e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:45:23,590] [INFO] [timer.py:215:stop] epoch=0/micro_step=230/global_step=230, RunningAvgSamplesPerSec=41.01712050003653, CurrSamplesPerSec=59.52549456846169, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:45:23,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=4, lr=[4.736217705571989e-06, 4.736217705571989e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 229|ppo_ep: 1|act_loss: -0.0738525390625|cri_loss: 0.019073486328125|unsuper_loss: 0.0
average reward score: 3.76171875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.72%) |Training time=0.87s (24.87%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.83
epoch: 0|step: 230|ppo_ep: 1|act_loss: -0.0673828125|cri_loss: 0.0268402099609375|unsuper_loss: 0.0
average reward score: 3.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.44%) |Training time=0.89s (25.16%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.83
epoch: 0|step: 231|ppo_ep: 1|act_loss: -0.07220458984375|cri_loss: 0.02911376953125|unsuper_loss: 0.0
average reward score: 3.58984375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.52%) |Training time=0.88s (25.05%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.83
epoch: 0|step: 232|ppo_ep: 1|act_loss: -0.0035495758056640625|cri_loss: 0.036376953125|unsuper_loss: 0.0
average reward score: 2.908203125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.84%) |Training time=0.86s (24.66%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.83
epoch: 0|step: 233|ppo_ep: 1|act_loss: -0.0489501953125|cri_loss: 0.0172271728515625|unsuper_loss: 0.0
average reward score: 3.51953125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.45s (70.03%) |Training time=0.86s (24.46%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.83
epoch: 0|step: 234|ppo_ep: 1|act_loss: 0.0103607177734375|cri_loss: 0.0202178955078125|unsuper_loss: 0.0
average reward score: 3.513671875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.97%) |Training time=0.85s (24.56%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.83
epoch: 0|step: 235|ppo_ep: 1|act_loss: 0.006908416748046875|cri_loss: 0.0211181640625|unsuper_loss: 0.0
average reward score: 3.44921875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.47s (70.57%) |Training time=0.84s (23.92%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.84
epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.040313720703125|cri_loss: 0.01549530029296875|unsuper_loss: 0.0
average reward score: 3.5
-------------------------------------------------------------------------------------
|E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.59%) |Training time=1.04s (28.24%) |Others=0.19 (5.17%)|CurSamplesPerSec=8.71 |AvgSamplesPerSec=8.83
epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.00144195556640625|cri_loss: 0.0218353271484375|unsuper_loss: 0.0
average reward score: 3.63671875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.58%) |Training time=0.87s (24.81%) |Others=0.20 (5.61%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.84
epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.0060272216796875|cri_loss: 0.01739501953125|unsuper_loss: 0.0
average reward score: 3.203125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.72%) |Training time=1.01s (28.83%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.84
[2023-06-30 05:45:58,755] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=6, lr=[9.075554489828361e-06, 9.075554489828361e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:45:58,788] [INFO] [timer.py:215:stop] epoch=0/micro_step=240/global_step=240, RunningAvgSamplesPerSec=41.46350535971084, CurrSamplesPerSec=46.721219074777494, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:45:58,953] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=4, lr=[4.693596173905352e-06, 4.693596173905352e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 239|ppo_ep: 1|act_loss: 0.0008525848388671875|cri_loss: 0.01456451416015625|unsuper_loss: 0.0
average reward score: 3.416015625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.62%) |Training time=1.02s (28.83%) |Others=0.20 (5.55%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.84
epoch: 0|step: 240|ppo_ep: 1|act_loss: -0.0223846435546875|cri_loss: 0.01108551025390625|unsuper_loss: 0.0
average reward score: 3.92578125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.37%) |Training time=1.00s (28.26%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.84
epoch: 0|step: 241|ppo_ep: 1|act_loss: -0.0457763671875|cri_loss: 0.0103912353515625|unsuper_loss: 0.0
average reward score: 3.025390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.78%) |Training time=1.01s (28.74%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.84
epoch: 0|step: 242|ppo_ep: 1|act_loss: 0.046356201171875|cri_loss: 0.02838134765625|unsuper_loss: 0.0
average reward score: 3.478515625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.89%) |Training time=1.01s (28.73%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.84
epoch: 0|step: 243|ppo_ep: 1|act_loss: -0.0174407958984375|cri_loss: 0.0152740478515625|unsuper_loss: 0.0
average reward score: 2.810546875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.42%) |Training time=1.03s (29.22%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.84
epoch: 0|step: 244|ppo_ep: 1|act_loss: -0.03118896484375|cri_loss: 0.01454925537109375|unsuper_loss: 0.0
average reward score: 3.123046875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.71%) |Training time=1.02s (28.89%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.84
epoch: 0|step: 245|ppo_ep: 1|act_loss: -0.003681182861328125|cri_loss: 0.014923095703125|unsuper_loss: 0.0
average reward score: 2.958984375
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.36s (65.38%) |Training time=1.04s (28.91%) |Others=0.21 (5.71%)|CurSamplesPerSec=8.88 |AvgSamplesPerSec=8.84
epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.0148468017578125|cri_loss: 0.0156402587890625|unsuper_loss: 0.0
average reward score: 2.98046875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.49%) |Training time=1.02s (28.73%) |Others=0.20 (5.78%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.84
epoch: 0|step: 247|ppo_ep: 1|act_loss: -0.0206756591796875|cri_loss: 0.0171966552734375|unsuper_loss: 0.0
average reward score: 3.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.15%) |Training time=1.02s (29.18%) |Others=0.20 (5.67%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.85
epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.016021728515625|cri_loss: 0.020538330078125|unsuper_loss: 0.0
average reward score: 2.89453125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.16%) |Training time=1.02s (29.11%) |Others=0.20 (5.73%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.85
[2023-06-30 05:46:34,051] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=6, lr=[8.988702769390434e-06, 8.988702769390434e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:46:34,084] [INFO] [timer.py:215:stop] epoch=0/micro_step=250/global_step=250, RunningAvgSamplesPerSec=41.63641341821135, CurrSamplesPerSec=44.53643857089434, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:46:34,260] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=4, lr=[4.648006452174529e-06, 4.648006452174529e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 249|ppo_ep: 1|act_loss: 0.01532745361328125|cri_loss: 0.01142120361328125|unsuper_loss: 0.0
average reward score: 3.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.75%) |Training time=1.05s (29.47%) |Others=0.21 (5.79%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.85
[2023-06-30 05:46:37,772] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 250|ppo_ep: 1|act_loss: 0.02886962890625|cri_loss: 0.026397705078125|unsuper_loss: 0.0
average reward score: 2.8125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.87%) |Training time=1.02s (29.05%) |Others=0.18 (5.08%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.85
epoch: 0|step: 251|ppo_ep: 1|act_loss: -0.059326171875|cri_loss: 0.010101318359375|unsuper_loss: 0.0
average reward score: 3.21875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.47%) |Training time=1.02s (29.06%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.85
epoch: 0|step: 252|ppo_ep: 1|act_loss: -0.09002685546875|cri_loss: 0.01727294921875|unsuper_loss: 0.0
average reward score: 3.119140625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.52%) |Training time=1.02s (29.09%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.85
epoch: 0|step: 253|ppo_ep: 1|act_loss: 0.01178741455078125|cri_loss: 0.01030731201171875|unsuper_loss: 0.0
average reward score: 3.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.80%) |Training time=1.01s (28.72%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.85
epoch: 0|step: 254|ppo_ep: 1|act_loss: 0.0140228271484375|cri_loss: 0.016693115234375|unsuper_loss: 0.0
average reward score: 3.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.44%) |Training time=1.02s (28.21%) |Others=0.19 (5.36%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.85
epoch: 0|step: 255|ppo_ep: 1|act_loss: 0.0782470703125|cri_loss: 0.0166015625|unsuper_loss: 0.0
average reward score: 3.740234375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.21%) |Training time=0.89s (25.28%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.85
epoch: 0|step: 256|ppo_ep: 1|act_loss: -0.031829833984375|cri_loss: 0.00992584228515625|unsuper_loss: 0.0
average reward score: 3.599609375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.18%) |Training time=0.92s (26.42%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.85
epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.0307159423828125|cri_loss: 0.0224761962890625|unsuper_loss: 0.0
average reward score: 3.666015625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.06%) |Training time=0.93s (26.53%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.85
epoch: 0|step: 258|ppo_ep: 1|act_loss: 0.0031909942626953125|cri_loss: 0.0208282470703125|unsuper_loss: 0.0
average reward score: 3.775390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.11%) |Training time=0.93s (26.50%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
[2023-06-30 05:47:09,250] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=6, lr=[8.89621707614687e-06, 8.89621707614687e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:47:09,284] [INFO] [timer.py:215:stop] epoch=0/micro_step=260/global_step=260, RunningAvgSamplesPerSec=41.90619863447883, CurrSamplesPerSec=51.23965960286492, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:47:09,442] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=5, lr=[4.604488803736523e-06, 4.604488803736523e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.01480865478515625|cri_loss: 0.01153564453125|unsuper_loss: 0.0
average reward score: 3.353515625
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.39s (67.55%) |Training time=0.96s (27.11%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.86
epoch: 0|step: 260|ppo_ep: 1|act_loss: 0.013397216796875|cri_loss: 0.0184326171875|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.18%) |Training time=0.92s (26.43%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.0175323486328125|cri_loss: 0.01499176025390625|unsuper_loss: 0.0
average reward score: 3.763671875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.19%) |Training time=0.93s (26.43%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.01209259033203125|cri_loss: 0.00943756103515625|unsuper_loss: 0.0
average reward score: 3.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.41%) |Training time=0.92s (26.13%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.86
epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.033782958984375|cri_loss: 0.01739501953125|unsuper_loss: 0.0
average reward score: 3.173828125
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.48s (68.54%) |Training time=0.95s (26.18%) |Others=0.19 (5.28%)|CurSamplesPerSec=8.85 |AvgSamplesPerSec=8.86
epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.026336669921875|cri_loss: 0.0207672119140625|unsuper_loss: 0.0
average reward score: 3.603515625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.18%) |Training time=0.93s (26.41%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 265|ppo_ep: 1|act_loss: -0.024688720703125|cri_loss: 0.01983642578125|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.16%) |Training time=0.93s (26.48%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 266|ppo_ep: 1|act_loss: 0.032928466796875|cri_loss: 0.01474761962890625|unsuper_loss: 0.0
average reward score: 3.923828125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.27%) |Training time=0.92s (26.33%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.0802001953125|cri_loss: 0.01556396484375|unsuper_loss: 0.0
average reward score: 3.78125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.32%) |Training time=0.92s (26.25%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.86
epoch: 0|step: 268|ppo_ep: 1|act_loss: -0.068603515625|cri_loss: 0.0157623291015625|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.42%) |Training time=0.92s (26.13%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.86
[2023-06-30 05:47:44,455] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=6, lr=[8.798222553968287e-06, 8.798222553968287e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:47:44,489] [INFO] [timer.py:215:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=42.25042286187838, CurrSamplesPerSec=52.61292468447195, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:47:44,646] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=5, lr=[4.553432724122265e-06, 4.553432724122265e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.060089111328125|cri_loss: 0.01084136962890625|unsuper_loss: 0.0
average reward score: 4.14453125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.39s (67.97%) |Training time=0.94s (26.68%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.86
epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.03887939453125|cri_loss: 0.0229644775390625|unsuper_loss: 0.0
average reward score: 4.03125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.17%) |Training time=0.93s (26.43%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.87
epoch: 0|step: 271|ppo_ep: 1|act_loss: 3.129243850708008e-05|cri_loss: 0.023590087890625|unsuper_loss: 0.0
average reward score: 3.873046875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.43s (67.92%) |Training time=0.96s (26.77%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.94 |AvgSamplesPerSec=8.87
epoch: 0|step: 272|ppo_ep: 1|act_loss: 0.0042266845703125|cri_loss: 0.0160675048828125|unsuper_loss: 0.0
average reward score: 3.55078125
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.44s (61.46%) |Training time=1.34s (33.74%) |Others=0.19 (4.79%)|CurSamplesPerSec=8.06 |AvgSamplesPerSec=8.86
epoch: 0|step: 273|ppo_ep: 1|act_loss: -0.0241241455078125|cri_loss: 0.01262664794921875|unsuper_loss: 0.0
average reward score: 4.07421875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.37%) |Training time=0.92s (26.22%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.86
epoch: 0|step: 274|ppo_ep: 1|act_loss: 0.01458740234375|cri_loss: 0.0111541748046875|unsuper_loss: 0.0
average reward score: 3.81640625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.16%) |Training time=0.92s (26.43%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.86
epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.0298919677734375|cri_loss: 0.00945281982421875|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.23%) |Training time=0.92s (26.38%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.87
epoch: 0|step: 276|ppo_ep: 1|act_loss: -0.01438140869140625|cri_loss: 0.00931549072265625|unsuper_loss: 0.0
average reward score: 3.64453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.11%) |Training time=0.93s (26.51%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.87
epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.035400390625|cri_loss: 0.021270751953125|unsuper_loss: 0.0
average reward score: 3.72265625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.07%) |Training time=0.92s (26.38%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.87
epoch: 0|step: 278|ppo_ep: 1|act_loss: -0.018463134765625|cri_loss: 0.01983642578125|unsuper_loss: 0.0
average reward score: 4.21484375
-------------------------------------------------------------------------------------
|E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=2.39s (59.25%) |Training time=1.45s (35.93%) |Others=0.19 (4.81%)|CurSamplesPerSec=7.95 |AvgSamplesPerSec=8.86
[2023-06-30 05:48:21,015] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=6, lr=[8.694851800809824e-06, 8.694851800809824e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:48:21,048] [INFO] [timer.py:215:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=42.28999125258335, CurrSamplesPerSec=29.809581026375774, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:48:21,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=5, lr=[4.499598111849299e-06, 4.499598111849299e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 279|ppo_ep: 1|act_loss: 0.0149688720703125|cri_loss: 0.0271759033203125|unsuper_loss: 0.0
average reward score: 3.9375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.40s (59.97%) |Training time=1.41s (35.17%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.86
epoch: 0|step: 280|ppo_ep: 1|act_loss: 0.0003762245178222656|cri_loss: 0.013702392578125|unsuper_loss: 0.0
average reward score: 3.544921875
-------------------------------------------------------------------------------------
|E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.46s (63.66%) |Training time=1.22s (31.44%) |Others=0.19 (4.90%)|CurSamplesPerSec=8.27 |AvgSamplesPerSec=8.86
epoch: 0|step: 281|ppo_ep: 1|act_loss: -0.05609130859375|cri_loss: 0.0127716064453125|unsuper_loss: 0.0
average reward score: 3.537109375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.63%) |Training time=0.90s (25.96%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.86
epoch: 0|step: 282|ppo_ep: 1|act_loss: 0.0205841064453125|cri_loss: 0.0086212158203125|unsuper_loss: 0.0
average reward score: 3.806640625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.59%) |Training time=0.90s (25.98%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.86
epoch: 0|step: 283|ppo_ep: 1|act_loss: 0.01251983642578125|cri_loss: 0.00771331787109375|unsuper_loss: 0.0
average reward score: 3.63671875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.80%) |Training time=0.89s (25.73%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.86
epoch: 0|step: 284|ppo_ep: 1|act_loss: 0.040985107421875|cri_loss: 0.01374053955078125|unsuper_loss: 0.0
average reward score: 4.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.27%) |Training time=0.92s (26.31%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.86
epoch: 0|step: 285|ppo_ep: 1|act_loss: 0.054718017578125|cri_loss: 0.01253509521484375|unsuper_loss: 0.0
average reward score: 3.322265625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.72%) |Training time=0.91s (25.88%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.86
epoch: 0|step: 286|ppo_ep: 1|act_loss: 0.038543701171875|cri_loss: 0.019256591796875|unsuper_loss: 0.0
average reward score: 3.962890625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.74%) |Training time=0.90s (25.85%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.86
epoch: 0|step: 287|ppo_ep: 1|act_loss: -0.0156402587890625|cri_loss: 0.0075225830078125|unsuper_loss: 0.0
average reward score: 3.51953125
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.15%) |Training time=0.94s (26.51%) |Others=0.19 (5.34%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.87
epoch: 0|step: 288|ppo_ep: 1|act_loss: -0.0168304443359375|cri_loss: 0.01154327392578125|unsuper_loss: 0.0
average reward score: 3.34375
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.01%) |Training time=0.91s (25.55%) |Others=0.19 (5.44%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.87
[2023-06-30 05:48:56,601] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=6, lr=[8.58624468929075e-06, 8.58624468929075e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:48:56,634] [INFO] [timer.py:215:stop] epoch=0/micro_step=290/global_step=290, RunningAvgSamplesPerSec=42.54413348714627, CurrSamplesPerSec=43.33162913799527, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:48:56,792] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=5, lr=[4.443057811392445e-06, 4.443057811392445e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 289|ppo_ep: 1|act_loss: -0.03826904296875|cri_loss: 0.00897216796875|unsuper_loss: 0.0
average reward score: 4.11328125
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.42s (65.77%) |Training time=1.07s (29.09%) |Others=0.19 (5.14%)|CurSamplesPerSec=8.71 |AvgSamplesPerSec=8.87
epoch: 0|step: 290|ppo_ep: 1|act_loss: 0.0018758773803710938|cri_loss: 0.009796142578125|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.46%) |Training time=0.91s (26.14%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.87
epoch: 0|step: 291|ppo_ep: 1|act_loss: -0.0694580078125|cri_loss: 0.0222930908203125|unsuper_loss: 0.0
average reward score: 3.51953125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.33%) |Training time=0.92s (26.24%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.87
epoch: 0|step: 292|ppo_ep: 1|act_loss: -0.07135009765625|cri_loss: 0.01080322265625|unsuper_loss: 0.0
average reward score: 3.673828125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.67%) |Training time=0.90s (25.91%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.87
epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.041229248046875|cri_loss: 0.009246826171875|unsuper_loss: 0.0
average reward score: 3.595703125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.35%) |Training time=0.91s (26.24%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.87
epoch: 0|step: 294|ppo_ep: 1|act_loss: 0.01470947265625|cri_loss: 0.009735107421875|unsuper_loss: 0.0
average reward score: 3.75
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.03%) |Training time=0.93s (26.58%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.87
epoch: 0|step: 295|ppo_ep: 1|act_loss: 0.0134124755859375|cri_loss: 0.0106201171875|unsuper_loss: 0.0
average reward score: 3.798828125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.36%) |Training time=0.91s (26.16%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.87
epoch: 0|step: 296|ppo_ep: 1|act_loss: 0.034637451171875|cri_loss: 0.01413726806640625|unsuper_loss: 0.0
average reward score: 3.806640625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.54%) |Training time=0.91s (26.03%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.87
epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.042266845703125|cri_loss: 0.0113983154296875|unsuper_loss: 0.0
average reward score: 3.70703125
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.48s (69.03%) |Training time=0.92s (25.67%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.87
epoch: 0|step: 298|ppo_ep: 1|act_loss: -0.007427215576171875|cri_loss: 0.0189971923828125|unsuper_loss: 0.0
average reward score: 3.197265625
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.50s (69.28%) |Training time=0.92s (25.42%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.86 |AvgSamplesPerSec=8.87
[2023-06-30 05:49:31,715] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=6, lr=[8.472548177430567e-06, 8.472548177430567e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:49:31,748] [INFO] [timer.py:215:stop] epoch=0/micro_step=300/global_step=300, RunningAvgSamplesPerSec=42.86635875669379, CurrSamplesPerSec=55.87625585280591, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:49:31,906] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=5, lr=[4.383888328336477e-06, 4.383888328336477e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 299|ppo_ep: 1|act_loss: -0.0258026123046875|cri_loss: 0.00966644287109375|unsuper_loss: 0.0
average reward score: 3.990234375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.59%) |Training time=0.90s (25.97%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.87
epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.06744384765625|cri_loss: 0.0099029541015625|unsuper_loss: 0.0
average reward score: 3.798828125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.54%) |Training time=0.91s (26.03%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.87
epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.00824737548828125|cri_loss: 0.0127105712890625|unsuper_loss: 0.0
average reward score: 3.921875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.61%) |Training time=0.90s (25.97%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.88
epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.049163818359375|cri_loss: 0.0067291259765625|unsuper_loss: 0.0
average reward score: 3.791015625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.65%) |Training time=0.90s (25.89%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.88
epoch: 0|step: 303|ppo_ep: 1|act_loss: -0.0199432373046875|cri_loss: 0.0155792236328125|unsuper_loss: 0.0
average reward score: 3.173828125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.77%) |Training time=0.90s (25.83%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.88
epoch: 0|step: 304|ppo_ep: 1|act_loss: 0.00528717041015625|cri_loss: 0.00882720947265625|unsuper_loss: 0.0
average reward score: 3.54296875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.52%) |Training time=0.91s (26.02%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.88
epoch: 0|step: 305|ppo_ep: 1|act_loss: 0.029083251953125|cri_loss: 0.0118560791015625|unsuper_loss: 0.0
average reward score: 3.81640625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.60%) |Training time=0.90s (25.98%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.88
epoch: 0|step: 306|ppo_ep: 1|act_loss: 0.00782012939453125|cri_loss: 0.01081085205078125|unsuper_loss: 0.0
average reward score: 3.93359375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.69%) |Training time=0.90s (25.88%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.88
epoch: 0|step: 307|ppo_ep: 1|act_loss: -0.068359375|cri_loss: 0.01392364501953125|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.53s (69.90%) |Training time=0.89s (24.54%) |Others=0.20 (5.56%)|CurSamplesPerSec=8.83 |AvgSamplesPerSec=8.88
epoch: 0|step: 308|ppo_ep: 1|act_loss: -0.025177001953125|cri_loss: 0.01119232177734375|unsuper_loss: 0.0
average reward score: 3.82421875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.37s (67.78%) |Training time=0.94s (26.80%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.88
[2023-06-30 05:50:06,669] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=6, lr=[8.353916109797776e-06, 8.353916109797776e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:50:06,702] [INFO] [timer.py:215:stop] epoch=0/micro_step=310/global_step=310, RunningAvgSamplesPerSec=43.18427876673568, CurrSamplesPerSec=54.22064402546822, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:50:06,860] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=5, lr=[4.322169725855191e-06, 4.322169725855191e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.022979736328125|cri_loss: 0.017120361328125|unsuper_loss: 0.0
average reward score: 4.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.03%) |Training time=0.92s (26.54%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.88
epoch: 0|step: 310|ppo_ep: 1|act_loss: 0.0194854736328125|cri_loss: 0.0217132568359375|unsuper_loss: 0.0
average reward score: 4.0
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.00%) |Training time=0.92s (26.58%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.88
epoch: 0|step: 311|ppo_ep: 1|act_loss: 0.0248260498046875|cri_loss: 0.020172119140625|unsuper_loss: 0.0
average reward score: 3.978515625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.97%) |Training time=0.92s (26.60%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.88
epoch: 0|step: 312|ppo_ep: 1|act_loss: 0.0011434555053710938|cri_loss: 0.0193634033203125|unsuper_loss: 0.0
average reward score: 4.23046875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.71%) |Training time=0.94s (26.89%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.89
epoch: 0|step: 313|ppo_ep: 1|act_loss: 0.023529052734375|cri_loss: 0.0219879150390625|unsuper_loss: 0.0
average reward score: 3.55078125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.83%) |Training time=0.93s (26.75%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.89
epoch: 0|step: 314|ppo_ep: 1|act_loss: -0.036712646484375|cri_loss: 0.012786865234375|unsuper_loss: 0.0
average reward score: 4.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.93%) |Training time=0.93s (26.65%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.89
epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.0203704833984375|cri_loss: 0.0105438232421875|unsuper_loss: 0.0
average reward score: 4.9453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.29%) |Training time=0.92s (26.25%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.89
epoch: 0|step: 316|ppo_ep: 1|act_loss: -0.032379150390625|cri_loss: 0.0145263671875|unsuper_loss: 0.0
average reward score: 3.888671875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.40s (67.82%) |Training time=0.95s (26.81%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.89
epoch: 0|step: 317|ppo_ep: 1|act_loss: -0.046051025390625|cri_loss: 0.0191497802734375|unsuper_loss: 0.0
average reward score: 4.51953125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.37s (67.55%) |Training time=0.94s (26.98%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.89
epoch: 0|step: 318|ppo_ep: 1|act_loss: -0.029083251953125|cri_loss: 0.0103759765625|unsuper_loss: 0.0
average reward score: 3.560546875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.03%) |Training time=0.92s (26.54%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.89
[2023-06-30 05:50:41,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=6, lr=[8.230509009340325e-06, 8.230509009340325e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:50:41,588] [INFO] [timer.py:215:stop] epoch=0/micro_step=320/global_step=320, RunningAvgSamplesPerSec=43.443456864600535, CurrSamplesPerSec=53.749096157327095, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:50:41,748] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=5, lr=[4.257985516376644e-06, 4.257985516376644e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 319|ppo_ep: 1|act_loss: -0.06292724609375|cri_loss: 0.01352691650390625|unsuper_loss: 0.0
average reward score: 4.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.88%) |Training time=0.93s (26.65%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.89
epoch: 0|step: 320|ppo_ep: 1|act_loss: 0.0098419189453125|cri_loss: 0.00853729248046875|unsuper_loss: 0.0
average reward score: 4.640625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.99%) |Training time=0.92s (26.58%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.89
epoch: 0|step: 321|ppo_ep: 1|act_loss: 0.043426513671875|cri_loss: 0.0145721435546875|unsuper_loss: 0.0
average reward score: 3.76953125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.96%) |Training time=0.92s (26.63%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.89
epoch: 0|step: 322|ppo_ep: 1|act_loss: -0.00318145751953125|cri_loss: 0.0082855224609375|unsuper_loss: 0.0
average reward score: 3.98828125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.93%) |Training time=0.93s (26.64%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.89
epoch: 0|step: 323|ppo_ep: 1|act_loss: -0.0141754150390625|cri_loss: 0.01427459716796875|unsuper_loss: 0.0
average reward score: 3.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.81%) |Training time=0.93s (26.80%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.89
epoch: 0|step: 324|ppo_ep: 1|act_loss: 0.08453369140625|cri_loss: 0.038330078125|unsuper_loss: 0.0
average reward score: 3.810546875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.37s (67.92%) |Training time=0.93s (26.62%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.90
epoch: 0|step: 325|ppo_ep: 1|act_loss: -0.03765869140625|cri_loss: 0.01335906982421875|unsuper_loss: 0.0
average reward score: 4.3203125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.51%) |Training time=0.92s (26.04%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.90
epoch: 0|step: 326|ppo_ep: 1|act_loss: -0.047027587890625|cri_loss: 0.01210784912109375|unsuper_loss: 0.0
average reward score: 3.63671875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.80%) |Training time=0.93s (26.78%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.90
epoch: 0|step: 327|ppo_ep: 1|act_loss: -0.071044921875|cri_loss: 0.01168060302734375|unsuper_loss: 0.0
average reward score: 3.931640625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.82%) |Training time=0.98s (27.79%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.90
epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.0262908935546875|cri_loss: 0.005710601806640625|unsuper_loss: 0.0
average reward score: 4.05859375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.07%) |Training time=0.92s (26.51%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.90
[2023-06-30 05:51:16,456] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=6, lr=[8.10249386017944e-06, 8.10249386017944e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:51:16,490] [INFO] [timer.py:215:stop] epoch=0/micro_step=330/global_step=330, RunningAvgSamplesPerSec=43.6880202795452, CurrSamplesPerSec=54.1932562130297, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:51:16,643] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-06-30 05:51:16,643] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=6, lr=[4.198183347243233e-06, 4.198183347243233e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.0122528076171875|cri_loss: 0.0130462646484375|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.26%) |Training time=0.92s (26.54%) |Others=0.18 (5.20%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.90
epoch: 0|step: 330|ppo_ep: 1|act_loss: -0.0223236083984375|cri_loss: 0.011383056640625|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.80%) |Training time=0.93s (26.78%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.90
epoch: 0|step: 331|ppo_ep: 1|act_loss: 0.05450439453125|cri_loss: 0.02325439453125|unsuper_loss: 0.0
average reward score: 4.5078125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.85%) |Training time=0.93s (26.70%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.90
epoch: 0|step: 332|ppo_ep: 1|act_loss: 0.01091766357421875|cri_loss: 0.01358795166015625|unsuper_loss: 0.0
average reward score: 3.609375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.03%) |Training time=0.92s (26.50%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.90
epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.07049560546875|cri_loss: 0.0146484375|unsuper_loss: 0.0
average reward score: 4.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.98%) |Training time=0.92s (26.57%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.90
epoch: 0|step: 334|ppo_ep: 1|act_loss: -0.01016998291015625|cri_loss: 0.007205963134765625|unsuper_loss: 0.0
average reward score: 4.64453125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.48s (69.84%) |Training time=0.88s (24.83%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.90
epoch: 0|step: 335|ppo_ep: 1|act_loss: -0.01378631591796875|cri_loss: 0.005397796630859375|unsuper_loss: 0.0
average reward score: 4.44921875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.94%) |Training time=0.93s (26.64%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.90
epoch: 0|step: 336|ppo_ep: 1|act_loss: -0.032928466796875|cri_loss: 0.014892578125|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.31%) |Training time=0.91s (26.22%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.90
epoch: 0|step: 337|ppo_ep: 1|act_loss: 0.0095977783203125|cri_loss: 0.0112152099609375|unsuper_loss: 0.0
average reward score: 4.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.09%) |Training time=0.92s (26.44%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.91
epoch: 0|step: 338|ppo_ep: 1|act_loss: 0.03814697265625|cri_loss: 0.015838623046875|unsuper_loss: 0.0
average reward score: 3.9765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.57%) |Training time=0.95s (26.99%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.91
[2023-06-30 05:51:51,340] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=6, lr=[7.970043881660744e-06, 7.970043881660744e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:51:51,373] [INFO] [timer.py:215:stop] epoch=0/micro_step=340/global_step=340, RunningAvgSamplesPerSec=43.940344159355746, CurrSamplesPerSec=55.48996000041343, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:51:51,531] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=6, lr=[4.129556415368261e-06, 4.129556415368261e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 339|ppo_ep: 1|act_loss: -0.06097412109375|cri_loss: 0.01050567626953125|unsuper_loss: 0.0
average reward score: 4.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.42%) |Training time=0.91s (26.14%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 340|ppo_ep: 1|act_loss: 0.06085205078125|cri_loss: 0.025146484375|unsuper_loss: 0.0
average reward score: 3.89453125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.62%) |Training time=0.94s (26.91%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.91
epoch: 0|step: 341|ppo_ep: 1|act_loss: 0.05560302734375|cri_loss: 0.01447296142578125|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.19%) |Training time=0.92s (26.42%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.91
epoch: 0|step: 342|ppo_ep: 1|act_loss: -0.046905517578125|cri_loss: 0.01424407958984375|unsuper_loss: 0.0
average reward score: 4.77734375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.94%) |Training time=0.92s (26.63%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 343|ppo_ep: 1|act_loss: 0.04876708984375|cri_loss: 0.019378662109375|unsuper_loss: 0.0
average reward score: 4.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.67%) |Training time=0.92s (25.98%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.91
epoch: 0|step: 344|ppo_ep: 1|act_loss: -0.068115234375|cri_loss: 0.01910400390625|unsuper_loss: 0.0
average reward score: 3.583984375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.49%) |Training time=0.91s (26.07%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 345|ppo_ep: 1|act_loss: -0.0028705596923828125|cri_loss: 0.01273345947265625|unsuper_loss: 0.0
average reward score: 4.625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.37s (67.98%) |Training time=0.93s (26.59%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.91
epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.0033206939697265625|cri_loss: 0.01137542724609375|unsuper_loss: 0.0
average reward score: 3.8125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.12%) |Training time=0.92s (26.42%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 347|ppo_ep: 1|act_loss: -0.0263824462890625|cri_loss: 0.006282806396484375|unsuper_loss: 0.0
average reward score: 3.953125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.29%) |Training time=0.91s (26.29%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.91
epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.022857666015625|cri_loss: 0.0219573974609375|unsuper_loss: 0.0
average reward score: 3.71875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.36s (68.24%) |Training time=0.91s (26.29%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.91
[2023-06-30 05:52:26,143] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=6, lr=[7.83333829396839e-06, 7.83333829396839e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:52:26,176] [INFO] [timer.py:215:stop] epoch=0/micro_step=350/global_step=350, RunningAvgSamplesPerSec=44.18509537763385, CurrSamplesPerSec=55.60110358998153, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:52:26,355] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=6, lr=[4.058724504646834e-06, 4.058724504646834e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.0760498046875|cri_loss: 0.012451171875|unsuper_loss: 0.0
average reward score: 3.73828125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.88%) |Training time=0.91s (26.03%) |Others=0.21 (6.09%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.91
epoch: 0|step: 350|ppo_ep: 1|act_loss: -0.0265960693359375|cri_loss: 0.0186004638671875|unsuper_loss: 0.0
average reward score: 3.650390625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.36s (67.35%) |Training time=0.96s (27.26%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.92
epoch: 0|step: 351|ppo_ep: 1|act_loss: -0.000919342041015625|cri_loss: 0.013702392578125|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.01%) |Training time=1.00s (28.60%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.92
epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.00801849365234375|cri_loss: 0.01354217529296875|unsuper_loss: 0.0
average reward score: 4.015625
-------------------------------------------------------------------------------------
|E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.37s (61.33%) |Training time=1.30s (33.77%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.29 |AvgSamplesPerSec=8.91
epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.0216217041015625|cri_loss: 0.0163116455078125|unsuper_loss: 0.0
average reward score: 3.716796875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.19%) |Training time=0.91s (26.33%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.062042236328125|cri_loss: 0.018768310546875|unsuper_loss: 0.0
average reward score: 3.75390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.18%) |Training time=0.89s (25.32%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.92
epoch: 0|step: 355|ppo_ep: 1|act_loss: 0.055755615234375|cri_loss: 0.035003662109375|unsuper_loss: 0.0
average reward score: 4.19140625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.65%) |Training time=0.90s (25.75%) |Others=0.20 (5.59%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.92
epoch: 0|step: 356|ppo_ep: 1|act_loss: 0.0125885009765625|cri_loss: 0.00804901123046875|unsuper_loss: 0.0
average reward score: 4.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.52s (70.53%) |Training time=0.86s (24.14%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.92
epoch: 0|step: 357|ppo_ep: 1|act_loss: 0.0279693603515625|cri_loss: 0.01204681396484375|unsuper_loss: 0.0
average reward score: 3.71875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.30%) |Training time=0.99s (28.29%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.92
epoch: 0|step: 358|ppo_ep: 1|act_loss: -0.10028076171875|cri_loss: 0.0109710693359375|unsuper_loss: 0.0
average reward score: 4.3671875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.65%) |Training time=0.98s (27.87%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.92
[2023-06-30 05:53:01,574] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=6, lr=[7.692562075619359e-06, 7.692562075619359e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:53:01,607] [INFO] [timer.py:215:stop] epoch=0/micro_step=360/global_step=360, RunningAvgSamplesPerSec=44.31599042137901, CurrSamplesPerSec=48.895329364178735, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:53:01,766] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=6, lr=[3.985783458870134e-06, 3.985783458870134e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 359|ppo_ep: 1|act_loss: -0.039703369140625|cri_loss: 0.0128173828125|unsuper_loss: 0.0
average reward score: 3.97265625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.36%) |Training time=0.98s (28.22%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.92
epoch: 0|step: 360|ppo_ep: 1|act_loss: -0.016815185546875|cri_loss: 0.0153045654296875|unsuper_loss: 0.0
average reward score: 3.69921875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.58%) |Training time=1.02s (28.93%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.92
epoch: 0|step: 361|ppo_ep: 1|act_loss: -0.059844970703125|cri_loss: 0.006988525390625|unsuper_loss: 0.0
average reward score: 4.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.56%) |Training time=1.01s (28.16%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.92
epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.003192901611328125|cri_loss: 0.01190185546875|unsuper_loss: 0.0
average reward score: 4.1796875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.49%) |Training time=0.98s (28.11%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.92
epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.01528167724609375|cri_loss: 0.01519775390625|unsuper_loss: 0.0
average reward score: 3.87890625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.36%) |Training time=0.99s (28.26%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.92
epoch: 0|step: 364|ppo_ep: 1|act_loss: 0.07867431640625|cri_loss: 0.046356201171875|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.33%) |Training time=0.99s (28.27%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.92
epoch: 0|step: 365|ppo_ep: 1|act_loss: 0.0679931640625|cri_loss: 0.01065826416015625|unsuper_loss: 0.0
average reward score: 3.60546875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.29%) |Training time=1.00s (28.31%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.92
epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.024383544921875|cri_loss: 0.0162200927734375|unsuper_loss: 0.0
average reward score: 3.802734375
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.34%) |Training time=1.00s (28.20%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.92
epoch: 0|step: 367|ppo_ep: 1|act_loss: -0.048431396484375|cri_loss: 0.0146636962890625|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.74%) |Training time=0.98s (27.88%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.92
epoch: 0|step: 368|ppo_ep: 1|act_loss: 0.01148223876953125|cri_loss: 0.011199951171875|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.25%) |Training time=0.99s (28.36%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.92
[2023-06-30 05:53:36,761] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=6, lr=[7.5479057131660736e-06, 7.5479057131660736e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:53:36,794] [INFO] [timer.py:215:stop] epoch=0/micro_step=370/global_step=370, RunningAvgSamplesPerSec=44.40915037710428, CurrSamplesPerSec=47.076116796483014, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:53:36,957] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=6, lr=[3.910831975733717e-06, 3.910831975733717e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 369|ppo_ep: 1|act_loss: -0.0286407470703125|cri_loss: 0.01837158203125|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.80%) |Training time=1.01s (28.73%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.92
epoch: 0|step: 370|ppo_ep: 1|act_loss: -0.0307464599609375|cri_loss: 0.01082611083984375|unsuper_loss: 0.0
average reward score: 4.328125
-------------------------------------------------------------------------------------
|E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.41s (62.14%) |Training time=1.27s (32.84%) |Others=0.19 (5.02%)|CurSamplesPerSec=8.26 |AvgSamplesPerSec=8.92
epoch: 0|step: 371|ppo_ep: 1|act_loss: 0.027557373046875|cri_loss: 0.0166168212890625|unsuper_loss: 0.0
average reward score: 3.96875
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.69%) |Training time=1.50s (37.45%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.92
epoch: 0|step: 372|ppo_ep: 1|act_loss: 0.0004813671112060547|cri_loss: 0.0118408203125|unsuper_loss: 0.0
average reward score: 4.37890625
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.32s (57.78%) |Training time=1.49s (37.25%) |Others=0.20 (4.98%)|CurSamplesPerSec=7.98 |AvgSamplesPerSec=8.92
epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.037322998046875|cri_loss: 0.009857177734375|unsuper_loss: 0.0
average reward score: 4.5390625
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.28%) |Training time=1.47s (36.84%) |Others=0.20 (4.88%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.91
epoch: 0|step: 374|ppo_ep: 1|act_loss: 0.04791259765625|cri_loss: 0.0125885009765625|unsuper_loss: 0.0
average reward score: 4.2890625
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.32s (57.82%) |Training time=1.50s (37.30%) |Others=0.20 (4.88%)|CurSamplesPerSec=7.96 |AvgSamplesPerSec=8.91
epoch: 0|step: 375|ppo_ep: 1|act_loss: -0.0125579833984375|cri_loss: 0.007411956787109375|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=2.36s (58.51%) |Training time=1.48s (36.63%) |Others=0.20 (4.86%)|CurSamplesPerSec=7.93 |AvgSamplesPerSec=8.91
epoch: 0|step: 376|ppo_ep: 1|act_loss: 0.0157318115234375|cri_loss: 0.00925445556640625|unsuper_loss: 0.0
average reward score: 3.525390625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.32s (58.11%) |Training time=1.48s (37.00%) |Others=0.20 (4.90%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.90
epoch: 0|step: 377|ppo_ep: 1|act_loss: 0.0175628662109375|cri_loss: 0.0164031982421875|unsuper_loss: 0.0
average reward score: 3.849609375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.72%) |Training time=1.49s (37.35%) |Others=0.20 (4.93%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.90
epoch: 0|step: 378|ppo_ep: 1|act_loss: 0.0709228515625|cri_loss: 0.0173187255859375|unsuper_loss: 0.0
average reward score: 3.86328125
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.39s (58.85%) |Training time=1.48s (36.36%) |Others=0.19 (4.79%)|CurSamplesPerSec=7.89 |AvgSamplesPerSec=8.90
[2023-06-30 05:54:16,726] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=6, lr=[7.399564943446002e-06, 7.399564943446002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:54:16,759] [INFO] [timer.py:215:stop] epoch=0/micro_step=380/global_step=380, RunningAvgSamplesPerSec=43.74354431882172, CurrSamplesPerSec=27.770818793684896, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:54:16,920] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=6, lr=[3.833971473288084e-06, 3.833971473288084e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 379|ppo_ep: 1|act_loss: -0.0738525390625|cri_loss: 0.00991058349609375|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.94%) |Training time=1.48s (37.17%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.90
epoch: 0|step: 380|ppo_ep: 1|act_loss: -0.01165771484375|cri_loss: 0.0087890625|unsuper_loss: 0.0
average reward score: 4.1953125
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.31s (57.85%) |Training time=1.49s (37.26%) |Others=0.20 (4.88%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.89
epoch: 0|step: 381|ppo_ep: 1|act_loss: -0.031890869140625|cri_loss: 0.01226043701171875|unsuper_loss: 0.0
average reward score: 3.8671875
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.32s (58.09%) |Training time=1.48s (37.02%) |Others=0.19 (4.89%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.89
epoch: 0|step: 382|ppo_ep: 1|act_loss: -0.08111572265625|cri_loss: 0.02081298828125|unsuper_loss: 0.0
average reward score: 4.55859375
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.01%) |Training time=1.49s (37.14%) |Others=0.19 (4.85%)|CurSamplesPerSec=7.97 |AvgSamplesPerSec=8.89
epoch: 0|step: 383|ppo_ep: 1|act_loss: -0.032470703125|cri_loss: 0.00876617431640625|unsuper_loss: 0.0
average reward score: 4.0234375
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.50s (61.62%) |Training time=1.36s (33.57%) |Others=0.20 (4.81%)|CurSamplesPerSec=7.89 |AvgSamplesPerSec=8.89
epoch: 0|step: 384|ppo_ep: 1|act_loss: -0.0394287109375|cri_loss: 0.00745391845703125|unsuper_loss: 0.0
average reward score: 4.19140625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.43s (61.15%) |Training time=1.35s (33.94%) |Others=0.20 (4.91%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.88
epoch: 0|step: 385|ppo_ep: 1|act_loss: -0.0003554821014404297|cri_loss: 0.0060882568359375|unsuper_loss: 0.0
average reward score: 4.0390625
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.44s (60.80%) |Training time=1.38s (34.29%) |Others=0.20 (4.91%)|CurSamplesPerSec=7.96 |AvgSamplesPerSec=8.88
epoch: 0|step: 386|ppo_ep: 1|act_loss: 7.253885269165039e-05|cri_loss: 0.0085601806640625|unsuper_loss: 0.0
average reward score: 3.71484375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.45s (61.33%) |Training time=1.35s (33.80%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.88
epoch: 0|step: 387|ppo_ep: 1|act_loss: 0.0211181640625|cri_loss: 0.01025390625|unsuper_loss: 0.0
average reward score: 3.765625
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.42s (61.13%) |Training time=1.35s (33.95%) |Others=0.20 (4.93%)|CurSamplesPerSec=8.07 |AvgSamplesPerSec=8.88
epoch: 0|step: 388|ppo_ep: 1|act_loss: 0.0689697265625|cri_loss: 0.041351318359375|unsuper_loss: 0.0
average reward score: 4.0546875
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.45s (61.18%) |Training time=1.36s (33.95%) |Others=0.20 (4.87%)|CurSamplesPerSec=7.98 |AvgSamplesPerSec=8.87
[2023-06-30 05:54:56,726] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=6, lr=[7.247740488727002e-06, 7.247740488727002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:54:56,759] [INFO] [timer.py:215:stop] epoch=0/micro_step=390/global_step=390, RunningAvgSamplesPerSec=43.23402752701912, CurrSamplesPerSec=31.07263716514222, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:54:56,920] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=6, lr=[3.7553059527082913e-06, 3.7553059527082913e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 389|ppo_ep: 1|act_loss: -0.029388427734375|cri_loss: 0.008270263671875|unsuper_loss: 0.0
average reward score: 4.171875
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.42s (60.94%) |Training time=1.36s (34.19%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.87
epoch: 0|step: 390|ppo_ep: 1|act_loss: -0.00017333030700683594|cri_loss: 0.00835418701171875|unsuper_loss: 0.0
average reward score: 4.26171875
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.45s (61.40%) |Training time=1.35s (33.73%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.87
epoch: 0|step: 391|ppo_ep: 1|act_loss: -0.044769287109375|cri_loss: 0.0067291259765625|unsuper_loss: 0.0
average reward score: 4.45703125
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.46s (61.43%) |Training time=1.35s (33.69%) |Others=0.20 (4.88%)|CurSamplesPerSec=7.98 |AvgSamplesPerSec=8.87
epoch: 0|step: 392|ppo_ep: 1|act_loss: -0.11419677734375|cri_loss: 0.0197906494140625|unsuper_loss: 0.0
average reward score: 4.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.45s (61.22%) |Training time=1.35s (33.91%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.86
epoch: 0|step: 393|ppo_ep: 1|act_loss: -0.05218505859375|cri_loss: 0.01096343994140625|unsuper_loss: 0.0
average reward score: 4.109375
-------------------------------------------------------------------------------------
|E2E latency=4.07s |Gather latency=0.00s (0.00%) |Generate time=2.44s (59.98%) |Training time=1.43s (35.19%) |Others=0.20 (4.84%)|CurSamplesPerSec=7.85 |AvgSamplesPerSec=8.86
epoch: 0|step: 394|ppo_ep: 1|act_loss: 0.0313720703125|cri_loss: 0.03961181640625|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.43s (60.72%) |Training time=1.38s (34.44%) |Others=0.19 (4.84%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.86
epoch: 0|step: 395|ppo_ep: 1|act_loss: -0.0180206298828125|cri_loss: 0.006103515625|unsuper_loss: 0.0
average reward score: 4.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=2.43s (62.42%) |Training time=1.27s (32.73%) |Others=0.19 (4.85%)|CurSamplesPerSec=8.22 |AvgSamplesPerSec=8.86
epoch: 0|step: 396|ppo_ep: 1|act_loss: 0.0273895263671875|cri_loss: 0.013580322265625|unsuper_loss: 0.0
average reward score: 4.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.45s (70.01%) |Training time=0.86s (24.60%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.86
epoch: 0|step: 397|ppo_ep: 1|act_loss: 0.0784912109375|cri_loss: 0.028228759765625|unsuper_loss: 0.0
average reward score: 3.744140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.92%) |Training time=0.86s (24.67%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.86
epoch: 0|step: 398|ppo_ep: 1|act_loss: 0.01139068603515625|cri_loss: 0.01259613037109375|unsuper_loss: 0.0
average reward score: 4.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.46s (70.44%) |Training time=0.84s (24.15%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.86
[2023-06-30 05:55:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=6, lr=[7.09263778510682e-06, 7.09263778510682e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:55:34,723] [INFO] [timer.py:215:stop] epoch=0/micro_step=400/global_step=400, RunningAvgSamplesPerSec=43.10742610651081, CurrSamplesPerSec=59.0679186941993, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:55:34,881] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=6, lr=[3.6749418575683005e-06, 3.6749418575683005e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 399|ppo_ep: 1|act_loss: 0.002788543701171875|cri_loss: 0.0146942138671875|unsuper_loss: 0.0
average reward score: 4.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.63%) |Training time=0.87s (24.98%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 400|ppo_ep: 1|act_loss: 0.0240478515625|cri_loss: 0.0180206298828125|unsuper_loss: 0.0
average reward score: 3.91015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.47s (70.84%) |Training time=0.83s (23.71%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.86
epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.07086181640625|cri_loss: 0.01593017578125|unsuper_loss: 0.0
average reward score: 3.88671875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.47s (70.60%) |Training time=0.84s (23.93%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.86
epoch: 0|step: 402|ppo_ep: 1|act_loss: -0.03582763671875|cri_loss: 0.00405120849609375|unsuper_loss: 0.0
average reward score: 4.05859375
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.53s (70.42%) |Training time=0.87s (24.15%) |Others=0.20 (5.44%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.86
epoch: 0|step: 403|ppo_ep: 1|act_loss: 0.0123748779296875|cri_loss: 0.01336669921875|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.50%) |Training time=0.88s (25.10%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.86
epoch: 0|step: 404|ppo_ep: 1|act_loss: -0.039581298828125|cri_loss: 0.018402099609375|unsuper_loss: 0.0
average reward score: 4.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.58%) |Training time=0.87s (24.84%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.86
epoch: 0|step: 405|ppo_ep: 1|act_loss: -0.058319091796875|cri_loss: 0.0109710693359375|unsuper_loss: 0.0
average reward score: 3.9765625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.46%) |Training time=0.88s (25.13%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.86
epoch: 0|step: 406|ppo_ep: 1|act_loss: 0.0535888671875|cri_loss: 0.0196380615234375|unsuper_loss: 0.0
average reward score: 3.98828125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.78%) |Training time=0.87s (24.83%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.86
epoch: 0|step: 407|ppo_ep: 1|act_loss: -0.05902099609375|cri_loss: 0.0194091796875|unsuper_loss: 0.0
average reward score: 4.25
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.78%) |Training time=0.86s (24.81%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.86
epoch: 0|step: 408|ppo_ep: 1|act_loss: 0.038421630859375|cri_loss: 0.0214691162109375|unsuper_loss: 0.0
average reward score: 3.984375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.57%) |Training time=0.87s (24.96%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.86
[2023-06-30 05:56:09,703] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=6, lr=[6.934466704534219e-06, 6.934466704534219e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:56:09,736] [INFO] [timer.py:215:stop] epoch=0/micro_step=410/global_step=410, RunningAvgSamplesPerSec=43.41071976076726, CurrSamplesPerSec=59.7546234709378, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:56:09,894] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=6, lr=[3.5929879298104766e-06, 3.5929879298104766e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 409|ppo_ep: 1|act_loss: -0.05084228515625|cri_loss: 0.00968170166015625|unsuper_loss: 0.0
average reward score: 4.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.77%) |Training time=0.87s (24.82%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.87
epoch: 0|step: 410|ppo_ep: 1|act_loss: 0.01177978515625|cri_loss: 0.0135498046875|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.45s (68.92%) |Training time=0.92s (25.70%) |Others=0.19 (5.38%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.87
epoch: 0|step: 411|ppo_ep: 1|act_loss: -0.025299072265625|cri_loss: 0.009765625|unsuper_loss: 0.0
average reward score: 4.23046875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.53s (71.41%) |Training time=0.82s (23.10%) |Others=0.19 (5.49%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.87
epoch: 0|step: 412|ppo_ep: 1|act_loss: 0.004520416259765625|cri_loss: 0.0141143798828125|unsuper_loss: 0.0
average reward score: 3.84375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.54%) |Training time=0.87s (25.01%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.87
epoch: 0|step: 413|ppo_ep: 1|act_loss: -0.037506103515625|cri_loss: 0.008148193359375|unsuper_loss: 0.0
average reward score: 4.0625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.45%) |Training time=0.88s (25.13%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.87
epoch: 0|step: 414|ppo_ep: 1|act_loss: 0.0014238357543945312|cri_loss: 0.01406097412109375|unsuper_loss: 0.0
average reward score: 4.578125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.19%) |Training time=0.89s (25.42%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.87
epoch: 0|step: 415|ppo_ep: 1|act_loss: -0.020233154296875|cri_loss: 0.0099029541015625|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.65%) |Training time=0.87s (24.89%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.87
epoch: 0|step: 416|ppo_ep: 1|act_loss: 0.023773193359375|cri_loss: 0.0111236572265625|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.42%) |Training time=0.88s (25.16%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.87
epoch: 0|step: 417|ppo_ep: 1|act_loss: 0.0760498046875|cri_loss: 0.02392578125|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.47%) |Training time=0.88s (25.14%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.87
epoch: 0|step: 418|ppo_ep: 1|act_loss: -0.0180816650390625|cri_loss: 0.00921630859375|unsuper_loss: 0.0
average reward score: 5.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.42%) |Training time=0.88s (25.16%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.87
[2023-06-30 05:56:44,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=6, lr=[6.773441270827885e-06, 6.773441270827885e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:56:44,827] [INFO] [timer.py:215:stop] epoch=0/micro_step=420/global_step=420, RunningAvgSamplesPerSec=43.68384661075419, CurrSamplesPerSec=57.814391211305654, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:56:44,985] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=6, lr=[3.5095550626051217e-06, 3.5095550626051217e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 419|ppo_ep: 1|act_loss: 0.024658203125|cri_loss: 0.0294647216796875|unsuper_loss: 0.0
average reward score: 3.9296875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.34%) |Training time=0.88s (25.27%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.87
epoch: 0|step: 420|ppo_ep: 1|act_loss: 0.1778564453125|cri_loss: 0.150146484375|unsuper_loss: 0.0
average reward score: 4.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.52s (69.13%) |Training time=0.93s (25.62%) |Others=0.19 (5.25%)|CurSamplesPerSec=8.79 |AvgSamplesPerSec=8.87
epoch: 0|step: 421|ppo_ep: 1|act_loss: -0.009429931640625|cri_loss: 0.0142974853515625|unsuper_loss: 0.0
average reward score: 4.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.32%) |Training time=0.90s (25.38%) |Others=0.19 (5.30%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.87
epoch: 0|step: 422|ppo_ep: 1|act_loss: -0.025848388671875|cri_loss: 0.01366424560546875|unsuper_loss: 0.0
average reward score: 4.4765625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.59%) |Training time=0.88s (25.03%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.87
epoch: 0|step: 423|ppo_ep: 1|act_loss: 4.792213439941406e-05|cri_loss: 0.01058197021484375|unsuper_loss: 0.0
average reward score: 4.3046875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.34%) |Training time=0.88s (25.20%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.87
epoch: 0|step: 424|ppo_ep: 1|act_loss: -0.0090789794921875|cri_loss: 0.01104736328125|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.85%) |Training time=0.88s (24.81%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.87
epoch: 0|step: 425|ppo_ep: 1|act_loss: -0.004344940185546875|cri_loss: 0.01605224609375|unsuper_loss: 0.0
average reward score: 4.01171875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.20%) |Training time=0.89s (25.41%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.87
epoch: 0|step: 426|ppo_ep: 1|act_loss: -0.06842041015625|cri_loss: 0.0250244140625|unsuper_loss: 0.0
average reward score: 3.986328125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.30%) |Training time=0.89s (25.33%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.87
epoch: 0|step: 427|ppo_ep: 1|act_loss: -0.0002505779266357422|cri_loss: 0.00992584228515625|unsuper_loss: 0.0
average reward score: 4.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.34%) |Training time=0.88s (25.26%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.88
epoch: 0|step: 428|ppo_ep: 1|act_loss: -0.0253448486328125|cri_loss: 0.01192474365234375|unsuper_loss: 0.0
average reward score: 4.44140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.54%) |Training time=0.87s (25.07%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.88
[2023-06-30 05:57:20,061] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=6, lr=[6.60977937007738e-06, 6.60977937007738e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:57:20,094] [INFO] [timer.py:215:stop] epoch=0/micro_step=430/global_step=430, RunningAvgSamplesPerSec=43.92255334602433, CurrSamplesPerSec=54.53575480192565, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:57:20,253] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=6, lr=[3.4247561502991604e-06, 3.4247561502991604e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 429|ppo_ep: 1|act_loss: 0.04827880859375|cri_loss: 0.03143310546875|unsuper_loss: 0.0
average reward score: 3.888671875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.70%) |Training time=0.92s (25.94%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.88
epoch: 0|step: 430|ppo_ep: 1|act_loss: -0.047882080078125|cri_loss: 0.039764404296875|unsuper_loss: 0.0
average reward score: 3.986328125
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.45s (68.22%) |Training time=0.95s (26.43%) |Others=0.19 (5.35%)|CurSamplesPerSec=8.90 |AvgSamplesPerSec=8.88
epoch: 0|step: 431|ppo_ep: 1|act_loss: -0.0034942626953125|cri_loss: 0.01494598388671875|unsuper_loss: 0.0
average reward score: 4.76171875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.44s (68.97%) |Training time=0.91s (25.68%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.88
epoch: 0|step: 432|ppo_ep: 1|act_loss: -0.020416259765625|cri_loss: 0.0108642578125|unsuper_loss: 0.0
average reward score: 4.32421875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.06%) |Training time=0.90s (25.57%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.88
epoch: 0|step: 433|ppo_ep: 1|act_loss: 0.05987548828125|cri_loss: 0.017578125|unsuper_loss: 0.0
average reward score: 4.2578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.20%) |Training time=0.89s (25.42%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.88
epoch: 0|step: 434|ppo_ep: 1|act_loss: -0.005115509033203125|cri_loss: 0.007236480712890625|unsuper_loss: 0.0
average reward score: 4.16796875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.47%) |Training time=0.88s (25.14%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.88
epoch: 0|step: 435|ppo_ep: 1|act_loss: 0.056060791015625|cri_loss: 0.02459716796875|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.49%) |Training time=0.88s (25.11%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.88
epoch: 0|step: 436|ppo_ep: 1|act_loss: 0.04522705078125|cri_loss: 0.01922607421875|unsuper_loss: 0.0
average reward score: 4.015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.45%) |Training time=0.88s (25.16%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.88
epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.032867431640625|cri_loss: 0.01165008544921875|unsuper_loss: 0.0
average reward score: 4.609375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.36%) |Training time=0.88s (25.27%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.88
epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.0141754150390625|cri_loss: 0.0321044921875|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.74%) |Training time=0.91s (25.81%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.88
[2023-06-30 05:57:55,216] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=6, lr=[6.443702455817986e-06, 6.443702455817986e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:57:55,249] [INFO] [timer.py:215:stop] epoch=0/micro_step=440/global_step=440, RunningAvgSamplesPerSec=44.14942741671454, CurrSamplesPerSec=58.41743099101091, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:57:55,408] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=6, lr=[3.3387059356569875e-06, 3.3387059356569875e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.00982666015625|cri_loss: 0.0374755859375|unsuper_loss: 0.0
average reward score: 4.19140625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.58%) |Training time=0.88s (25.03%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.88
epoch: 0|step: 440|ppo_ep: 1|act_loss: 0.00434112548828125|cri_loss: 0.01287841796875|unsuper_loss: 0.0
average reward score: 4.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.50s (70.10%) |Training time=0.88s (24.60%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.88
epoch: 0|step: 441|ppo_ep: 1|act_loss: 0.006542205810546875|cri_loss: 0.0172882080078125|unsuper_loss: 0.0
average reward score: 4.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.05%) |Training time=0.85s (24.49%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.88
epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.052947998046875|cri_loss: 0.016998291015625|unsuper_loss: 0.0
average reward score: 4.5
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.24%) |Training time=0.84s (24.30%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.88
epoch: 0|step: 443|ppo_ep: 1|act_loss: 0.0955810546875|cri_loss: 0.0443115234375|unsuper_loss: 0.0
average reward score: 4.921875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.22%) |Training time=0.84s (24.32%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.88
epoch: 0|step: 444|ppo_ep: 1|act_loss: -0.016876220703125|cri_loss: 0.0155181884765625|unsuper_loss: 0.0
average reward score: 4.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.31%) |Training time=0.83s (24.21%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.88
epoch: 0|step: 445|ppo_ep: 1|act_loss: -0.0084381103515625|cri_loss: 0.030426025390625|unsuper_loss: 0.0
average reward score: 3.998046875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.25%) |Training time=0.84s (24.29%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.89
epoch: 0|step: 446|ppo_ep: 1|act_loss: -0.01557159423828125|cri_loss: 0.00982666015625|unsuper_loss: 0.0
average reward score: 4.37890625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.43s (70.34%) |Training time=0.83s (24.15%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.89
epoch: 0|step: 447|ppo_ep: 1|act_loss: 0.0034580230712890625|cri_loss: 0.019805908203125|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.53%) |Training time=0.87s (25.03%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.89
epoch: 0|step: 448|ppo_ep: 1|act_loss: -0.01454925537109375|cri_loss: 0.032958984375|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.48s (70.34%) |Training time=0.85s (24.22%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.89
[2023-06-30 05:58:29,962] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=6, lr=[6.275435249378385e-06, 6.275435249378385e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:58:29,996] [INFO] [timer.py:215:stop] epoch=0/micro_step=450/global_step=450, RunningAvgSamplesPerSec=44.42989522363758, CurrSamplesPerSec=58.49179436269361, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:58:30,154] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=6, lr=[3.2515208546001997e-06, 3.2515208546001997e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 449|ppo_ep: 1|act_loss: -0.036407470703125|cri_loss: 0.0286407470703125|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.36%) |Training time=0.88s (25.22%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.89
epoch: 0|step: 450|ppo_ep: 1|act_loss: -0.0007543563842773438|cri_loss: 0.023773193359375|unsuper_loss: 0.0
average reward score: 4.140625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.12%) |Training time=0.89s (25.38%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.89
[2023-06-30 05:58:36,910] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 451|ppo_ep: 1|act_loss: 0.0071258544921875|cri_loss: 0.038299560546875|unsuper_loss: 0.0
average reward score: 3.869140625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.42s (71.22%) |Training time=0.79s (23.24%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.89
epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.035369873046875|cri_loss: 0.04986572265625|unsuper_loss: 0.0
average reward score: 4.26171875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.02%) |Training time=0.85s (24.53%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.89
epoch: 0|step: 453|ppo_ep: 1|act_loss: -0.0048065185546875|cri_loss: 0.0400390625|unsuper_loss: 0.0
average reward score: 4.125
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.13%) |Training time=0.84s (24.42%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.89
epoch: 0|step: 454|ppo_ep: 1|act_loss: -0.0203857421875|cri_loss: 0.044586181640625|unsuper_loss: 0.0
average reward score: 4.3359375
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.23%) |Training time=0.84s (24.31%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.89
[2023-06-30 05:58:50,662] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.040252685546875|cri_loss: 0.03717041015625|unsuper_loss: 0.0
average reward score: 3.958984375
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.94%) |Training time=0.80s (23.49%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.39 |AvgSamplesPerSec=8.89
[2023-06-30 05:58:54,103] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 456|ppo_ep: 1|act_loss: 0.0002772808074951172|cri_loss: 0.1002197265625|unsuper_loss: 0.0
average reward score: 3.5546875
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.26%) |Training time=0.83s (24.21%) |Others=0.19 (5.53%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.89
epoch: 0|step: 457|ppo_ep: 1|act_loss: 0.01239776611328125|cri_loss: 0.060791015625|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.78%) |Training time=0.89s (24.92%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.89
[2023-06-30 05:59:01,272] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 458|ppo_ep: 1|act_loss: -0.01523590087890625|cri_loss: 0.05206298828125|unsuper_loss: 0.0
average reward score: 3.83203125
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.42s (70.49%) |Training time=0.84s (24.37%) |Others=0.18 (5.14%)|CurSamplesPerSec=9.30 |AvgSamplesPerSec=8.90
[2023-06-30 05:59:04,566] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, lr=[6.1564667964686156e-06, 6.1564667964686156e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:59:04,599] [INFO] [timer.py:215:stop] epoch=0/micro_step=460/global_step=460, RunningAvgSamplesPerSec=44.70167575575559, CurrSamplesPerSec=53.60321289246587, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:59:04,758] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=7, lr=[3.1721814451696215e-06, 3.1721814451696215e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 459|ppo_ep: 1|act_loss: -0.10467529296875|cri_loss: 0.0799560546875|unsuper_loss: 0.0
average reward score: 4.3828125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.03%) |Training time=0.93s (26.57%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.90
epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.0093536376953125|cri_loss: 0.064453125|unsuper_loss: 0.0
average reward score: 3.515625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.34s (67.31%) |Training time=0.95s (27.25%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.90
[2023-06-30 05:59:11,679] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 461|ppo_ep: 1|act_loss: 0.09210205078125|cri_loss: 0.08001708984375|unsuper_loss: 0.0
average reward score: 3.72265625
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.31s (67.20%) |Training time=0.95s (27.69%) |Others=0.18 (5.11%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.90
epoch: 0|step: 462|ppo_ep: 1|act_loss: -0.012054443359375|cri_loss: 0.037628173828125|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.64%) |Training time=1.02s (28.95%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.90
epoch: 0|step: 463|ppo_ep: 1|act_loss: 0.0285186767578125|cri_loss: 0.09063720703125|unsuper_loss: 0.0
average reward score: 4.02734375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.01%) |Training time=0.95s (27.57%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.90
epoch: 0|step: 464|ppo_ep: 1|act_loss: 0.1329345703125|cri_loss: 0.0718994140625|unsuper_loss: 0.0
average reward score: 3.232421875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.94%) |Training time=0.95s (27.62%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.90
epoch: 0|step: 465|ppo_ep: 1|act_loss: 0.037139892578125|cri_loss: 0.064697265625|unsuper_loss: 0.0
average reward score: 3.453125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.33%) |Training time=0.98s (28.27%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.90
epoch: 0|step: 466|ppo_ep: 1|act_loss: -0.04022216796875|cri_loss: 0.08026123046875|unsuper_loss: 0.0
average reward score: 3.998046875
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.18%) |Training time=1.03s (28.61%) |Others=0.19 (5.21%)|CurSamplesPerSec=8.85 |AvgSamplesPerSec=8.90
epoch: 0|step: 467|ppo_ep: 1|act_loss: -0.033203125|cri_loss: 0.074462890625|unsuper_loss: 0.0
average reward score: 3.373046875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.68%) |Training time=0.97s (27.90%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.90
epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.0007905960083007812|cri_loss: 0.04815673828125|unsuper_loss: 0.0
average reward score: 3.546875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (69.03%) |Training time=0.88s (25.53%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.90
[2023-06-30 05:59:39,454] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=9, lr=[5.9850000650835e-06, 5.9850000650835e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 05:59:39,488] [INFO] [timer.py:215:stop] epoch=0/micro_step=470/global_step=470, RunningAvgSamplesPerSec=44.81149791960884, CurrSamplesPerSec=52.02995785432296, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 05:59:39,646] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=8, lr=[3.0921052929875482e-06, 3.0921052929875482e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 469|ppo_ep: 1|act_loss: -0.116455078125|cri_loss: 0.064697265625|unsuper_loss: 0.0
average reward score: 3.84375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.71%) |Training time=0.95s (26.93%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.90
epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.09613037109375|cri_loss: 0.07244873046875|unsuper_loss: 0.0
average reward score: 3.8984375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (69.01%) |Training time=0.88s (25.51%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.90
epoch: 0|step: 471|ppo_ep: 1|act_loss: -0.03240966796875|cri_loss: 0.04156494140625|unsuper_loss: 0.0
average reward score: 3.84375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.85%) |Training time=0.89s (25.67%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.90
epoch: 0|step: 472|ppo_ep: 1|act_loss: 0.06707763671875|cri_loss: 0.0452880859375|unsuper_loss: 0.0
average reward score: 4.1875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.99%) |Training time=0.88s (25.54%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.90
epoch: 0|step: 473|ppo_ep: 1|act_loss: -0.005523681640625|cri_loss: 0.05377197265625|unsuper_loss: 0.0
average reward score: 4.20703125
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.37s (68.70%) |Training time=0.89s (25.87%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.90
epoch: 0|step: 474|ppo_ep: 1|act_loss: 0.0450439453125|cri_loss: 0.02313232421875|unsuper_loss: 0.0
average reward score: 3.908203125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.55%) |Training time=0.91s (26.04%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.90
[2023-06-30 06:00:00,364] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 475|ppo_ep: 1|act_loss: -0.0164794921875|cri_loss: 0.06781005859375|unsuper_loss: 0.0
average reward score: 3.83984375
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.77%) |Training time=0.89s (24.93%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.90
epoch: 0|step: 476|ppo_ep: 1|act_loss: 0.0309295654296875|cri_loss: 0.037078857421875|unsuper_loss: 0.0
average reward score: 3.775390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.64%) |Training time=0.91s (25.95%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.91
epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.04290771484375|cri_loss: 0.04345703125|unsuper_loss: 0.0
average reward score: 3.828125
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.28s (66.08%) |Training time=0.98s (28.48%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.91
[2023-06-30 06:00:10,725] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 478|ppo_ep: 1|act_loss: 0.037750244140625|cri_loss: 0.0260009765625|unsuper_loss: 0.0
average reward score: 3.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.29s (67.12%) |Training time=0.93s (27.36%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.38 |AvgSamplesPerSec=8.91
[2023-06-30 06:00:14,224] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=11, lr=[5.846685346835875e-06, 5.846685346835875e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:00:14,258] [INFO] [timer.py:215:stop] epoch=0/micro_step=480/global_step=480, RunningAvgSamplesPerSec=44.97652201688404, CurrSamplesPerSec=46.17494483476576, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:00:14,415] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=8, lr=[3.002374483561064e-06, 3.002374483561064e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 479|ppo_ep: 1|act_loss: 0.0008816719055175781|cri_loss: 0.05181884765625|unsuper_loss: 0.0
average reward score: 3.87109375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.73%) |Training time=1.02s (28.92%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.91
epoch: 0|step: 480|ppo_ep: 1|act_loss: -0.037322998046875|cri_loss: 0.026641845703125|unsuper_loss: 0.0
average reward score: 3.548828125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.65%) |Training time=1.01s (28.94%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.91
epoch: 0|step: 481|ppo_ep: 1|act_loss: 0.040283203125|cri_loss: 0.014251708984375|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.53%) |Training time=1.01s (29.05%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.91
epoch: 0|step: 482|ppo_ep: 1|act_loss: 0.007183074951171875|cri_loss: 0.0309600830078125|unsuper_loss: 0.0
average reward score: 4.21875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.55%) |Training time=1.01s (28.93%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.91
epoch: 0|step: 483|ppo_ep: 1|act_loss: 0.005859375|cri_loss: 0.0101470947265625|unsuper_loss: 0.0
average reward score: 4.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.19%) |Training time=1.02s (29.35%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.91
epoch: 0|step: 484|ppo_ep: 1|act_loss: 0.03564453125|cri_loss: 0.01247406005859375|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.33s (61.33%) |Training time=1.28s (33.70%) |Others=0.19 (4.97%)|CurSamplesPerSec=8.42 |AvgSamplesPerSec=8.91
epoch: 0|step: 485|ppo_ep: 1|act_loss: 0.0328369140625|cri_loss: 0.0159759521484375|unsuper_loss: 0.0
average reward score: 4.21875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.94%) |Training time=1.00s (28.60%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.91
epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.0166473388671875|cri_loss: 0.0194091796875|unsuper_loss: 0.0
average reward score: 3.927734375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.49%) |Training time=1.01s (29.08%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.91
epoch: 0|step: 487|ppo_ep: 1|act_loss: 0.004970550537109375|cri_loss: 0.030609130859375|unsuper_loss: 0.0
average reward score: 3.5546875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.24%) |Training time=1.00s (28.41%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.91
epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.09661865234375|cri_loss: 0.01434326171875|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.97%) |Training time=1.04s (29.30%) |Others=0.20 (5.74%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.91
[2023-06-30 06:00:49,443] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=11, lr=[5.672561797315879e-06, 5.672561797315879e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:00:49,476] [INFO] [timer.py:215:stop] epoch=0/micro_step=490/global_step=490, RunningAvgSamplesPerSec=44.985983936183395, CurrSamplesPerSec=50.78186093585948, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:00:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=8, lr=[2.911963903186606e-06, 2.911963903186606e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 489|ppo_ep: 1|act_loss: -0.06689453125|cri_loss: 0.028717041015625|unsuper_loss: 0.0
average reward score: 3.599609375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.86%) |Training time=0.96s (27.52%) |Others=0.20 (5.62%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.91
epoch: 0|step: 490|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: 0.0191802978515625|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.76%) |Training time=1.01s (28.81%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.91
epoch: 0|step: 491|ppo_ep: 1|act_loss: -0.06390380859375|cri_loss: 0.025848388671875|unsuper_loss: 0.0
average reward score: 3.83203125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.78%) |Training time=1.00s (28.81%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.91
epoch: 0|step: 492|ppo_ep: 1|act_loss: 0.029022216796875|cri_loss: 0.007717132568359375|unsuper_loss: 0.0
average reward score: 4.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.19%) |Training time=1.03s (29.41%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.91
epoch: 0|step: 493|ppo_ep: 1|act_loss: 0.060302734375|cri_loss: 0.0242919921875|unsuper_loss: 0.0
average reward score: 3.56640625
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.25%) |Training time=1.02s (28.40%) |Others=0.19 (5.35%)|CurSamplesPerSec=8.87 |AvgSamplesPerSec=8.91
epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.07757568359375|cri_loss: 0.0230255126953125|unsuper_loss: 0.0
average reward score: 3.923828125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.50%) |Training time=1.00s (28.16%) |Others=0.19 (5.33%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.91
epoch: 0|step: 495|ppo_ep: 1|act_loss: 0.009613037109375|cri_loss: 0.01509857177734375|unsuper_loss: 0.0
average reward score: 4.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.60%) |Training time=1.01s (29.00%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.91
epoch: 0|step: 496|ppo_ep: 1|act_loss: -0.0066375732421875|cri_loss: 0.0117950439453125|unsuper_loss: 0.0
average reward score: 4.3203125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.68%) |Training time=1.00s (28.84%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.91
epoch: 0|step: 497|ppo_ep: 1|act_loss: -0.0001475811004638672|cri_loss: 0.006114959716796875|unsuper_loss: 0.0
average reward score: 3.880859375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.76%) |Training time=1.00s (28.82%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.91
epoch: 0|step: 498|ppo_ep: 1|act_loss: 0.07708740234375|cri_loss: 0.091064453125|unsuper_loss: 0.0
average reward score: 3.888671875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.84%) |Training time=1.02s (28.82%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.91
[2023-06-30 06:01:24,544] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=11, lr=[5.49729139837323e-06, 5.49729139837323e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:01:24,577] [INFO] [timer.py:215:stop] epoch=0/micro_step=500/global_step=500, RunningAvgSamplesPerSec=45.02319946529271, CurrSamplesPerSec=46.949155044321216, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:01:24,735] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=8, lr=[2.820995887866378e-06, 2.820995887866378e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 499|ppo_ep: 1|act_loss: -0.0158843994140625|cri_loss: 0.014434814453125|unsuper_loss: 0.0
average reward score: 4.5234375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.63%) |Training time=1.01s (28.97%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.92
epoch: 0|step: 500|ppo_ep: 1|act_loss: 0.053863525390625|cri_loss: 0.03179931640625|unsuper_loss: 0.0
average reward score: 4.37890625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.82%) |Training time=1.00s (28.79%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.92
epoch: 0|step: 501|ppo_ep: 1|act_loss: -0.01910400390625|cri_loss: 0.01018524169921875|unsuper_loss: 0.0
average reward score: 3.875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.99%) |Training time=1.00s (28.59%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.92
epoch: 0|step: 502|ppo_ep: 1|act_loss: -0.060302734375|cri_loss: 0.01105499267578125|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.48%) |Training time=1.02s (28.28%) |Others=0.19 (5.24%)|CurSamplesPerSec=8.85 |AvgSamplesPerSec=8.92
epoch: 0|step: 503|ppo_ep: 1|act_loss: 0.041717529296875|cri_loss: 0.022857666015625|unsuper_loss: 0.0
average reward score: 4.265625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.65%) |Training time=1.00s (28.83%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.92
epoch: 0|step: 504|ppo_ep: 1|act_loss: -0.0159759521484375|cri_loss: 0.00789642333984375|unsuper_loss: 0.0
average reward score: 4.23046875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.71%) |Training time=1.00s (28.87%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.92
epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.049560546875|cri_loss: 0.01284027099609375|unsuper_loss: 0.0
average reward score: 4.0625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.38%) |Training time=0.98s (28.19%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.92
epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.0284576416015625|unsuper_loss: 0.0
average reward score: 3.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.79%) |Training time=1.00s (28.80%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.92
epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.061492919921875|cri_loss: 0.01910400390625|unsuper_loss: 0.0
average reward score: 3.90234375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.61%) |Training time=1.01s (28.98%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.92
epoch: 0|step: 508|ppo_ep: 1|act_loss: -0.0189971923828125|cri_loss: 0.00949859619140625|unsuper_loss: 0.0
average reward score: 3.9296875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.08%) |Training time=1.00s (28.53%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.92
[2023-06-30 06:01:59,488] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=11, lr=[5.321111311187764e-06, 5.321111311187764e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:01:59,521] [INFO] [timer.py:215:stop] epoch=0/micro_step=510/global_step=510, RunningAvgSamplesPerSec=45.07054336632895, CurrSamplesPerSec=47.73453713750817, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:01:59,678] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=8, lr=[2.729593527876723e-06, 2.729593527876723e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 509|ppo_ep: 1|act_loss: 0.0293731689453125|cri_loss: 0.010467529296875|unsuper_loss: 0.0
average reward score: 4.2578125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.87%) |Training time=1.00s (28.72%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.92
epoch: 0|step: 510|ppo_ep: 1|act_loss: -0.077880859375|cri_loss: 0.01386260986328125|unsuper_loss: 0.0
average reward score: 4.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.91%) |Training time=1.05s (29.73%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.92
epoch: 0|step: 511|ppo_ep: 1|act_loss: -0.0237884521484375|cri_loss: 0.00981903076171875|unsuper_loss: 0.0
average reward score: 4.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.42s (65.50%) |Training time=1.09s (29.36%) |Others=0.19 (5.13%)|CurSamplesPerSec=8.66 |AvgSamplesPerSec=8.92
epoch: 0|step: 512|ppo_ep: 1|act_loss: -0.033843994140625|cri_loss: 0.0156707763671875|unsuper_loss: 0.0
average reward score: 4.5625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.28s (66.25%) |Training time=0.98s (28.28%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.92
epoch: 0|step: 513|ppo_ep: 1|act_loss: -0.01421356201171875|cri_loss: 0.019287109375|unsuper_loss: 0.0
average reward score: 4.10546875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.81%) |Training time=0.99s (28.77%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.92
epoch: 0|step: 514|ppo_ep: 1|act_loss: 0.03204345703125|cri_loss: 0.0089111328125|unsuper_loss: 0.0
average reward score: 4.32421875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.28s (66.08%) |Training time=0.98s (28.50%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.92
epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.051300048828125|cri_loss: 0.00896453857421875|unsuper_loss: 0.0
average reward score: 4.03125
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.28s (66.09%) |Training time=0.98s (28.47%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.29 |AvgSamplesPerSec=8.92
epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.035797119140625|cri_loss: 0.04547119140625|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.84%) |Training time=1.00s (28.74%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.92
epoch: 0|step: 517|ppo_ep: 1|act_loss: 0.0352783203125|cri_loss: 0.0112152099609375|unsuper_loss: 0.0
average reward score: 4.32421875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.30s (66.09%) |Training time=0.99s (28.48%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.92
epoch: 0|step: 518|ppo_ep: 1|act_loss: 0.01397705078125|cri_loss: 0.006542205810546875|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.64%) |Training time=1.01s (29.00%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.92
[2023-06-30 06:02:34,457] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=11, lr=[5.144259927853028e-06, 5.144259927853028e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:02:34,490] [INFO] [timer.py:215:stop] epoch=0/micro_step=520/global_step=520, RunningAvgSamplesPerSec=45.1065882541887, CurrSamplesPerSec=46.06012420177572, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:02:34,648] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=8, lr=[2.6378805012127053e-06, 2.6378805012127053e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 519|ppo_ep: 1|act_loss: -0.0026073455810546875|cri_loss: 0.0051727294921875|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.41%) |Training time=1.03s (29.21%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.92
epoch: 0|step: 520|ppo_ep: 1|act_loss: -0.04644775390625|cri_loss: 0.015960693359375|unsuper_loss: 0.0
average reward score: 4.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.72%) |Training time=1.04s (29.02%) |Others=0.19 (5.27%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.92
epoch: 0|step: 521|ppo_ep: 1|act_loss: -0.0239715576171875|cri_loss: 0.01461029052734375|unsuper_loss: 0.0
average reward score: 4.07421875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.28s (66.06%) |Training time=0.99s (28.51%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.92
epoch: 0|step: 522|ppo_ep: 1|act_loss: 0.026275634765625|cri_loss: 0.0210723876953125|unsuper_loss: 0.0
average reward score: 4.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.81%) |Training time=1.00s (28.79%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 523|ppo_ep: 1|act_loss: -0.0411376953125|cri_loss: 0.012786865234375|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.96%) |Training time=0.99s (28.64%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 524|ppo_ep: 1|act_loss: 0.007312774658203125|cri_loss: 0.01068115234375|unsuper_loss: 0.0
average reward score: 3.716796875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.29s (66.01%) |Training time=0.99s (28.58%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.93
epoch: 0|step: 525|ppo_ep: 1|act_loss: 0.0237884521484375|cri_loss: 0.012664794921875|unsuper_loss: 0.0
average reward score: 4.3671875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.82%) |Training time=1.00s (28.79%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.93
epoch: 0|step: 526|ppo_ep: 1|act_loss: -0.0274505615234375|cri_loss: 0.0037593841552734375|unsuper_loss: 0.0
average reward score: 5.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.93%) |Training time=0.99s (28.66%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 527|ppo_ep: 1|act_loss: -0.0175018310546875|cri_loss: 0.00643157958984375|unsuper_loss: 0.0
average reward score: 4.37109375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.12%) |Training time=1.04s (29.47%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 528|ppo_ep: 1|act_loss: -0.02667236328125|cri_loss: 0.00913238525390625|unsuper_loss: 0.0
average reward score: 3.99609375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.12%) |Training time=1.01s (28.51%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.93
[2023-06-30 06:03:09,524] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=11, lr=[4.966976548804123e-06, 4.966976548804123e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:03:09,557] [INFO] [timer.py:215:stop] epoch=0/micro_step=530/global_step=530, RunningAvgSamplesPerSec=45.14461428457451, CurrSamplesPerSec=47.290502206716056, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:03:09,715] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=8, lr=[2.5459809062374304e-06, 2.5459809062374304e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 529|ppo_ep: 1|act_loss: 0.01200103759765625|cri_loss: 0.0075225830078125|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.58%) |Training time=1.01s (28.13%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.92 |AvgSamplesPerSec=8.93
epoch: 0|step: 530|ppo_ep: 1|act_loss: -0.0019292831420898438|cri_loss: 0.00745391845703125|unsuper_loss: 0.0
average reward score: 3.783203125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.67%) |Training time=1.00s (28.93%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.003936767578125|cri_loss: 0.0264434814453125|unsuper_loss: 0.0
average reward score: 4.53125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.84%) |Training time=1.00s (28.76%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 532|ppo_ep: 1|act_loss: -0.0312347412109375|cri_loss: 0.0127105712890625|unsuper_loss: 0.0
average reward score: 4.3203125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.69%) |Training time=1.01s (28.93%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.93
epoch: 0|step: 533|ppo_ep: 1|act_loss: -0.021209716796875|cri_loss: 0.00754547119140625|unsuper_loss: 0.0
average reward score: 4.13671875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.37%) |Training time=1.02s (29.25%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.93
epoch: 0|step: 534|ppo_ep: 1|act_loss: 0.044921875|cri_loss: 0.01214599609375|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.70%) |Training time=1.00s (28.90%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.93
epoch: 0|step: 535|ppo_ep: 1|act_loss: 0.02679443359375|cri_loss: 0.00783538818359375|unsuper_loss: 0.0
average reward score: 4.1328125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.65%) |Training time=1.00s (28.89%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.93
epoch: 0|step: 536|ppo_ep: 1|act_loss: -0.0186920166015625|cri_loss: 0.00928497314453125|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.37%) |Training time=1.02s (29.26%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.93
epoch: 0|step: 537|ppo_ep: 1|act_loss: -0.047943115234375|cri_loss: 0.01186370849609375|unsuper_loss: 0.0
average reward score: 4.2421875
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.00%) |Training time=1.04s (29.11%) |Others=0.21 (5.89%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.93
epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.0007729530334472656|cri_loss: 0.01389312744140625|unsuper_loss: 0.0
average reward score: 3.892578125
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.44s (60.72%) |Training time=1.39s (34.46%) |Others=0.19 (4.82%)|CurSamplesPerSec=7.95 |AvgSamplesPerSec=8.93
[2023-06-30 06:03:45,050] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=11, lr=[4.789501059016457e-06, 4.789501059016457e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:03:45,083] [INFO] [timer.py:215:stop] epoch=0/micro_step=540/global_step=540, RunningAvgSamplesPerSec=45.13469358431597, CurrSamplesPerSec=48.832087830846454, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:03:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=8, lr=[2.454019093762571e-06, 2.454019093762571e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 539|ppo_ep: 1|act_loss: -0.020721435546875|cri_loss: 0.0085296630859375|unsuper_loss: 0.0
average reward score: 4.1875
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.42s (67.29%) |Training time=0.99s (27.41%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.93
epoch: 0|step: 540|ppo_ep: 1|act_loss: -0.032867431640625|cri_loss: 0.010467529296875|unsuper_loss: 0.0
average reward score: 3.49609375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.63%) |Training time=0.86s (24.90%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.93
epoch: 0|step: 541|ppo_ep: 1|act_loss: 0.06732177734375|cri_loss: 0.01107025146484375|unsuper_loss: 0.0
average reward score: 3.296875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.60%) |Training time=0.86s (24.90%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.043853759765625|cri_loss: 0.01200103759765625|unsuper_loss: 0.0
average reward score: 4.16015625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.61%) |Training time=0.86s (24.97%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.93
epoch: 0|step: 543|ppo_ep: 1|act_loss: 0.00897216796875|cri_loss: 0.00519561767578125|unsuper_loss: 0.0
average reward score: 4.33984375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.37%) |Training time=0.87s (25.11%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 544|ppo_ep: 1|act_loss: 0.030975341796875|cri_loss: 0.02166748046875|unsuper_loss: 0.0
average reward score: 3.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.71%) |Training time=0.86s (24.86%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.93
epoch: 0|step: 545|ppo_ep: 1|act_loss: -0.024505615234375|cri_loss: 0.00530242919921875|unsuper_loss: 0.0
average reward score: 3.94140625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.29%) |Training time=0.88s (25.30%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.93
epoch: 0|step: 546|ppo_ep: 1|act_loss: -0.04931640625|cri_loss: 0.00975799560546875|unsuper_loss: 0.0
average reward score: 4.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.27%) |Training time=0.89s (25.35%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 547|ppo_ep: 1|act_loss: -0.0038909912109375|cri_loss: 0.01065826416015625|unsuper_loss: 0.0
average reward score: 3.732421875
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.57s (70.17%) |Training time=0.90s (24.64%) |Others=0.19 (5.18%)|CurSamplesPerSec=8.73 |AvgSamplesPerSec=8.93
epoch: 0|step: 548|ppo_ep: 1|act_loss: -0.02227783203125|cri_loss: 0.00725555419921875|unsuper_loss: 0.0
average reward score: 4.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.40%) |Training time=0.88s (25.19%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.93
[2023-06-30 06:04:19,992] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=11, lr=[4.6120736034135566e-06, 4.6120736034135566e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:04:20,026] [INFO] [timer.py:215:stop] epoch=0/micro_step=550/global_step=550, RunningAvgSamplesPerSec=45.326886226252164, CurrSamplesPerSec=58.399130133234245, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:04:20,185] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=8, lr=[2.3621194987872955e-06, 2.3621194987872955e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 549|ppo_ep: 1|act_loss: 0.0016775131225585938|cri_loss: 0.0086822509765625|unsuper_loss: 0.0
average reward score: 4.0703125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.26%) |Training time=0.88s (25.30%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
epoch: 0|step: 550|ppo_ep: 1|act_loss: -0.008270263671875|cri_loss: 0.0111541748046875|unsuper_loss: 0.0
average reward score: 4.125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.63%) |Training time=0.86s (24.86%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.93
epoch: 0|step: 551|ppo_ep: 1|act_loss: 0.0026454925537109375|cri_loss: 0.01219940185546875|unsuper_loss: 0.0
average reward score: 3.427734375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.32%) |Training time=0.87s (25.14%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.93
epoch: 0|step: 552|ppo_ep: 1|act_loss: -0.0243377685546875|cri_loss: 0.01456451416015625|unsuper_loss: 0.0
average reward score: 3.6875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.32%) |Training time=0.88s (25.27%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.93
epoch: 0|step: 553|ppo_ep: 1|act_loss: -0.09478759765625|cri_loss: 0.0138092041015625|unsuper_loss: 0.0
average reward score: 4.671875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.42%) |Training time=0.87s (25.12%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.94
epoch: 0|step: 554|ppo_ep: 1|act_loss: 0.048004150390625|cri_loss: 0.034088134765625|unsuper_loss: 0.0
average reward score: 4.328125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.45s (70.38%) |Training time=0.84s (24.19%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.94
epoch: 0|step: 555|ppo_ep: 1|act_loss: 0.055023193359375|cri_loss: 0.021087646484375|unsuper_loss: 0.0
average reward score: 4.703125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.38%) |Training time=0.88s (25.22%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.94
epoch: 0|step: 556|ppo_ep: 1|act_loss: 0.037506103515625|cri_loss: 0.01007080078125|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.50s (70.10%) |Training time=0.87s (24.48%) |Others=0.19 (5.42%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.94
epoch: 0|step: 557|ppo_ep: 1|act_loss: 0.0352783203125|cri_loss: 0.03692626953125|unsuper_loss: 0.0
average reward score: 3.744140625
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.34%) |Training time=0.90s (25.26%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 558|ppo_ep: 1|act_loss: 0.003681182861328125|cri_loss: 0.01715087890625|unsuper_loss: 0.0
average reward score: 3.951171875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.58%) |Training time=0.86s (24.96%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.94
[2023-06-30 06:04:54,880] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=11, lr=[4.4349342619231196e-06, 4.4349342619231196e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:04:54,913] [INFO] [timer.py:215:stop] epoch=0/micro_step=560/global_step=560, RunningAvgSamplesPerSec=45.518962093528586, CurrSamplesPerSec=59.703801499959965, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:04:55,071] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=8, lr=[2.270406472123277e-06, 2.270406472123277e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 559|ppo_ep: 1|act_loss: 0.03668212890625|cri_loss: 0.036834716796875|unsuper_loss: 0.0
average reward score: 4.1796875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.56%) |Training time=0.87s (25.02%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.94
epoch: 0|step: 560|ppo_ep: 1|act_loss: -0.01861572265625|cri_loss: 0.0208892822265625|unsuper_loss: 0.0
average reward score: 4.71875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.57%) |Training time=0.87s (25.00%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.94
epoch: 0|step: 561|ppo_ep: 1|act_loss: 0.005977630615234375|cri_loss: 0.0154266357421875|unsuper_loss: 0.0
average reward score: 4.59765625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.50%) |Training time=0.87s (25.06%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.94
epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.00301361083984375|cri_loss: 0.01593017578125|unsuper_loss: 0.0
average reward score: 4.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.32%) |Training time=0.88s (25.25%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.94
epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.03533935546875|cri_loss: 0.02239990234375|unsuper_loss: 0.0
average reward score: 3.578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.46s (70.37%) |Training time=0.85s (24.23%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 564|ppo_ep: 1|act_loss: -0.07403564453125|cri_loss: 0.033294677734375|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.83%) |Training time=0.90s (25.80%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.94
[2023-06-30 06:05:15,967] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 565|ppo_ep: 1|act_loss: -0.048828125|cri_loss: 0.0224609375|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.46s (70.70%) |Training time=0.84s (24.18%) |Others=0.18 (5.11%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 566|ppo_ep: 1|act_loss: -0.006504058837890625|cri_loss: 0.0401611328125|unsuper_loss: 0.0
average reward score: 3.455078125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.87%) |Training time=0.86s (24.67%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.94
epoch: 0|step: 567|ppo_ep: 1|act_loss: 0.03466796875|cri_loss: 0.0244140625|unsuper_loss: 0.0
average reward score: 4.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.08%) |Training time=0.89s (25.51%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
epoch: 0|step: 568|ppo_ep: 1|act_loss: -0.0301361083984375|cri_loss: 0.0648193359375|unsuper_loss: 0.0
average reward score: 3.55078125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.44%) |Training time=0.87s (25.13%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.94
[2023-06-30 06:05:29,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=11, lr=[4.2583227246210355e-06, 4.2583227246210355e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:05:29,707] [INFO] [timer.py:215:stop] epoch=0/micro_step=570/global_step=570, RunningAvgSamplesPerSec=45.70613248299846, CurrSamplesPerSec=59.25960357824215, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:05:29,865] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=9, lr=[2.1881268392529074e-06, 2.1881268392529074e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 569|ppo_ep: 1|act_loss: 0.06268310546875|cri_loss: 0.043792724609375|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.44%) |Training time=0.87s (25.13%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.94
epoch: 0|step: 570|ppo_ep: 1|act_loss: 0.10986328125|cri_loss: 0.039031982421875|unsuper_loss: 0.0
average reward score: 4.3046875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.47%) |Training time=0.87s (25.10%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.94
epoch: 0|step: 571|ppo_ep: 1|act_loss: -0.0264434814453125|cri_loss: 0.0970458984375|unsuper_loss: 0.0
average reward score: 3.994140625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.50%) |Training time=0.86s (24.95%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.94
epoch: 0|step: 572|ppo_ep: 1|act_loss: 0.074462890625|cri_loss: 0.040191650390625|unsuper_loss: 0.0
average reward score: 4.25
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.71%) |Training time=0.87s (24.89%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 573|ppo_ep: 1|act_loss: 0.03759765625|cri_loss: 0.034912109375|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.11%) |Training time=0.89s (25.50%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.94
epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.03778076171875|cri_loss: 0.02825927734375|unsuper_loss: 0.0
average reward score: 4.234375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.02%) |Training time=0.88s (25.05%) |Others=0.21 (5.93%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 575|ppo_ep: 1|act_loss: -0.0308074951171875|cri_loss: 0.0782470703125|unsuper_loss: 0.0
average reward score: 4.140625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.45%) |Training time=0.87s (25.13%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.94
epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.10601806640625|cri_loss: 0.052978515625|unsuper_loss: 0.0
average reward score: 4.17578125
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.33%) |Training time=0.87s (25.14%) |Others=0.19 (5.53%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.94
epoch: 0|step: 577|ppo_ep: 1|act_loss: -0.0025577545166015625|cri_loss: 0.0234375|unsuper_loss: 0.0
average reward score: 4.3203125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.56%) |Training time=0.87s (24.99%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
[2023-06-30 06:06:01,153] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.023773193359375|cri_loss: 0.047332763671875|unsuper_loss: 0.0
average reward score: 3.623046875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.84%) |Training time=0.86s (25.06%) |Others=0.18 (5.11%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.95
[2023-06-30 06:06:04,423] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=11, lr=[4.082477967402902e-06, 4.082477967402902e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:06:04,456] [INFO] [timer.py:215:stop] epoch=0/micro_step=580/global_step=580, RunningAvgSamplesPerSec=45.886804320916056, CurrSamplesPerSec=59.54347807762965, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:06:04,614] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=10, lr=[2.106189034161656e-06, 2.106189034161656e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 579|ppo_ep: 1|act_loss: -0.15625|cri_loss: 0.055908203125|unsuper_loss: 0.0
average reward score: 4.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.51%) |Training time=0.87s (25.05%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
epoch: 0|step: 580|ppo_ep: 1|act_loss: 0.02154541015625|cri_loss: 0.0455322265625|unsuper_loss: 0.0
average reward score: 3.794921875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.75%) |Training time=0.86s (24.81%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.95
epoch: 0|step: 581|ppo_ep: 1|act_loss: 0.01299285888671875|cri_loss: 0.0233154296875|unsuper_loss: 0.0
average reward score: 4.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.54%) |Training time=0.87s (25.05%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.95
epoch: 0|step: 582|ppo_ep: 1|act_loss: 0.11810302734375|cri_loss: 0.030181884765625|unsuper_loss: 0.0
average reward score: 3.87109375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.60%) |Training time=0.87s (24.99%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.95
epoch: 0|step: 583|ppo_ep: 1|act_loss: 0.015380859375|cri_loss: 0.04425048828125|unsuper_loss: 0.0
average reward score: 3.7734375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.13%) |Training time=0.88s (25.21%) |Others=0.20 (5.66%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.95
epoch: 0|step: 584|ppo_ep: 1|act_loss: 0.04705810546875|cri_loss: 0.024810791015625|unsuper_loss: 0.0
average reward score: 3.9375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.42s (60.67%) |Training time=1.38s (34.47%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.95
epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.0139923095703125|cri_loss: 0.0290069580078125|unsuper_loss: 0.0
average reward score: 3.796875
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.41s (60.40%) |Training time=1.38s (34.61%) |Others=0.20 (4.99%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.94
epoch: 0|step: 586|ppo_ep: 1|act_loss: -0.1075439453125|cri_loss: 0.05438232421875|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.42s (60.61%) |Training time=1.38s (34.53%) |Others=0.19 (4.86%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.94
epoch: 0|step: 587|ppo_ep: 1|act_loss: -0.01629638671875|cri_loss: 0.01226043701171875|unsuper_loss: 0.0
average reward score: 3.552734375
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.41s (60.60%) |Training time=1.37s (34.54%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.04 |AvgSamplesPerSec=8.94
epoch: 0|step: 588|ppo_ep: 1|act_loss: 0.038818359375|cri_loss: 0.0293426513671875|unsuper_loss: 0.0
average reward score: 3.55859375
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.41s (60.25%) |Training time=1.40s (34.93%) |Others=0.19 (4.82%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.94
[2023-06-30 06:06:42,278] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=11, lr=[3.907637928621924e-06, 3.907637928621924e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:06:42,311] [INFO] [timer.py:215:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.721959649104875, CurrSamplesPerSec=30.82236811284549, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:06:42,472] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=10, lr=[2.0156571533902627e-06, 2.0156571533902627e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 589|ppo_ep: 1|act_loss: 0.03289794921875|cri_loss: 0.0272369384765625|unsuper_loss: 0.0
average reward score: 4.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.43s (60.89%) |Training time=1.37s (34.28%) |Others=0.19 (4.82%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.94
epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.04559326171875|cri_loss: 0.0307159423828125|unsuper_loss: 0.0
average reward score: 3.99609375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.42s (60.62%) |Training time=1.38s (34.53%) |Others=0.19 (4.85%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 591|ppo_ep: 1|act_loss: 0.00658416748046875|cri_loss: 0.008087158203125|unsuper_loss: 0.0
average reward score: 3.68359375
-------------------------------------------------------------------------------------
|E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.43s (59.59%) |Training time=1.46s (35.75%) |Others=0.19 (4.67%)|CurSamplesPerSec=7.85 |AvgSamplesPerSec=8.93
epoch: 0|step: 592|ppo_ep: 1|act_loss: 0.013427734375|cri_loss: 0.0188751220703125|unsuper_loss: 0.0
average reward score: 3.369140625
-------------------------------------------------------------------------------------
|E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.41s (63.83%) |Training time=1.17s (31.02%) |Others=0.19 (5.15%)|CurSamplesPerSec=8.49 |AvgSamplesPerSec=8.93
epoch: 0|step: 593|ppo_ep: 1|act_loss: -0.00677490234375|cri_loss: 0.02044677734375|unsuper_loss: 0.0
average reward score: 4.0859375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.40s (60.14%) |Training time=1.40s (35.02%) |Others=0.19 (4.84%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.93
epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.0111846923828125|cri_loss: 0.01471710205078125|unsuper_loss: 0.0
average reward score: 4.19921875
-------------------------------------------------------------------------------------
|E2E latency=4.07s |Gather latency=0.00s (0.00%) |Generate time=2.46s (60.43%) |Training time=1.41s (34.72%) |Others=0.20 (4.85%)|CurSamplesPerSec=7.87 |AvgSamplesPerSec=8.93
epoch: 0|step: 595|ppo_ep: 1|act_loss: 0.0144195556640625|cri_loss: 0.0113067626953125|unsuper_loss: 0.0
average reward score: 3.8671875
-------------------------------------------------------------------------------------
|E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.43s (60.62%) |Training time=1.38s (34.54%) |Others=0.19 (4.84%)|CurSamplesPerSec=7.99 |AvgSamplesPerSec=8.93
epoch: 0|step: 596|ppo_ep: 1|act_loss: 0.0140838623046875|cri_loss: 0.00978851318359375|unsuper_loss: 0.0
average reward score: 3.8046875
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.44s (60.69%) |Training time=1.38s (34.21%) |Others=0.21 (5.10%)|CurSamplesPerSec=7.96 |AvgSamplesPerSec=8.93
epoch: 0|step: 597|ppo_ep: 1|act_loss: 0.0025177001953125|cri_loss: 0.0083770751953125|unsuper_loss: 0.0
average reward score: 3.68359375
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.43s (60.69%) |Training time=1.38s (34.46%) |Others=0.19 (4.85%)|CurSamplesPerSec=8.01 |AvgSamplesPerSec=8.92
epoch: 0|step: 598|ppo_ep: 1|act_loss: -0.038116455078125|cri_loss: 0.01666259765625|unsuper_loss: 0.0
average reward score: 3.59765625
-------------------------------------------------------------------------------------
|E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.42s (60.68%) |Training time=1.37s (34.47%) |Others=0.19 (4.85%)|CurSamplesPerSec=8.03 |AvgSamplesPerSec=8.92
[2023-06-30 06:07:21,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=11, lr=[3.734039187130717e-06, 3.734039187130717e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:07:22,029] [INFO] [timer.py:215:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.381864437872785, CurrSamplesPerSec=40.76074899979288, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:07:22,188] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=10, lr=[1.9257806446705116e-06, 1.9257806446705116e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 599|ppo_ep: 1|act_loss: -0.080078125|cri_loss: 0.01282501220703125|unsuper_loss: 0.0
average reward score: 3.578125
-------------------------------------------------------------------------------------
|E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.50s (65.66%) |Training time=1.12s (29.34%) |Others=0.19 (5.00%)|CurSamplesPerSec=8.40 |AvgSamplesPerSec=8.92
epoch: 0|step: 600|ppo_ep: 1|act_loss: 0.045379638671875|cri_loss: 0.0293731689453125|unsuper_loss: 0.0
average reward score: 3.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.08%) |Training time=0.89s (25.51%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.92
epoch: 0|step: 601|ppo_ep: 1|act_loss: 0.0272369384765625|cri_loss: 0.00710296630859375|unsuper_loss: 0.0
average reward score: 3.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.85%) |Training time=0.90s (25.77%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.92
epoch: 0|step: 602|ppo_ep: 1|act_loss: 0.0302581787109375|cri_loss: 0.0285186767578125|unsuper_loss: 0.0
average reward score: 3.603515625
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.25%) |Training time=0.93s (26.35%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.92
epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.0022869110107421875|cri_loss: 0.0142059326171875|unsuper_loss: 0.0
average reward score: 3.794921875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.86%) |Training time=0.91s (25.76%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.92
epoch: 0|step: 604|ppo_ep: 1|act_loss: 0.0189361572265625|cri_loss: 0.012969970703125|unsuper_loss: 0.0
average reward score: 4.38671875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.50s (70.54%) |Training time=0.85s (24.04%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.92
epoch: 0|step: 605|ppo_ep: 1|act_loss: -0.048431396484375|cri_loss: 0.0111846923828125|unsuper_loss: 0.0
average reward score: 4.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.31%) |Training time=0.89s (25.24%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.92
epoch: 0|step: 606|ppo_ep: 1|act_loss: -0.0266876220703125|cri_loss: 0.006565093994140625|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.06%) |Training time=0.89s (25.52%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.92
epoch: 0|step: 607|ppo_ep: 1|act_loss: -0.0238494873046875|cri_loss: 0.01386260986328125|unsuper_loss: 0.0
average reward score: 4.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.45%) |Training time=0.92s (26.19%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.92
epoch: 0|step: 608|ppo_ep: 1|act_loss: -0.0024814605712890625|cri_loss: 0.00897979736328125|unsuper_loss: 0.0
average reward score: 4.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.13%) |Training time=0.90s (25.52%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.92
[2023-06-30 06:07:57,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=11, lr=[3.5619166421626894e-06, 3.5619166421626894e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:07:57,202] [INFO] [timer.py:215:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.527624725544385, CurrSamplesPerSec=57.836389452070435, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:07:57,361] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=10, lr=[1.8366811213437092e-06, 1.8366811213437092e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 609|ppo_ep: 1|act_loss: -0.00928497314453125|cri_loss: 0.00783538818359375|unsuper_loss: 0.0
average reward score: 3.93359375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.19%) |Training time=0.88s (25.40%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.92
epoch: 0|step: 610|ppo_ep: 1|act_loss: -0.0615234375|cri_loss: 0.01010894775390625|unsuper_loss: 0.0
average reward score: 4.6171875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.91%) |Training time=0.90s (25.71%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.92
epoch: 0|step: 611|ppo_ep: 1|act_loss: -0.0330810546875|cri_loss: 0.01334381103515625|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.85%) |Training time=0.90s (25.76%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.92
epoch: 0|step: 612|ppo_ep: 1|act_loss: -0.00586700439453125|cri_loss: 0.01206207275390625|unsuper_loss: 0.0
average reward score: 4.76953125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.95%) |Training time=0.90s (25.65%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.047607421875|cri_loss: 0.0160675048828125|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.72%) |Training time=0.89s (24.92%) |Others=0.19 (5.35%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.93
epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.09625244140625|cri_loss: 0.028961181640625|unsuper_loss: 0.0
average reward score: 4.7109375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.03%) |Training time=0.90s (25.61%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.93
epoch: 0|step: 615|ppo_ep: 1|act_loss: -0.040802001953125|cri_loss: 0.01242828369140625|unsuper_loss: 0.0
average reward score: 4.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.74%) |Training time=0.91s (25.87%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 616|ppo_ep: 1|act_loss: -0.0235595703125|cri_loss: 0.0068817138671875|unsuper_loss: 0.0
average reward score: 4.6875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.50%) |Training time=0.92s (26.13%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.93
epoch: 0|step: 617|ppo_ep: 1|act_loss: -0.04254150390625|cri_loss: 0.0081787109375|unsuper_loss: 0.0
average reward score: 4.671875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.15%) |Training time=0.95s (26.53%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.93
epoch: 0|step: 618|ppo_ep: 1|act_loss: 0.0008211135864257812|cri_loss: 0.009185791015625|unsuper_loss: 0.0
average reward score: 5.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.03%) |Training time=0.89s (25.55%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
[2023-06-30 06:08:32,360] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=11, lr=[3.3915031954861193e-06, 3.3915031954861193e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:08:32,393] [INFO] [timer.py:215:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.663119917819884, CurrSamplesPerSec=58.48575371511962, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:08:32,552] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=10, lr=[1.7484791453998007e-06, 1.7484791453998007e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 619|ppo_ep: 1|act_loss: 0.04644775390625|cri_loss: 0.0080413818359375|unsuper_loss: 0.0
average reward score: 5.0
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.38%) |Training time=0.88s (25.19%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.93
epoch: 0|step: 620|ppo_ep: 1|act_loss: -0.006732940673828125|cri_loss: 0.0074005126953125|unsuper_loss: 0.0
average reward score: 4.625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.15%) |Training time=0.89s (25.47%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 621|ppo_ep: 1|act_loss: 0.04315185546875|cri_loss: 0.00870513916015625|unsuper_loss: 0.0
average reward score: 4.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.41s (67.74%) |Training time=0.96s (26.93%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.93
epoch: 0|step: 622|ppo_ep: 1|act_loss: -0.0020084381103515625|cri_loss: 0.01219940185546875|unsuper_loss: 0.0
average reward score: 4.625
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.83%) |Training time=0.97s (26.79%) |Others=0.20 (5.38%)|CurSamplesPerSec=8.82 |AvgSamplesPerSec=8.93
epoch: 0|step: 623|ppo_ep: 1|act_loss: -0.0072479248046875|cri_loss: 0.00778961181640625|unsuper_loss: 0.0
average reward score: 5.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.46%) |Training time=0.92s (26.10%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 624|ppo_ep: 1|act_loss: 0.0182342529296875|cri_loss: 0.00749969482421875|unsuper_loss: 0.0
average reward score: 4.58203125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.21%) |Training time=0.88s (25.36%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.93
epoch: 0|step: 625|ppo_ep: 1|act_loss: -0.022674560546875|cri_loss: 0.00695037841796875|unsuper_loss: 0.0
average reward score: 4.609375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.15%) |Training time=0.93s (26.46%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.93
epoch: 0|step: 626|ppo_ep: 1|act_loss: -0.01273345947265625|cri_loss: 0.007843017578125|unsuper_loss: 0.0
average reward score: 4.37109375
-------------------------------------------------------------------------------------
|E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.48s (66.46%) |Training time=1.06s (28.46%) |Others=0.19 (5.08%)|CurSamplesPerSec=8.58 |AvgSamplesPerSec=8.93
epoch: 0|step: 627|ppo_ep: 1|act_loss: 0.09210205078125|cri_loss: 0.0162506103515625|unsuper_loss: 0.0
average reward score: 4.3125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.87%) |Training time=0.90s (25.68%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.93
epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.0258941650390625|cri_loss: 0.01226806640625|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.26%) |Training time=0.88s (25.30%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.93
[2023-06-30 06:09:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=11, lr=[3.223029436261057e-06, 3.223029436261057e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:09:07,862] [INFO] [timer.py:215:stop] epoch=0/micro_step=630/global_step=630, RunningAvgSamplesPerSec=45.76655281749269, CurrSamplesPerSec=54.871997105496085, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:09:08,020] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=10, lr=[1.6612940643430136e-06, 1.6612940643430136e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 629|ppo_ep: 1|act_loss: -0.0197601318359375|cri_loss: 0.00620269775390625|unsuper_loss: 0.0
average reward score: 4.31640625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.61%) |Training time=0.92s (26.03%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.93
epoch: 0|step: 630|ppo_ep: 1|act_loss: 0.0067596435546875|cri_loss: 0.009307861328125|unsuper_loss: 0.0
average reward score: 4.6328125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.05%) |Training time=0.90s (25.55%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 631|ppo_ep: 1|act_loss: -0.01136016845703125|cri_loss: 0.005397796630859375|unsuper_loss: 0.0
average reward score: 4.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.42%) |Training time=0.88s (25.17%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 632|ppo_ep: 1|act_loss: -0.02313232421875|cri_loss: 0.00811004638671875|unsuper_loss: 0.0
average reward score: 4.45703125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.19%) |Training time=0.94s (26.47%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.93
epoch: 0|step: 633|ppo_ep: 1|act_loss: -0.0018215179443359375|cri_loss: 0.0057220458984375|unsuper_loss: 0.0
average reward score: 4.71484375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.47%) |Training time=0.87s (25.09%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.93
epoch: 0|step: 634|ppo_ep: 1|act_loss: -0.011138916015625|cri_loss: 0.00698089599609375|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.41%) |Training time=0.93s (26.24%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.93
epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.00948333740234375|cri_loss: 0.004596710205078125|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.55%) |Training time=0.88s (24.88%) |Others=0.20 (5.57%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.93
epoch: 0|step: 636|ppo_ep: 1|act_loss: -0.020050048828125|cri_loss: 0.006565093994140625|unsuper_loss: 0.0
average reward score: 4.140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.41s (69.15%) |Training time=0.89s (25.43%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.93
epoch: 0|step: 637|ppo_ep: 1|act_loss: -0.041534423828125|cri_loss: 0.006378173828125|unsuper_loss: 0.0
average reward score: 4.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.44%) |Training time=0.87s (25.15%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.93
epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.01465606689453125|cri_loss: 0.01256561279296875|unsuper_loss: 0.0
average reward score: 4.296875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.25%) |Training time=0.88s (25.33%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.93
[2023-06-30 06:09:42,909] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=11, lr=[3.056723329025442e-06, 3.056723329025442e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:09:42,943] [INFO] [timer.py:215:stop] epoch=0/micro_step=640/global_step=640, RunningAvgSamplesPerSec=45.907660627664704, CurrSamplesPerSec=58.518979968398696, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:09:43,101] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=10, lr=[1.5752438497008405e-06, 1.5752438497008405e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 639|ppo_ep: 1|act_loss: -0.00928497314453125|cri_loss: 0.011810302734375|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.36%) |Training time=0.88s (25.22%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.93
epoch: 0|step: 640|ppo_ep: 1|act_loss: 0.03204345703125|cri_loss: 0.01212310791015625|unsuper_loss: 0.0
average reward score: 3.552734375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.79%) |Training time=0.90s (25.69%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 641|ppo_ep: 1|act_loss: 0.0271148681640625|cri_loss: 0.0221710205078125|unsuper_loss: 0.0
average reward score: 4.359375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.14%) |Training time=0.89s (25.45%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 642|ppo_ep: 1|act_loss: -0.055572509765625|cri_loss: 0.006927490234375|unsuper_loss: 0.0
average reward score: 4.64453125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.84%) |Training time=0.91s (25.80%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.93
epoch: 0|step: 643|ppo_ep: 1|act_loss: -0.024658203125|cri_loss: 0.008544921875|unsuper_loss: 0.0
average reward score: 4.4453125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.67%) |Training time=0.91s (25.93%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.93
epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.0306549072265625|cri_loss: 0.005748748779296875|unsuper_loss: 0.0
average reward score: 4.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.16%) |Training time=0.89s (25.45%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 645|ppo_ep: 1|act_loss: 0.00838470458984375|cri_loss: 0.00821685791015625|unsuper_loss: 0.0
average reward score: 4.13671875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.54%) |Training time=0.87s (25.04%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.93
epoch: 0|step: 646|ppo_ep: 1|act_loss: 0.01558685302734375|cri_loss: 0.007091522216796875|unsuper_loss: 0.0
average reward score: 4.37890625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.45%) |Training time=0.88s (25.14%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.93
epoch: 0|step: 647|ppo_ep: 1|act_loss: 0.041229248046875|cri_loss: 0.0172119140625|unsuper_loss: 0.0
average reward score: 4.77734375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.30%) |Training time=0.89s (25.28%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 648|ppo_ep: 1|act_loss: 0.00714111328125|cri_loss: 0.01448822021484375|unsuper_loss: 0.0
average reward score: 4.625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.97%) |Training time=0.85s (24.60%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.93
[2023-06-30 06:10:17,947] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=11, lr=[2.8928099052326388e-06, 2.8928099052326388e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:10:17,981] [INFO] [timer.py:215:stop] epoch=0/micro_step=650/global_step=650, RunningAvgSamplesPerSec=46.04873424124126, CurrSamplesPerSec=57.05609378068698, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:10:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=10, lr=[1.490444937394879e-06, 1.490444937394879e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 649|ppo_ep: 1|act_loss: 0.04498291015625|cri_loss: 0.0092315673828125|unsuper_loss: 0.0
average reward score: 3.943359375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.22%) |Training time=0.89s (25.39%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 650|ppo_ep: 1|act_loss: 0.0460205078125|cri_loss: 0.01157379150390625|unsuper_loss: 0.0
average reward score: 4.7734375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.50%) |Training time=0.87s (25.08%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.93
epoch: 0|step: 651|ppo_ep: 1|act_loss: 0.0372314453125|cri_loss: 0.009918212890625|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.93%) |Training time=0.91s (25.70%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 652|ppo_ep: 1|act_loss: -0.00971221923828125|cri_loss: 0.01702880859375|unsuper_loss: 0.0
average reward score: 4.0703125
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.04%) |Training time=0.95s (26.63%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.93
epoch: 0|step: 653|ppo_ep: 1|act_loss: -0.080810546875|cri_loss: 0.023406982421875|unsuper_loss: 0.0
average reward score: 4.4765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.89%) |Training time=0.90s (25.69%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.94
epoch: 0|step: 654|ppo_ep: 1|act_loss: -0.07208251953125|cri_loss: 0.017364501953125|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.71%) |Training time=0.86s (24.85%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.94
epoch: 0|step: 655|ppo_ep: 1|act_loss: -0.082763671875|cri_loss: 0.0229949951171875|unsuper_loss: 0.0
average reward score: 4.43359375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.93%) |Training time=0.86s (24.64%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.94
epoch: 0|step: 656|ppo_ep: 1|act_loss: -0.0237884521484375|cri_loss: 0.01465606689453125|unsuper_loss: 0.0
average reward score: 4.453125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.15%) |Training time=0.89s (25.47%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 657|ppo_ep: 1|act_loss: -0.0164031982421875|cri_loss: 0.01265716552734375|unsuper_loss: 0.0
average reward score: 4.09375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.53%) |Training time=0.87s (25.02%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 658|ppo_ep: 1|act_loss: -0.022186279296875|cri_loss: 0.03558349609375|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.68%) |Training time=0.87s (24.91%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
[2023-06-30 06:10:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=11, lr=[2.7315109587577825e-06, 2.7315109587577825e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:10:52,978] [INFO] [timer.py:215:stop] epoch=0/micro_step=660/global_step=660, RunningAvgSamplesPerSec=46.19067265840347, CurrSamplesPerSec=59.59879059742338, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:10:53,136] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=10, lr=[1.407012070189524e-06, 1.407012070189524e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 659|ppo_ep: 1|act_loss: 0.0013494491577148438|cri_loss: 0.0288848876953125|unsuper_loss: 0.0
average reward score: 4.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.62%) |Training time=0.87s (24.94%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.94
epoch: 0|step: 660|ppo_ep: 1|act_loss: 0.00800323486328125|cri_loss: 0.0149078369140625|unsuper_loss: 0.0
average reward score: 4.5546875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.80%) |Training time=0.86s (24.78%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 661|ppo_ep: 1|act_loss: -0.0516357421875|cri_loss: 0.023895263671875|unsuper_loss: 0.0
average reward score: 4.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.27%) |Training time=0.93s (26.12%) |Others=0.20 (5.61%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.94
epoch: 0|step: 662|ppo_ep: 1|act_loss: 0.027679443359375|cri_loss: 0.0226287841796875|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.46s (68.08%) |Training time=0.96s (26.67%) |Others=0.19 (5.25%)|CurSamplesPerSec=8.85 |AvgSamplesPerSec=8.94
epoch: 0|step: 663|ppo_ep: 1|act_loss: 0.033416748046875|cri_loss: 0.08404541015625|unsuper_loss: 0.0
average reward score: 3.748046875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.60%) |Training time=0.87s (24.98%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 664|ppo_ep: 1|act_loss: -0.058258056640625|cri_loss: 0.01971435546875|unsuper_loss: 0.0
average reward score: 4.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.77%) |Training time=0.86s (24.78%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.94
epoch: 0|step: 665|ppo_ep: 1|act_loss: 0.0333251953125|cri_loss: 0.01474761962890625|unsuper_loss: 0.0
average reward score: 4.609375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.72%) |Training time=0.87s (24.85%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 666|ppo_ep: 1|act_loss: 0.07183837890625|cri_loss: 0.0218048095703125|unsuper_loss: 0.0
average reward score: 4.21484375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.56%) |Training time=0.87s (24.97%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.94
epoch: 0|step: 667|ppo_ep: 1|act_loss: 0.043792724609375|cri_loss: 0.03839111328125|unsuper_loss: 0.0
average reward score: 4.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.79%) |Training time=0.86s (24.76%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.94
epoch: 0|step: 668|ppo_ep: 1|act_loss: -0.05487060546875|cri_loss: 0.03997802734375|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.64%) |Training time=0.87s (24.94%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
[2023-06-30 06:11:27,992] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=11, lr=[2.573044745784934e-06, 2.573044745784934e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:11:28,025] [INFO] [timer.py:215:stop] epoch=0/micro_step=670/global_step=670, RunningAvgSamplesPerSec=46.33191280438729, CurrSamplesPerSec=59.294762958303, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:11:28,183] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=10, lr=[1.3250581424317012e-06, 1.3250581424317012e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 669|ppo_ep: 1|act_loss: 0.0184326171875|cri_loss: 0.048431396484375|unsuper_loss: 0.0
average reward score: 3.88671875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.60%) |Training time=0.87s (24.96%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.94
epoch: 0|step: 670|ppo_ep: 1|act_loss: -0.060211181640625|cri_loss: 0.029541015625|unsuper_loss: 0.0
average reward score: 4.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.79%) |Training time=0.88s (24.97%) |Others=0.22 (6.25%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.94
epoch: 0|step: 671|ppo_ep: 1|act_loss: -0.097900390625|cri_loss: 0.054595947265625|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.50%) |Training time=1.03s (29.11%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 672|ppo_ep: 1|act_loss: -0.03375244140625|cri_loss: 0.031280517578125|unsuper_loss: 0.0
average reward score: 4.04296875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.27%) |Training time=0.99s (28.34%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 673|ppo_ep: 1|act_loss: -0.0382080078125|cri_loss: 0.05267333984375|unsuper_loss: 0.0
average reward score: 4.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.77%) |Training time=1.02s (28.88%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.94
epoch: 0|step: 674|ppo_ep: 1|act_loss: 0.0291900634765625|cri_loss: 0.040618896484375|unsuper_loss: 0.0
average reward score: 3.96484375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.39%) |Training time=0.99s (28.24%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 675|ppo_ep: 1|act_loss: -0.032867431640625|cri_loss: 0.046478271484375|unsuper_loss: 0.0
average reward score: 4.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.11%) |Training time=1.00s (28.53%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 676|ppo_ep: 1|act_loss: 0.045196533203125|cri_loss: 0.09332275390625|unsuper_loss: 0.0
average reward score: 3.935546875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.95%) |Training time=1.01s (28.69%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 677|ppo_ep: 1|act_loss: -0.02142333984375|cri_loss: 0.0616455078125|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.96%) |Training time=1.00s (28.67%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.038604736328125|cri_loss: 0.057464599609375|unsuper_loss: 0.0
average reward score: 4.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.02%) |Training time=1.01s (28.63%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
[2023-06-30 06:12:03,157] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, but hysteresis is 2. Reducing hysteresis to 1
[2023-06-30 06:12:03,158] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=12, lr=[2.4330244458931195e-06, 2.4330244458931195e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:12:03,159] [INFO] [timer.py:215:stop] epoch=0/micro_step=680/global_step=680, RunningAvgSamplesPerSec=46.35903295327494, CurrSamplesPerSec=48.040936066031456, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:12:03,317] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=10, lr=[1.24469404729171e-06, 1.24469404729171e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 679|ppo_ep: 1|act_loss: -0.03143310546875|cri_loss: 0.09210205078125|unsuper_loss: 0.0
average reward score: 3.8515625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.10%) |Training time=1.00s (28.46%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 680|ppo_ep: 1|act_loss: 0.05706787109375|cri_loss: 0.06646728515625|unsuper_loss: 0.0
average reward score: 4.20703125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.22%) |Training time=1.00s (28.43%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.94
[2023-06-30 06:12:10,185] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 681|ppo_ep: 1|act_loss: -0.00507354736328125|cri_loss: 0.050445556640625|unsuper_loss: 0.0
average reward score: 4.0390625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.21%) |Training time=1.00s (28.42%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
[2023-06-30 06:12:13,877] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 682|ppo_ep: 1|act_loss: -0.048614501953125|cri_loss: 0.046722412109375|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.43%) |Training time=1.01s (28.57%) |Others=0.18 (5.01%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.94
epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.03729248046875|cri_loss: 0.10260009765625|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.06%) |Training time=1.00s (28.59%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.94
[2023-06-30 06:12:20,705] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
[2023-06-30 06:12:20,853] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 684|ppo_ep: 1|act_loss: 0.008819580078125|cri_loss: 0.078369140625|unsuper_loss: 0.0
average reward score: 4.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.34s (67.30%) |Training time=0.96s (27.65%) |Others=0.18 (5.05%)|CurSamplesPerSec=9.22 |AvgSamplesPerSec=8.94
epoch: 0|step: 685|ppo_ep: 1|act_loss: 0.034576416015625|cri_loss: 0.05255126953125|unsuper_loss: 0.0
average reward score: 4.015625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.01%) |Training time=1.01s (28.63%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 686|ppo_ep: 1|act_loss: -0.055633544921875|cri_loss: 0.1317138671875|unsuper_loss: 0.0
average reward score: 4.20703125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.75%) |Training time=1.02s (28.88%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.94
epoch: 0|step: 687|ppo_ep: 1|act_loss: 0.029998779296875|cri_loss: 0.06719970703125|unsuper_loss: 0.0
average reward score: 4.17578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.92%) |Training time=1.01s (28.73%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 688|ppo_ep: 1|act_loss: 0.0631103515625|cri_loss: 0.0860595703125|unsuper_loss: 0.0
average reward score: 4.10546875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.32%) |Training time=1.00s (28.25%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.94
[2023-06-30 06:12:38,228] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=14, lr=[2.3107581980333665e-06, 2.3107581980333665e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:12:38,261] [INFO] [timer.py:215:stop] epoch=0/micro_step=690/global_step=690, RunningAvgSamplesPerSec=46.378214896241275, CurrSamplesPerSec=47.45080298638957, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:12:38,419] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=12, lr=[1.1816206013040314e-06, 1.1816206013040314e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 689|ppo_ep: 1|act_loss: 0.0015401840209960938|cri_loss: 0.045318603515625|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.04%) |Training time=1.00s (28.59%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.95
epoch: 0|step: 690|ppo_ep: 1|act_loss: -0.00116729736328125|cri_loss: 0.10284423828125|unsuper_loss: 0.0
average reward score: 4.16796875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.05%) |Training time=1.00s (28.57%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.95
epoch: 0|step: 691|ppo_ep: 1|act_loss: 0.07623291015625|cri_loss: 0.0552978515625|unsuper_loss: 0.0
average reward score: 4.1328125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.80%) |Training time=0.99s (27.87%) |Others=0.19 (5.33%)|CurSamplesPerSec=9.03 |AvgSamplesPerSec=8.95
epoch: 0|step: 692|ppo_ep: 1|act_loss: 0.0703125|cri_loss: 0.04266357421875|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.93%) |Training time=1.01s (28.69%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.95
epoch: 0|step: 693|ppo_ep: 1|act_loss: 0.1431884765625|cri_loss: 0.0654296875|unsuper_loss: 0.0
average reward score: 4.35546875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.22%) |Training time=1.00s (28.44%) |Others=0.19 (5.33%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.95
epoch: 0|step: 694|ppo_ep: 1|act_loss: 0.0218963623046875|cri_loss: 0.074951171875|unsuper_loss: 0.0
average reward score: 4.40625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.10%) |Training time=1.00s (28.52%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.95
epoch: 0|step: 695|ppo_ep: 1|act_loss: -0.01654052734375|cri_loss: 0.05499267578125|unsuper_loss: 0.0
average reward score: 3.958984375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.22%) |Training time=0.99s (28.38%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.95
epoch: 0|step: 696|ppo_ep: 1|act_loss: 0.07513427734375|cri_loss: 0.063720703125|unsuper_loss: 0.0
average reward score: 3.900390625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.31s (66.18%) |Training time=0.99s (28.41%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 697|ppo_ep: 1|act_loss: -0.02911376953125|cri_loss: 0.051361083984375|unsuper_loss: 0.0
average reward score: 4.2734375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.33s (66.13%) |Training time=1.00s (28.47%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.95
epoch: 0|step: 698|ppo_ep: 1|act_loss: 0.06756591796875|cri_loss: 0.055633544921875|unsuper_loss: 0.0
average reward score: 4.2734375
-------------------------------------------------------------------------------------
|E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.31s (61.40%) |Training time=1.26s (33.43%) |Others=0.19 (5.17%)|CurSamplesPerSec=8.51 |AvgSamplesPerSec=8.95
[2023-06-30 06:13:14,049] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=14, lr=[2.1609995975129414e-06, 2.1609995975129414e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:13:14,082] [INFO] [timer.py:215:stop] epoch=0/micro_step=700/global_step=700, RunningAvgSamplesPerSec=46.33220459705616, CurrSamplesPerSec=29.15569102785869, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:13:14,243] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=12, lr=[1.1043906949328387e-06, 1.1043906949328387e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.09307861328125|cri_loss: 0.0963134765625|unsuper_loss: 0.0
average reward score: 4.93359375
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.35s (59.18%) |Training time=1.43s (35.93%) |Others=0.19 (4.90%)|CurSamplesPerSec=8.05 |AvgSamplesPerSec=8.94
epoch: 0|step: 700|ppo_ep: 1|act_loss: 0.1502685546875|cri_loss: 0.07568359375|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.53%) |Training time=1.46s (36.57%) |Others=0.20 (4.90%)|CurSamplesPerSec=8.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 701|ppo_ep: 1|act_loss: 0.0005950927734375|cri_loss: 0.041748046875|unsuper_loss: 0.0
average reward score: 4.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.86%) |Training time=1.43s (36.26%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 702|ppo_ep: 1|act_loss: -0.058074951171875|cri_loss: 0.0946044921875|unsuper_loss: 0.0
average reward score: 4.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.30s (58.63%) |Training time=1.43s (36.46%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 703|ppo_ep: 1|act_loss: 0.0499267578125|cri_loss: 0.05145263671875|unsuper_loss: 0.0
average reward score: 4.03125
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.66%) |Training time=1.43s (36.43%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 704|ppo_ep: 1|act_loss: 0.139892578125|cri_loss: 0.043731689453125|unsuper_loss: 0.0
average reward score: 3.97265625
-------------------------------------------------------------------------------------
|E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.60%) |Training time=1.43s (36.39%) |Others=0.20 (5.01%)|CurSamplesPerSec=8.13 |AvgSamplesPerSec=8.94
epoch: 0|step: 705|ppo_ep: 1|act_loss: -0.027587890625|cri_loss: 0.05206298828125|unsuper_loss: 0.0
average reward score: 4.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.30s (58.41%) |Training time=1.44s (36.62%) |Others=0.20 (4.96%)|CurSamplesPerSec=8.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 706|ppo_ep: 1|act_loss: -0.043609619140625|cri_loss: 0.0865478515625|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.35s (59.40%) |Training time=1.42s (35.74%) |Others=0.19 (4.87%)|CurSamplesPerSec=8.07 |AvgSamplesPerSec=8.94
epoch: 0|step: 707|ppo_ep: 1|act_loss: 0.0004391670227050781|cri_loss: 0.036956787109375|unsuper_loss: 0.0
average reward score: 4.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.35s (59.69%) |Training time=1.40s (35.43%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 708|ppo_ep: 1|act_loss: 0.07916259765625|cri_loss: 0.04534912109375|unsuper_loss: 0.0
average reward score: 3.884765625
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.32s (58.53%) |Training time=1.45s (36.51%) |Others=0.20 (4.96%)|CurSamplesPerSec=8.08 |AvgSamplesPerSec=8.93
[2023-06-30 06:13:53,536] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=14, lr=[2.014845698541378e-06, 2.014845698541378e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:13:53,569] [INFO] [timer.py:215:stop] epoch=0/micro_step=710/global_step=710, RunningAvgSamplesPerSec=45.94521600600531, CurrSamplesPerSec=29.09007933910521, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:13:53,729] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=12, lr=[1.0290492098317331e-06, 1.0290492098317331e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 709|ppo_ep: 1|act_loss: 0.040435791015625|cri_loss: 0.06134033203125|unsuper_loss: 0.0
average reward score: 4.203125
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.33s (58.98%) |Training time=1.43s (36.14%) |Others=0.19 (4.88%)|CurSamplesPerSec=8.09 |AvgSamplesPerSec=8.93
epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.0853271484375|cri_loss: 0.0560302734375|unsuper_loss: 0.0
average reward score: 4.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.57%) |Training time=1.44s (36.53%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 711|ppo_ep: 1|act_loss: -0.07611083984375|cri_loss: 0.09130859375|unsuper_loss: 0.0
average reward score: 3.88671875
-------------------------------------------------------------------------------------
|E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=2.30s (58.67%) |Training time=1.43s (36.43%) |Others=0.19 (4.91%)|CurSamplesPerSec=8.16 |AvgSamplesPerSec=8.93
epoch: 0|step: 712|ppo_ep: 1|act_loss: 0.020477294921875|cri_loss: 0.0618896484375|unsuper_loss: 0.0
average reward score: 4.234375
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.32s (59.06%) |Training time=1.41s (36.01%) |Others=0.19 (4.93%)|CurSamplesPerSec=8.15 |AvgSamplesPerSec=8.93
epoch: 0|step: 713|ppo_ep: 1|act_loss: 0.06890869140625|cri_loss: 0.0887451171875|unsuper_loss: 0.0
average reward score: 4.32421875
-------------------------------------------------------------------------------------
|E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=2.31s (58.61%) |Training time=1.44s (36.54%) |Others=0.19 (4.84%)|CurSamplesPerSec=8.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 714|ppo_ep: 1|act_loss: -0.033355712890625|cri_loss: 0.0750732421875|unsuper_loss: 0.0
average reward score: 4.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.41%) |Training time=0.99s (28.19%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 715|ppo_ep: 1|act_loss: -0.018096923828125|cri_loss: 0.04852294921875|unsuper_loss: 0.0
average reward score: 4.2890625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.05%) |Training time=1.00s (28.54%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.93
epoch: 0|step: 716|ppo_ep: 1|act_loss: 0.045562744140625|cri_loss: 0.12060546875|unsuper_loss: 0.0
average reward score: 4.5234375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.08%) |Training time=1.00s (28.55%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.93
epoch: 0|step: 717|ppo_ep: 1|act_loss: -0.10626220703125|cri_loss: 0.0748291015625|unsuper_loss: 0.0
average reward score: 4.03125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.94%) |Training time=1.01s (28.68%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 718|ppo_ep: 1|act_loss: 0.0550537109375|cri_loss: 0.0311279296875|unsuper_loss: 0.0
average reward score: 4.2265625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.27%) |Training time=0.99s (28.35%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.93
[2023-06-30 06:14:30,350] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=14, lr=[1.8724942642967504e-06, 1.8724942642967504e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:14:30,383] [INFO] [timer.py:215:stop] epoch=0/micro_step=720/global_step=720, RunningAvgSamplesPerSec=45.80992684504119, CurrSamplesPerSec=47.04336390631265, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:14:30,541] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=12, lr=[9.556980917691116e-07, 9.556980917691116e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 719|ppo_ep: 1|act_loss: 0.07666015625|cri_loss: 0.07122802734375|unsuper_loss: 0.0
average reward score: 4.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.86%) |Training time=1.01s (28.78%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.93
epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.130126953125|cri_loss: 0.06158447265625|unsuper_loss: 0.0
average reward score: 3.814453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.12%) |Training time=1.00s (28.49%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 721|ppo_ep: 1|act_loss: 0.007480621337890625|cri_loss: 0.035919189453125|unsuper_loss: 0.0
average reward score: 4.5234375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.98%) |Training time=1.00s (28.64%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.93
epoch: 0|step: 722|ppo_ep: 1|act_loss: -0.02056884765625|cri_loss: 0.0472412109375|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.32s (64.47%) |Training time=1.09s (30.22%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.91 |AvgSamplesPerSec=8.93
epoch: 0|step: 723|ppo_ep: 1|act_loss: -0.092041015625|cri_loss: 0.08551025390625|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.72%) |Training time=1.02s (28.93%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.93
epoch: 0|step: 724|ppo_ep: 1|act_loss: 0.09063720703125|cri_loss: 0.0640869140625|unsuper_loss: 0.0
average reward score: 4.671875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.99%) |Training time=1.00s (28.60%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 725|ppo_ep: 1|act_loss: -0.09619140625|cri_loss: 0.10906982421875|unsuper_loss: 0.0
average reward score: 4.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.66%) |Training time=1.02s (28.96%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.93
epoch: 0|step: 726|ppo_ep: 1|act_loss: 0.016845703125|cri_loss: 0.056671142578125|unsuper_loss: 0.0
average reward score: 4.33984375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.99%) |Training time=1.01s (28.65%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 727|ppo_ep: 1|act_loss: -0.0310211181640625|cri_loss: 0.051605224609375|unsuper_loss: 0.0
average reward score: 4.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.86%) |Training time=1.01s (28.78%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.93
epoch: 0|step: 728|ppo_ep: 1|act_loss: -0.08013916015625|cri_loss: 0.059722900390625|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.05%) |Training time=1.00s (28.57%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.93
[2023-06-30 06:15:05,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=14, lr=[1.7341379127813052e-06, 1.7341379127813052e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:15:05,588] [INFO] [timer.py:215:stop] epoch=0/micro_step=730/global_step=730, RunningAvgSamplesPerSec=45.820726124231804, CurrSamplesPerSec=47.52885367543922, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:15:05,747] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=12, lr=[8.844365933158975e-07, 8.844365933158975e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 729|ppo_ep: 1|act_loss: -0.024627685546875|cri_loss: 0.0765380859375|unsuper_loss: 0.0
average reward score: 4.12109375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.98%) |Training time=1.00s (28.62%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.93
[2023-06-30 06:15:09,050] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 730|ppo_ep: 1|act_loss: 0.022308349609375|cri_loss: 0.0687255859375|unsuper_loss: 0.0
average reward score: 4.45703125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.32s (66.92%) |Training time=0.96s (27.60%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.93
epoch: 0|step: 731|ppo_ep: 1|act_loss: -0.0743408203125|cri_loss: 0.080322265625|unsuper_loss: 0.0
average reward score: 4.37890625
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.44s (68.09%) |Training time=0.95s (26.54%) |Others=0.19 (5.37%)|CurSamplesPerSec=8.93 |AvgSamplesPerSec=8.93
epoch: 0|step: 732|ppo_ep: 1|act_loss: 0.033905029296875|cri_loss: 0.0897216796875|unsuper_loss: 0.0
average reward score: 4.48828125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.74%) |Training time=0.88s (24.89%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.93
epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.14697265625|cri_loss: 0.063232421875|unsuper_loss: 0.0
average reward score: 4.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.63%) |Training time=0.87s (24.97%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.93
epoch: 0|step: 734|ppo_ep: 1|act_loss: 0.037933349609375|cri_loss: 0.0615234375|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.60%) |Training time=0.87s (24.94%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.93
epoch: 0|step: 735|ppo_ep: 1|act_loss: 0.032196044921875|cri_loss: 0.117919921875|unsuper_loss: 0.0
average reward score: 4.16015625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.36%) |Training time=0.89s (25.26%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.93
epoch: 0|step: 736|ppo_ep: 1|act_loss: 0.058074951171875|cri_loss: 0.0694580078125|unsuper_loss: 0.0
average reward score: 4.41015625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.57%) |Training time=0.87s (25.02%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.93
epoch: 0|step: 737|ppo_ep: 1|act_loss: 0.05316162109375|cri_loss: 0.056610107421875|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.51s (69.71%) |Training time=0.90s (25.00%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.93
epoch: 0|step: 738|ppo_ep: 1|act_loss: -0.12890625|cri_loss: 0.140869140625|unsuper_loss: 0.0
average reward score: 4.7734375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.41%) |Training time=0.88s (25.19%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.93
[2023-06-30 06:15:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=15, lr=[1.6131878388300449e-06, 1.6131878388300449e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:15:40,776] [INFO] [timer.py:215:stop] epoch=0/micro_step=740/global_step=740, RunningAvgSamplesPerSec=45.939300296958294, CurrSamplesPerSec=57.732599168020535, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:15:40,937] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=12, lr=[8.153611395453046e-07, 8.153611395453046e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.0390625|cri_loss: 0.07098388671875|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.36%) |Training time=0.88s (25.17%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 740|ppo_ep: 1|act_loss: 0.06689453125|cri_loss: 0.09893798828125|unsuper_loss: 0.0
average reward score: 4.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.73%) |Training time=0.98s (26.95%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.82 |AvgSamplesPerSec=8.93
epoch: 0|step: 741|ppo_ep: 1|act_loss: 0.0297393798828125|cri_loss: 0.052886962890625|unsuper_loss: 0.0
average reward score: 4.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.44%) |Training time=0.88s (25.15%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 742|ppo_ep: 1|act_loss: 0.060882568359375|cri_loss: 0.045318603515625|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.64%) |Training time=0.87s (24.98%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.93
epoch: 0|step: 743|ppo_ep: 1|act_loss: 0.08074951171875|cri_loss: 0.07574462890625|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.36%) |Training time=0.89s (25.24%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.93
epoch: 0|step: 744|ppo_ep: 1|act_loss: -0.098388671875|cri_loss: 0.09869384765625|unsuper_loss: 0.0
average reward score: 4.6875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.32%) |Training time=0.89s (25.26%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.93
epoch: 0|step: 745|ppo_ep: 1|act_loss: -0.0838623046875|cri_loss: 0.10205078125|unsuper_loss: 0.0
average reward score: 4.44921875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.76%) |Training time=0.87s (24.80%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.93
epoch: 0|step: 746|ppo_ep: 1|act_loss: -0.08807373046875|cri_loss: 0.09954833984375|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.44%) |Training time=0.89s (25.20%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.93
epoch: 0|step: 747|ppo_ep: 1|act_loss: -0.11712646484375|cri_loss: 0.08837890625|unsuper_loss: 0.0
average reward score: 4.35546875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.53%) |Training time=0.88s (25.07%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.93
epoch: 0|step: 748|ppo_ep: 1|act_loss: -0.0301361083984375|cri_loss: 0.0931396484375|unsuper_loss: 0.0
average reward score: 4.578125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.58%) |Training time=0.87s (24.97%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.93
[2023-06-30 06:16:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=15, lr=[1.482933241929494e-06, 1.482933241929494e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:16:16,005] [INFO] [timer.py:215:stop] epoch=0/micro_step=750/global_step=750, RunningAvgSamplesPerSec=46.06270332660428, CurrSamplesPerSec=60.063719995041595, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:16:16,164] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=12, lr=[7.485651975585237e-07, 7.485651975585237e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 749|ppo_ep: 1|act_loss: 0.058990478515625|cri_loss: 0.1243896484375|unsuper_loss: 0.0
average reward score: 4.546875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.49s (70.23%) |Training time=0.86s (24.39%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.93
epoch: 0|step: 750|ppo_ep: 1|act_loss: -0.055450439453125|cri_loss: 0.1334228515625|unsuper_loss: 0.0
average reward score: 4.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.83%) |Training time=0.94s (26.81%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 751|ppo_ep: 1|act_loss: 0.0201416015625|cri_loss: 0.09765625|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.00%) |Training time=0.93s (26.54%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
epoch: 0|step: 752|ppo_ep: 1|act_loss: -0.1468505859375|cri_loss: 0.1175537109375|unsuper_loss: 0.0
average reward score: 4.484375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.10%) |Training time=0.93s (26.53%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.94
epoch: 0|step: 753|ppo_ep: 1|act_loss: -0.03790283203125|cri_loss: 0.057586669921875|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.37s (67.79%) |Training time=0.94s (26.76%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 754|ppo_ep: 1|act_loss: 0.05999755859375|cri_loss: 0.04779052734375|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.91%) |Training time=0.93s (26.55%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.94
epoch: 0|step: 755|ppo_ep: 1|act_loss: 0.09783935546875|cri_loss: 0.0555419921875|unsuper_loss: 0.0
average reward score: 4.19140625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.77%) |Training time=0.94s (26.78%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 756|ppo_ep: 1|act_loss: 0.06201171875|cri_loss: 0.05712890625|unsuper_loss: 0.0
average reward score: 4.83203125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.48s (69.96%) |Training time=0.87s (24.67%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.94
epoch: 0|step: 757|ppo_ep: 1|act_loss: 0.0489501953125|cri_loss: 0.0443115234375|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.41%) |Training time=0.89s (25.22%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 758|ppo_ep: 1|act_loss: 0.0012378692626953125|cri_loss: 0.056915283203125|unsuper_loss: 0.0
average reward score: 4.46875
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.82%) |Training time=0.89s (24.84%) |Others=0.19 (5.34%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.94
[2023-06-30 06:16:51,135] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=15, lr=[1.3572008490075794e-06, 1.3572008490075794e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:16:51,168] [INFO] [timer.py:215:stop] epoch=0/micro_step=760/global_step=760, RunningAvgSamplesPerSec=46.1593012332379, CurrSamplesPerSec=57.255115486913475, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:16:51,326] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=12, lr=[6.841391500128983e-07, 6.841391500128983e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 759|ppo_ep: 1|act_loss: 0.07415771484375|cri_loss: 0.0736083984375|unsuper_loss: 0.0
average reward score: 4.5625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.17%) |Training time=0.89s (25.43%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 760|ppo_ep: 1|act_loss: -0.0166778564453125|cri_loss: 0.049346923828125|unsuper_loss: 0.0
average reward score: 4.453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.16%) |Training time=0.89s (25.38%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.94
epoch: 0|step: 761|ppo_ep: 1|act_loss: -0.1107177734375|cri_loss: 0.1102294921875|unsuper_loss: 0.0
average reward score: 4.21484375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.30%) |Training time=0.89s (25.34%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.94
epoch: 0|step: 762|ppo_ep: 1|act_loss: -0.0076751708984375|cri_loss: 0.0521240234375|unsuper_loss: 0.0
average reward score: 4.0390625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.30%) |Training time=0.88s (25.29%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.94
epoch: 0|step: 763|ppo_ep: 1|act_loss: -0.0158843994140625|cri_loss: 0.10577392578125|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.32%) |Training time=0.88s (25.27%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 764|ppo_ep: 1|act_loss: -0.0008687973022460938|cri_loss: 0.051971435546875|unsuper_loss: 0.0
average reward score: 4.41796875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.05%) |Training time=0.90s (25.59%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 765|ppo_ep: 1|act_loss: 0.0295867919921875|cri_loss: 0.05059814453125|unsuper_loss: 0.0
average reward score: 3.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.93%) |Training time=0.90s (25.67%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 766|ppo_ep: 1|act_loss: -0.1478271484375|cri_loss: 0.164794921875|unsuper_loss: 0.0
average reward score: 4.8984375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.49%) |Training time=0.89s (25.11%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 767|ppo_ep: 1|act_loss: -0.05908203125|cri_loss: 0.08319091796875|unsuper_loss: 0.0
average reward score: 4.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.52s (69.87%) |Training time=0.90s (24.87%) |Others=0.19 (5.26%)|CurSamplesPerSec=8.86 |AvgSamplesPerSec=8.94
epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.1317138671875|cri_loss: 0.0946044921875|unsuper_loss: 0.0
average reward score: 4.4765625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.19%) |Training time=0.89s (25.42%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
[2023-06-30 06:17:26,328] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=15, lr=[1.2361607905759474e-06, 1.2361607905759474e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:17:26,362] [INFO] [timer.py:215:stop] epoch=0/micro_step=770/global_step=770, RunningAvgSamplesPerSec=46.274176461126196, CurrSamplesPerSec=56.97911664317615, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:17:26,520] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=12, lr=[6.221701728237008e-07, 6.221701728237008e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 769|ppo_ep: 1|act_loss: -0.10595703125|cri_loss: 0.047943115234375|unsuper_loss: 0.0
average reward score: 4.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.09%) |Training time=0.89s (25.52%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.002719879150390625|cri_loss: 0.058563232421875|unsuper_loss: 0.0
average reward score: 4.44921875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.63%) |Training time=0.88s (25.02%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.94
epoch: 0|step: 771|ppo_ep: 1|act_loss: 0.09326171875|cri_loss: 0.059844970703125|unsuper_loss: 0.0
average reward score: 4.421875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.14%) |Training time=0.89s (25.47%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 772|ppo_ep: 1|act_loss: -0.0018472671508789062|cri_loss: 0.0977783203125|unsuper_loss: 0.0
average reward score: 3.517578125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (69.06%) |Training time=0.89s (25.47%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.94
epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.0711669921875|cri_loss: 0.0706787109375|unsuper_loss: 0.0
average reward score: 3.984375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.09%) |Training time=0.90s (25.52%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.10760498046875|cri_loss: 0.06787109375|unsuper_loss: 0.0
average reward score: 4.05859375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.98%) |Training time=0.90s (25.61%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.94
epoch: 0|step: 775|ppo_ep: 1|act_loss: 0.00034046173095703125|cri_loss: 0.06842041015625|unsuper_loss: 0.0
average reward score: 4.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.40%) |Training time=0.89s (25.25%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.94
epoch: 0|step: 776|ppo_ep: 1|act_loss: -0.049102783203125|cri_loss: 0.0721435546875|unsuper_loss: 0.0
average reward score: 4.25
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.53s (69.80%) |Training time=0.90s (24.94%) |Others=0.19 (5.26%)|CurSamplesPerSec=8.83 |AvgSamplesPerSec=8.94
epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.0555419921875|cri_loss: 0.05908203125|unsuper_loss: 0.0
average reward score: 3.8828125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.36%) |Training time=0.92s (26.26%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.0295562744140625|cri_loss: 0.0355224609375|unsuper_loss: 0.0
average reward score: 4.45703125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.26%) |Training time=0.92s (26.30%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
[2023-06-30 06:18:01,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=15, lr=[1.1199768478734052e-06, 1.1199768478734052e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:18:01,589] [INFO] [timer.py:215:stop] epoch=0/micro_step=780/global_step=780, RunningAvgSamplesPerSec=46.37684575047605, CurrSamplesPerSec=53.63343631031516, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:18:01,748] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=12, lr=[5.627421172050096e-07, 5.627421172050096e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 779|ppo_ep: 1|act_loss: 0.0233001708984375|cri_loss: 0.048492431640625|unsuper_loss: 0.0
average reward score: 4.20703125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.17%) |Training time=0.93s (26.45%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.94
epoch: 0|step: 780|ppo_ep: 1|act_loss: -0.03826904296875|cri_loss: 0.0816650390625|unsuper_loss: 0.0
average reward score: 4.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.41%) |Training time=0.92s (26.18%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 781|ppo_ep: 1|act_loss: 0.040740966796875|cri_loss: 0.057037353515625|unsuper_loss: 0.0
average reward score: 4.703125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (67.96%) |Training time=0.94s (26.68%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.94
epoch: 0|step: 782|ppo_ep: 1|act_loss: 0.0094757080078125|cri_loss: 0.05218505859375|unsuper_loss: 0.0
average reward score: 3.958984375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.17%) |Training time=0.93s (26.45%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 783|ppo_ep: 1|act_loss: 0.07415771484375|cri_loss: 0.0667724609375|unsuper_loss: 0.0
average reward score: 3.77734375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.43%) |Training time=0.91s (26.19%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
epoch: 0|step: 784|ppo_ep: 1|act_loss: -0.037322998046875|cri_loss: 0.05755615234375|unsuper_loss: 0.0
average reward score: 4.26171875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.69%) |Training time=0.92s (25.96%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.94
epoch: 0|step: 785|ppo_ep: 1|act_loss: -0.08258056640625|cri_loss: 0.07568359375|unsuper_loss: 0.0
average reward score: 4.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.41s (67.95%) |Training time=0.94s (26.57%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.02 |AvgSamplesPerSec=8.94
epoch: 0|step: 786|ppo_ep: 1|act_loss: -0.100830078125|cri_loss: 0.08990478515625|unsuper_loss: 0.0
average reward score: 3.7421875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.59%) |Training time=0.92s (26.05%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.06 |AvgSamplesPerSec=8.94
epoch: 0|step: 787|ppo_ep: 1|act_loss: 0.010772705078125|cri_loss: 0.06787109375|unsuper_loss: 0.0
average reward score: 4.62890625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.51%) |Training time=0.92s (26.13%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.94
epoch: 0|step: 788|ppo_ep: 1|act_loss: -0.11676025390625|cri_loss: 0.0426025390625|unsuper_loss: 0.0
average reward score: 3.974609375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.44%) |Training time=0.91s (26.16%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
[2023-06-30 06:18:36,680] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=15, lr=[1.008806231250907e-06, 1.008806231250907e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:18:36,713] [INFO] [timer.py:215:stop] epoch=0/micro_step=790/global_step=790, RunningAvgSamplesPerSec=46.459897366741764, CurrSamplesPerSec=54.04256155287016, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:18:36,872] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=12, lr=[5.059353962092916e-07, 5.059353962092916e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 789|ppo_ep: 1|act_loss: -0.0006694793701171875|cri_loss: 0.048858642578125|unsuper_loss: 0.0
average reward score: 3.921875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.13%) |Training time=0.93s (26.45%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
[2023-06-30 06:18:40,355] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 790|ppo_ep: 1|act_loss: 0.0576171875|cri_loss: 0.048614501953125|unsuper_loss: 0.0
average reward score: 3.912109375
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.65%) |Training time=0.91s (26.28%) |Others=0.18 (5.07%)|CurSamplesPerSec=9.20 |AvgSamplesPerSec=8.94
epoch: 0|step: 791|ppo_ep: 1|act_loss: -0.128662109375|cri_loss: 0.11224365234375|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.46%) |Training time=0.91s (26.14%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
epoch: 0|step: 792|ppo_ep: 1|act_loss: -0.040771484375|cri_loss: 0.06903076171875|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.13%) |Training time=0.93s (26.47%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
epoch: 0|step: 793|ppo_ep: 1|act_loss: -0.06304931640625|cri_loss: 0.070556640625|unsuper_loss: 0.0
average reward score: 3.546875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.48%) |Training time=0.91s (26.12%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.94
epoch: 0|step: 794|ppo_ep: 1|act_loss: -0.137451171875|cri_loss: 0.1512451171875|unsuper_loss: 0.0
average reward score: 4.49609375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.40s (67.90%) |Training time=0.93s (26.36%) |Others=0.20 (5.74%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.94
epoch: 0|step: 795|ppo_ep: 1|act_loss: 0.031768798828125|cri_loss: 0.0136566162109375|unsuper_loss: 0.0
average reward score: 4.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.45s (68.66%) |Training time=0.93s (26.01%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.94
epoch: 0|step: 796|ppo_ep: 1|act_loss: 0.0232391357421875|cri_loss: 0.09271240234375|unsuper_loss: 0.0
average reward score: 4.234375
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.46s (68.77%) |Training time=0.93s (25.93%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.95 |AvgSamplesPerSec=8.94
epoch: 0|step: 797|ppo_ep: 1|act_loss: -0.08502197265625|cri_loss: 0.053985595703125|unsuper_loss: 0.0
average reward score: 4.22265625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.08%) |Training time=0.93s (26.54%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.94
epoch: 0|step: 798|ppo_ep: 1|act_loss: -0.052093505859375|cri_loss: 0.09674072265625|unsuper_loss: 0.0
average reward score: 4.5390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.39%) |Training time=0.92s (26.20%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.94
[2023-06-30 06:19:11,817] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=15, lr=[9.02799367447708e-07, 9.02799367447708e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:19:11,851] [INFO] [timer.py:215:stop] epoch=0/micro_step=800/global_step=800, RunningAvgSamplesPerSec=46.54201370096247, CurrSamplesPerSec=54.011614585072955, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:19:12,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=13, lr=[4.571141932131315e-07, 4.571141932131315e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 799|ppo_ep: 1|act_loss: -0.01241302490234375|cri_loss: 0.035980224609375|unsuper_loss: 0.0
average reward score: 4.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.21%) |Training time=0.92s (26.42%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.94
epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.0182037353515625|cri_loss: 0.019256591796875|unsuper_loss: 0.0
average reward score: 4.3359375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.03%) |Training time=0.93s (26.53%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.95
epoch: 0|step: 801|ppo_ep: 1|act_loss: 0.0777587890625|cri_loss: 0.0255584716796875|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.02%) |Training time=0.90s (25.64%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.95
epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.02239990234375|cri_loss: 0.03271484375|unsuper_loss: 0.0
average reward score: 4.44921875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.39s (67.92%) |Training time=0.94s (26.69%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.95
epoch: 0|step: 803|ppo_ep: 1|act_loss: -0.087890625|cri_loss: 0.058074951171875|unsuper_loss: 0.0
average reward score: 4.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.17%) |Training time=0.93s (26.44%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.95
epoch: 0|step: 804|ppo_ep: 1|act_loss: -0.10162353515625|cri_loss: 0.055023193359375|unsuper_loss: 0.0
average reward score: 4.69921875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.44s (68.42%) |Training time=0.94s (26.27%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.95
epoch: 0|step: 805|ppo_ep: 1|act_loss: -0.09283447265625|cri_loss: 0.04736328125|unsuper_loss: 0.0
average reward score: 4.17578125
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.44s (68.47%) |Training time=0.93s (26.15%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.00 |AvgSamplesPerSec=8.95
epoch: 0|step: 806|ppo_ep: 1|act_loss: 0.09649658203125|cri_loss: 0.07818603515625|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.43s (68.60%) |Training time=0.92s (26.06%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.95
epoch: 0|step: 807|ppo_ep: 1|act_loss: 0.0333251953125|cri_loss: 0.0328369140625|unsuper_loss: 0.0
average reward score: 4.5
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.36%) |Training time=0.92s (26.26%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.049896240234375|cri_loss: 0.0384521484375|unsuper_loss: 0.0
average reward score: 4.0078125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.14%) |Training time=0.93s (26.47%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.95
[2023-06-30 06:19:47,040] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=15, lr=[8.020996960465471e-07, 8.020996960465471e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:19:47,073] [INFO] [timer.py:215:stop] epoch=0/micro_step=810/global_step=810, RunningAvgSamplesPerSec=46.61934648260787, CurrSamplesPerSec=54.30035945973828, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:19:47,232] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=13, lr=[4.0549675859491657e-07, 4.0549675859491657e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 809|ppo_ep: 1|act_loss: 0.03936767578125|cri_loss: 0.0294952392578125|unsuper_loss: 0.0
average reward score: 4.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.29%) |Training time=0.92s (26.31%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 810|ppo_ep: 1|act_loss: 0.04559326171875|cri_loss: 0.048492431640625|unsuper_loss: 0.0
average reward score: 4.3828125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.51%) |Training time=0.93s (26.17%) |Others=0.19 (5.32%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.95
epoch: 0|step: 811|ppo_ep: 1|act_loss: 0.05975341796875|cri_loss: 0.0269622802734375|unsuper_loss: 0.0
average reward score: 4.37109375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.18%) |Training time=0.93s (26.44%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.95
epoch: 0|step: 812|ppo_ep: 1|act_loss: 0.0016756057739257812|cri_loss: 0.0293121337890625|unsuper_loss: 0.0
average reward score: 4.31640625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.15%) |Training time=0.93s (26.43%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.95
epoch: 0|step: 813|ppo_ep: 1|act_loss: 0.034088134765625|cri_loss: 0.040496826171875|unsuper_loss: 0.0
average reward score: 4.58984375
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.44s (68.51%) |Training time=0.93s (26.18%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.99 |AvgSamplesPerSec=8.95
epoch: 0|step: 814|ppo_ep: 1|act_loss: -0.051513671875|cri_loss: 0.0428466796875|unsuper_loss: 0.0
average reward score: 4.46875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.27%) |Training time=0.92s (26.35%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.95
epoch: 0|step: 815|ppo_ep: 1|act_loss: 0.0258026123046875|cri_loss: 0.045257568359375|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.46s (68.74%) |Training time=0.93s (25.97%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.95 |AvgSamplesPerSec=8.95
[2023-06-30 06:20:11,912] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 816|ppo_ep: 1|act_loss: 0.0545654296875|cri_loss: 0.0535888671875|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.41%) |Training time=0.93s (26.55%) |Others=0.18 (5.05%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.95
epoch: 0|step: 817|ppo_ep: 1|act_loss: 0.020904541015625|cri_loss: 0.036163330078125|unsuper_loss: 0.0
average reward score: 4.23046875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.22%) |Training time=0.92s (26.36%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.95
epoch: 0|step: 818|ppo_ep: 1|act_loss: -0.010101318359375|cri_loss: 0.057037353515625|unsuper_loss: 0.0
average reward score: 4.2421875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.07%) |Training time=0.94s (26.55%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.95
[2023-06-30 06:20:22,248] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=15, lr=[7.068434753832422e-07, 7.068434753832422e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:20:22,281] [INFO] [timer.py:215:stop] epoch=0/micro_step=820/global_step=820, RunningAvgSamplesPerSec=46.69381561028451, CurrSamplesPerSec=54.12280140039373, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:20:22,439] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=14, lr=[3.614623161842565e-07, 3.614623161842565e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 819|ppo_ep: 1|act_loss: -0.042327880859375|cri_loss: 0.0504150390625|unsuper_loss: 0.0
average reward score: 4.48828125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.26%) |Training time=0.92s (26.37%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.95
epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.059051513671875|cri_loss: 0.06732177734375|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.22%) |Training time=0.92s (26.41%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.95
epoch: 0|step: 821|ppo_ep: 1|act_loss: -0.04425048828125|cri_loss: 0.043975830078125|unsuper_loss: 0.0
average reward score: 4.78125
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.14%) |Training time=0.93s (26.47%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.95
epoch: 0|step: 822|ppo_ep: 1|act_loss: -0.1575927734375|cri_loss: 0.11474609375|unsuper_loss: 0.0
average reward score: 3.74609375
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.47s (68.99%) |Training time=0.92s (25.69%) |Others=0.19 (5.31%)|CurSamplesPerSec=8.95 |AvgSamplesPerSec=8.95
epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.0791015625|cri_loss: 0.04315185546875|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.11%) |Training time=0.93s (26.53%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 824|ppo_ep: 1|act_loss: 0.08697509765625|cri_loss: 0.0304718017578125|unsuper_loss: 0.0
average reward score: 4.3828125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.24%) |Training time=0.93s (26.40%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.95
epoch: 0|step: 825|ppo_ep: 1|act_loss: 0.08001708984375|cri_loss: 0.056854248046875|unsuper_loss: 0.0
average reward score: 3.83203125
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.41%) |Training time=0.89s (25.22%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.04 |AvgSamplesPerSec=8.95
epoch: 0|step: 826|ppo_ep: 1|act_loss: -0.021392822265625|cri_loss: 0.06695556640625|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.08%) |Training time=0.89s (25.50%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.95
epoch: 0|step: 827|ppo_ep: 1|act_loss: 0.07659912109375|cri_loss: 0.0377197265625|unsuper_loss: 0.0
average reward score: 4.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.88%) |Training time=0.87s (24.76%) |Others=0.19 (5.35%)|CurSamplesPerSec=9.07 |AvgSamplesPerSec=8.95
epoch: 0|step: 828|ppo_ep: 1|act_loss: 0.04168701171875|cri_loss: 0.06036376953125|unsuper_loss: 0.0
average reward score: 3.9453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.04%) |Training time=0.93s (26.56%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.95
[2023-06-30 06:20:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=15, lr=[6.171595981733693e-07, 6.171595981733693e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:20:57,418] [INFO] [timer.py:215:stop] epoch=0/micro_step=830/global_step=830, RunningAvgSamplesPerSec=46.778227350693946, CurrSamplesPerSec=54.55583837020718, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:20:57,576] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=14, lr=[3.1528623193564286e-07, 3.1528623193564286e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.00798797607421875|cri_loss: 0.049285888671875|unsuper_loss: 0.0
average reward score: 4.53125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.27%) |Training time=0.92s (26.30%) |Others=0.19 (5.43%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.95
epoch: 0|step: 830|ppo_ep: 1|act_loss: 0.037353515625|cri_loss: 0.03436279296875|unsuper_loss: 0.0
average reward score: 3.962890625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.18%) |Training time=0.92s (26.41%) |Others=0.19 (5.41%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 831|ppo_ep: 1|act_loss: 0.0633544921875|cri_loss: 0.0758056640625|unsuper_loss: 0.0
average reward score: 4.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.53s (69.46%) |Training time=0.92s (25.32%) |Others=0.19 (5.22%)|CurSamplesPerSec=8.80 |AvgSamplesPerSec=8.95
epoch: 0|step: 832|ppo_ep: 1|act_loss: 0.098876953125|cri_loss: 0.05548095703125|unsuper_loss: 0.0
average reward score: 4.2734375
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.43%) |Training time=0.91s (26.17%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.95
epoch: 0|step: 833|ppo_ep: 1|act_loss: -0.001171112060546875|cri_loss: 0.037933349609375|unsuper_loss: 0.0
average reward score: 4.38671875
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.35%) |Training time=0.91s (26.25%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.95
epoch: 0|step: 834|ppo_ep: 1|act_loss: -0.004489898681640625|cri_loss: 0.08819580078125|unsuper_loss: 0.0
average reward score: 4.50390625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.20%) |Training time=0.92s (26.35%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.95
epoch: 0|step: 835|ppo_ep: 1|act_loss: -0.00676727294921875|cri_loss: 0.034454345703125|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.51%) |Training time=0.92s (26.10%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.95
epoch: 0|step: 836|ppo_ep: 1|act_loss: 0.066650390625|cri_loss: 0.041900634765625|unsuper_loss: 0.0
average reward score: 4.453125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.91%) |Training time=0.89s (25.64%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
epoch: 0|step: 837|ppo_ep: 1|act_loss: 0.01139068603515625|cri_loss: 0.045654296875|unsuper_loss: 0.0
average reward score: 4.6640625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (69.03%) |Training time=0.88s (25.55%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.95
epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.152587890625|cri_loss: 0.103759765625|unsuper_loss: 0.0
average reward score: 3.990234375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (69.03%) |Training time=0.88s (25.51%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
[2023-06-30 06:21:32,367] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=15, lr=[5.33169417105455e-07, 5.33169417105455e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:21:32,399] [INFO] [timer.py:215:stop] epoch=0/micro_step=840/global_step=840, RunningAvgSamplesPerSec=46.86643065005567, CurrSamplesPerSec=56.66324902657206, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:21:32,559] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=14, lr=[2.720663188258199e-07, 2.720663188258199e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 839|ppo_ep: 1|act_loss: -0.028411865234375|cri_loss: 0.037933349609375|unsuper_loss: 0.0
average reward score: 3.9296875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.74%) |Training time=0.90s (25.81%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.21 |AvgSamplesPerSec=8.95
epoch: 0|step: 840|ppo_ep: 1|act_loss: 0.0234375|cri_loss: 0.031005859375|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.48s (69.61%) |Training time=0.89s (25.07%) |Others=0.19 (5.32%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.95
epoch: 0|step: 841|ppo_ep: 1|act_loss: -0.0518798828125|cri_loss: 0.067626953125|unsuper_loss: 0.0
average reward score: 4.2578125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.39s (68.92%) |Training time=0.89s (25.66%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.0008764266967773438|cri_loss: 0.039276123046875|unsuper_loss: 0.0
average reward score: 4.66796875
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.09%) |Training time=0.88s (25.43%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.95
epoch: 0|step: 843|ppo_ep: 1|act_loss: -0.0953369140625|cri_loss: 0.03643798828125|unsuper_loss: 0.0
average reward score: 3.73046875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.91%) |Training time=0.89s (25.65%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.95
epoch: 0|step: 844|ppo_ep: 1|act_loss: 0.0645751953125|cri_loss: 0.0277557373046875|unsuper_loss: 0.0
average reward score: 4.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.24%) |Training time=0.89s (25.34%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.95
epoch: 0|step: 845|ppo_ep: 1|act_loss: 0.01273345947265625|cri_loss: 0.05987548828125|unsuper_loss: 0.0
average reward score: 4.328125
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.84%) |Training time=0.89s (25.70%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.24 |AvgSamplesPerSec=8.95
epoch: 0|step: 846|ppo_ep: 1|act_loss: -0.01515960693359375|cri_loss: 0.03668212890625|unsuper_loss: 0.0
average reward score: 4.0390625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.97%) |Training time=0.88s (25.58%) |Others=0.19 (5.45%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.95
epoch: 0|step: 847|ppo_ep: 1|act_loss: 0.023651123046875|cri_loss: 0.043121337890625|unsuper_loss: 0.0
average reward score: 4.12109375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.86%) |Training time=0.89s (25.66%) |Others=0.19 (5.48%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.95
epoch: 0|step: 848|ppo_ep: 1|act_loss: -0.04119873046875|cri_loss: 0.07708740234375|unsuper_loss: 0.0
average reward score: 4.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.40s (68.83%) |Training time=0.89s (25.52%) |Others=0.20 (5.65%)|CurSamplesPerSec=9.19 |AvgSamplesPerSec=8.95
[2023-06-30 06:22:07,514] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=15, lr=[4.549865806367255e-07, 4.549865806367255e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:22:07,547] [INFO] [timer.py:215:stop] epoch=0/micro_step=850/global_step=850, RunningAvgSamplesPerSec=46.94843872931608, CurrSamplesPerSec=40.2031723362027, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:22:07,709] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=14, lr=[2.3186105841041418e-07, 2.3186105841041418e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 849|ppo_ep: 1|act_loss: -0.0048980712890625|cri_loss: 0.035614013671875|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.51s (65.45%) |Training time=1.13s (29.45%) |Others=0.20 (5.11%)|CurSamplesPerSec=8.36 |AvgSamplesPerSec=8.95
epoch: 0|step: 850|ppo_ep: 1|act_loss: -0.0051422119140625|cri_loss: 0.04315185546875|unsuper_loss: 0.0
average reward score: 4.203125
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.37s (59.27%) |Training time=1.43s (35.75%) |Others=0.20 (4.99%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.95
epoch: 0|step: 851|ppo_ep: 1|act_loss: -0.037109375|cri_loss: 0.050445556640625|unsuper_loss: 0.0
average reward score: 4.5390625
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.38s (59.39%) |Training time=1.43s (35.72%) |Others=0.20 (4.89%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.95
epoch: 0|step: 852|ppo_ep: 1|act_loss: 0.10638427734375|cri_loss: 0.060882568359375|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.38s (59.59%) |Training time=1.42s (35.53%) |Others=0.20 (4.88%)|CurSamplesPerSec=8.00 |AvgSamplesPerSec=8.95
epoch: 0|step: 853|ppo_ep: 1|act_loss: 0.01861572265625|cri_loss: 0.042694091796875|unsuper_loss: 0.0
average reward score: 4.6171875
-------------------------------------------------------------------------------------
|E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.41s (64.88%) |Training time=1.11s (30.00%) |Others=0.19 (5.12%)|CurSamplesPerSec=8.62 |AvgSamplesPerSec=8.95
epoch: 0|step: 854|ppo_ep: 1|act_loss: 0.04925537109375|cri_loss: 0.0484619140625|unsuper_loss: 0.0
average reward score: 3.509765625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.38s (70.44%) |Training time=0.81s (23.99%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.45 |AvgSamplesPerSec=8.95
epoch: 0|step: 855|ppo_ep: 1|act_loss: 0.08465576171875|cri_loss: 0.0362548828125|unsuper_loss: 0.0
average reward score: 3.833984375
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.37s (70.41%) |Training time=0.81s (23.96%) |Others=0.19 (5.63%)|CurSamplesPerSec=9.49 |AvgSamplesPerSec=8.95
epoch: 0|step: 856|ppo_ep: 1|act_loss: 0.0711669921875|cri_loss: 0.051177978515625|unsuper_loss: 0.0
average reward score: 4.359375
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.38s (70.13%) |Training time=0.83s (24.32%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.41 |AvgSamplesPerSec=8.95
epoch: 0|step: 857|ppo_ep: 1|act_loss: 0.04034423828125|cri_loss: 0.033172607421875|unsuper_loss: 0.0
average reward score: 4.09765625
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.93%) |Training time=0.87s (24.63%) |Others=0.19 (5.44%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.95
epoch: 0|step: 858|ppo_ep: 1|act_loss: -0.046966552734375|cri_loss: 0.03228759765625|unsuper_loss: 0.0
average reward score: 3.935546875
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.82%) |Training time=0.76s (22.54%) |Others=0.19 (5.64%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.95
[2023-06-30 06:22:43,713] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=15, lr=[3.8271687921355017e-07, 3.8271687921355017e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:22:43,746] [INFO] [timer.py:215:stop] epoch=0/micro_step=860/global_step=860, RunningAvgSamplesPerSec=46.93871485665006, CurrSamplesPerSec=73.97231089398775, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:22:43,904] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=14, lr=[1.9472485307027945e-07, 1.9472485307027945e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 859|ppo_ep: 1|act_loss: 0.07769775390625|cri_loss: 0.06036376953125|unsuper_loss: 0.0
average reward score: 4.21875
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.45s (71.96%) |Training time=0.77s (22.49%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.40 |AvgSamplesPerSec=8.95
epoch: 0|step: 860|ppo_ep: 1|act_loss: 0.06689453125|cri_loss: 0.0299530029296875|unsuper_loss: 0.0
average reward score: 4.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.68%) |Training time=0.77s (22.70%) |Others=0.19 (5.62%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.95
epoch: 0|step: 861|ppo_ep: 1|act_loss: 0.07440185546875|cri_loss: 0.031890869140625|unsuper_loss: 0.0
average reward score: 4.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.05%) |Training time=0.76s (22.37%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.38 |AvgSamplesPerSec=8.95
epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.09283447265625|cri_loss: 0.04193115234375|unsuper_loss: 0.0
average reward score: 4.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.70%) |Training time=0.77s (22.54%) |Others=0.20 (5.77%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.95
epoch: 0|step: 863|ppo_ep: 1|act_loss: 0.032562255859375|cri_loss: 0.03411865234375|unsuper_loss: 0.0
average reward score: 4.25
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.51s (72.54%) |Training time=0.76s (21.99%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.95
epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.01593017578125|cri_loss: 0.051055908203125|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (72.00%) |Training time=0.76s (22.43%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.96
epoch: 0|step: 865|ppo_ep: 1|act_loss: -0.0689697265625|cri_loss: 0.045135498046875|unsuper_loss: 0.0
average reward score: 4.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.02%) |Training time=0.76s (22.42%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.39 |AvgSamplesPerSec=8.96
epoch: 0|step: 866|ppo_ep: 1|act_loss: 0.0192413330078125|cri_loss: 0.059173583984375|unsuper_loss: 0.0
average reward score: 4.59765625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.49s (72.22%) |Training time=0.77s (22.24%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.96
epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.0302886962890625|cri_loss: 0.029296875|unsuper_loss: 0.0
average reward score: 3.9453125
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.48s (72.14%) |Training time=0.76s (22.27%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.32 |AvgSamplesPerSec=8.96
epoch: 0|step: 868|ppo_ep: 1|act_loss: -0.00795745849609375|cri_loss: 0.03717041015625|unsuper_loss: 0.0
average reward score: 4.515625
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.45s (71.96%) |Training time=0.77s (22.49%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.40 |AvgSamplesPerSec=8.96
[2023-06-30 06:23:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=15, lr=[3.1645810212470433e-07, 3.1645810212470433e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:23:17,892] [INFO] [timer.py:215:stop] epoch=0/micro_step=870/global_step=870, RunningAvgSamplesPerSec=47.13717402977433, CurrSamplesPerSec=73.9647286646556, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:23:18,050] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=14, lr=[1.607079523987662e-07, 1.607079523987662e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 869|ppo_ep: 1|act_loss: -0.0865478515625|cri_loss: 0.0706787109375|unsuper_loss: 0.0
average reward score: 4.59375
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.94%) |Training time=0.76s (22.50%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.96
epoch: 0|step: 870|ppo_ep: 1|act_loss: -0.08209228515625|cri_loss: 0.0293426513671875|unsuper_loss: 0.0
average reward score: 4.234375
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.12%) |Training time=0.76s (22.28%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.33 |AvgSamplesPerSec=8.96
epoch: 0|step: 871|ppo_ep: 1|act_loss: -0.0504150390625|cri_loss: 0.0163421630859375|unsuper_loss: 0.0
average reward score: 4.0703125
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.91%) |Training time=0.77s (22.54%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.96
epoch: 0|step: 872|ppo_ep: 1|act_loss: 0.0303497314453125|cri_loss: 0.0207366943359375|unsuper_loss: 0.0
average reward score: 4.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.01%) |Training time=0.77s (22.44%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.38 |AvgSamplesPerSec=8.96
epoch: 0|step: 873|ppo_ep: 1|act_loss: 0.039337158203125|cri_loss: 0.0232391357421875|unsuper_loss: 0.0
average reward score: 4.484375
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.09%) |Training time=0.77s (22.34%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.33 |AvgSamplesPerSec=8.96
epoch: 0|step: 874|ppo_ep: 1|act_loss: -0.11773681640625|cri_loss: 0.08135986328125|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.93%) |Training time=0.76s (22.51%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.96
[2023-06-30 06:23:38,332] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.0309295654296875|cri_loss: 0.0225372314453125|unsuper_loss: 0.0
average reward score: 4.75390625
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.01%) |Training time=0.72s (21.37%) |Others=0.19 (5.61%)|CurSamplesPerSec=9.49 |AvgSamplesPerSec=8.96
epoch: 0|step: 876|ppo_ep: 1|act_loss: -0.141845703125|cri_loss: 0.051513671875|unsuper_loss: 0.0
average reward score: 3.884765625
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.51s (72.53%) |Training time=0.76s (22.00%) |Others=0.19 (5.47%)|CurSamplesPerSec=9.25 |AvgSamplesPerSec=8.96
epoch: 0|step: 877|ppo_ep: 1|act_loss: -0.048187255859375|cri_loss: 0.048187255859375|unsuper_loss: 0.0
average reward score: 4.61328125
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.95%) |Training time=0.76s (22.49%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.96
epoch: 0|step: 878|ppo_ep: 1|act_loss: 0.08502197265625|cri_loss: 0.0285186767578125|unsuper_loss: 0.0
average reward score: 4.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.96%) |Training time=0.76s (22.48%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.96
[2023-06-30 06:23:52,007] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=16, lr=[2.6203881362437934e-07, 2.6203881362437934e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:23:52,041] [INFO] [timer.py:215:stop] epoch=0/micro_step=880/global_step=880, RunningAvgSamplesPerSec=47.335725305355105, CurrSamplesPerSec=73.96546236085088, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:23:52,199] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=14, lr=[1.298563852081905e-07, 1.298563852081905e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 879|ppo_ep: 1|act_loss: 0.0362548828125|cri_loss: 0.0743408203125|unsuper_loss: 0.0
average reward score: 4.40234375
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.51s (72.47%) |Training time=0.77s (22.07%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.23 |AvgSamplesPerSec=8.96
epoch: 0|step: 880|ppo_ep: 1|act_loss: -0.0081634521484375|cri_loss: 0.02197265625|unsuper_loss: 0.0
average reward score: 4.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.85%) |Training time=0.76s (22.59%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.46 |AvgSamplesPerSec=8.96
epoch: 0|step: 881|ppo_ep: 1|act_loss: -0.0189666748046875|cri_loss: 0.028717041015625|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.03%) |Training time=0.76s (22.43%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.41 |AvgSamplesPerSec=8.96
epoch: 0|step: 882|ppo_ep: 1|act_loss: 0.04022216796875|cri_loss: 0.035400390625|unsuper_loss: 0.0
average reward score: 4.1640625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.80%) |Training time=0.76s (22.62%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.46 |AvgSamplesPerSec=8.96
epoch: 0|step: 883|ppo_ep: 1|act_loss: 0.0281219482421875|cri_loss: 0.031219482421875|unsuper_loss: 0.0
average reward score: 3.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.50s (72.34%) |Training time=0.77s (22.17%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.26 |AvgSamplesPerSec=8.96
epoch: 0|step: 884|ppo_ep: 1|act_loss: -0.0634765625|cri_loss: 0.0277252197265625|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.18%) |Training time=0.76s (22.28%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.34 |AvgSamplesPerSec=8.96
epoch: 0|step: 885|ppo_ep: 1|act_loss: 0.0723876953125|cri_loss: 0.044464111328125|unsuper_loss: 0.0
average reward score: 4.43359375
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.56s (71.04%) |Training time=0.85s (23.67%) |Others=0.19 (5.29%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.96
epoch: 0|step: 886|ppo_ep: 1|act_loss: -0.0233154296875|cri_loss: 0.0977783203125|unsuper_loss: 0.0
average reward score: 4.48828125
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.85%) |Training time=0.77s (22.55%) |Others=0.19 (5.60%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.96
epoch: 0|step: 887|ppo_ep: 1|act_loss: -0.08319091796875|cri_loss: 0.03997802734375|unsuper_loss: 0.0
average reward score: 4.8671875
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.00%) |Training time=0.76s (22.42%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.38 |AvgSamplesPerSec=8.97
epoch: 0|step: 888|ppo_ep: 1|act_loss: 0.0322265625|cri_loss: 0.034515380859375|unsuper_loss: 0.0
average reward score: 4.0078125
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.46s (71.98%) |Training time=0.77s (22.40%) |Others=0.19 (5.61%)|CurSamplesPerSec=9.36 |AvgSamplesPerSec=8.97
[2023-06-30 06:24:26,261] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=16, lr=[2.0744097427091748e-07, 2.0744097427091748e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:24:26,294] [INFO] [timer.py:215:stop] epoch=0/micro_step=890/global_step=890, RunningAvgSamplesPerSec=47.52134946021604, CurrSamplesPerSec=73.96913106017603, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:24:26,454] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=14, lr=[1.0221189724751502e-07, 1.0221189724751502e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 889|ppo_ep: 1|act_loss: 0.07373046875|cri_loss: 0.049652099609375|unsuper_loss: 0.0
average reward score: 4.96875
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.89%) |Training time=0.76s (22.50%) |Others=0.19 (5.61%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.97
epoch: 0|step: 890|ppo_ep: 1|act_loss: 0.021759033203125|cri_loss: 0.0218963623046875|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.01%) |Training time=0.76s (22.43%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.41 |AvgSamplesPerSec=8.97
epoch: 0|step: 891|ppo_ep: 1|act_loss: 0.0025920867919921875|cri_loss: 0.032196044921875|unsuper_loss: 0.0
average reward score: 4.578125
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.90%) |Training time=0.77s (22.54%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.97
epoch: 0|step: 892|ppo_ep: 1|act_loss: 0.09515380859375|cri_loss: 0.0287628173828125|unsuper_loss: 0.0
average reward score: 4.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.75%) |Training time=0.76s (22.46%) |Others=0.20 (5.79%)|CurSamplesPerSec=9.40 |AvgSamplesPerSec=8.97
epoch: 0|step: 893|ppo_ep: 1|act_loss: -0.033233642578125|cri_loss: 0.045684814453125|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.49s (72.31%) |Training time=0.76s (22.17%) |Others=0.19 (5.53%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.97
epoch: 0|step: 894|ppo_ep: 1|act_loss: 0.033233642578125|cri_loss: 0.03314208984375|unsuper_loss: 0.0
average reward score: 4.765625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.57s (72.84%) |Training time=0.77s (21.77%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.97
epoch: 0|step: 895|ppo_ep: 1|act_loss: 0.0518798828125|cri_loss: 0.043670654296875|unsuper_loss: 0.0
average reward score: 4.70703125
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.83%) |Training time=0.76s (22.60%) |Others=0.19 (5.57%)|CurSamplesPerSec=9.47 |AvgSamplesPerSec=8.97
epoch: 0|step: 896|ppo_ep: 1|act_loss: -0.1461181640625|cri_loss: 0.0750732421875|unsuper_loss: 0.0
average reward score: 4.5546875
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.93%) |Training time=0.76s (22.51%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.97
epoch: 0|step: 897|ppo_ep: 1|act_loss: -0.07208251953125|cri_loss: 0.042755126953125|unsuper_loss: 0.0
average reward score: 4.453125
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.18%) |Training time=0.76s (22.30%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.35 |AvgSamplesPerSec=8.97
epoch: 0|step: 898|ppo_ep: 1|act_loss: -0.0007758140563964844|cri_loss: 0.0268707275390625|unsuper_loss: 0.0
average reward score: 3.84765625
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.99%) |Training time=0.76s (22.45%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.45 |AvgSamplesPerSec=8.97
[2023-06-30 06:25:00,397] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=16, lr=[1.590912278818792e-07, 1.590912278818792e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:25:00,430] [INFO] [timer.py:215:stop] epoch=0/micro_step=900/global_step=900, RunningAvgSamplesPerSec=47.71129940469062, CurrSamplesPerSec=74.16044720247051, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:25:00,588] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=14, lr=[7.781189471550543e-08, 7.781189471550543e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 899|ppo_ep: 1|act_loss: -0.014129638671875|cri_loss: 0.02325439453125|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.87%) |Training time=0.76s (22.57%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.47 |AvgSamplesPerSec=8.97
epoch: 0|step: 900|ppo_ep: 1|act_loss: -0.0286102294921875|cri_loss: 0.050537109375|unsuper_loss: 0.0
average reward score: 4.34765625
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.48s (72.29%) |Training time=0.76s (22.20%) |Others=0.19 (5.51%)|CurSamplesPerSec=9.33 |AvgSamplesPerSec=8.97
epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.004375457763671875|cri_loss: 0.02362060546875|unsuper_loss: 0.0
average reward score: 3.9921875
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.90%) |Training time=0.76s (22.54%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.97
epoch: 0|step: 902|ppo_ep: 1|act_loss: 0.08721923828125|cri_loss: 0.057220458984375|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.20%) |Training time=0.76s (22.28%) |Others=0.19 (5.52%)|CurSamplesPerSec=9.35 |AvgSamplesPerSec=8.97
epoch: 0|step: 903|ppo_ep: 1|act_loss: -0.044158935546875|cri_loss: 0.0443115234375|unsuper_loss: 0.0
average reward score: 4.125
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.49s (72.14%) |Training time=0.77s (22.22%) |Others=0.19 (5.64%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.97
epoch: 0|step: 904|ppo_ep: 1|act_loss: -0.11456298828125|cri_loss: 0.069091796875|unsuper_loss: 0.0
average reward score: 4.66796875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.49s (72.09%) |Training time=0.77s (22.41%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.97
epoch: 0|step: 905|ppo_ep: 1|act_loss: -0.10723876953125|cri_loss: 0.0638427734375|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.41s (71.46%) |Training time=0.78s (22.96%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.47 |AvgSamplesPerSec=8.97
epoch: 0|step: 906|ppo_ep: 1|act_loss: 0.04241943359375|cri_loss: 0.06060791015625|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.76%) |Training time=0.77s (22.69%) |Others=0.19 (5.56%)|CurSamplesPerSec=9.40 |AvgSamplesPerSec=8.97
epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.051910400390625|cri_loss: 0.0165557861328125|unsuper_loss: 0.0
average reward score: 3.9921875
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.42s (71.62%) |Training time=0.77s (22.80%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.46 |AvgSamplesPerSec=8.97
epoch: 0|step: 908|ppo_ep: 1|act_loss: -0.04803466796875|cri_loss: 0.02557373046875|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.42s (71.56%) |Training time=0.77s (22.87%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.48 |AvgSamplesPerSec=8.97
[2023-06-30 06:25:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=16, lr=[1.1705499727233991e-07, 1.1705499727233991e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:25:34,504] [INFO] [timer.py:215:stop] epoch=0/micro_step=910/global_step=910, RunningAvgSamplesPerSec=47.89418805007901, CurrSamplesPerSec=73.01894433557729, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:25:34,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=14, lr=[5.6689393645807666e-08, 5.6689393645807666e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 909|ppo_ep: 1|act_loss: 0.09820556640625|cri_loss: 0.0364990234375|unsuper_loss: 0.0
average reward score: 4.60546875
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.71%) |Training time=0.77s (22.74%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.46 |AvgSamplesPerSec=8.97
epoch: 0|step: 910|ppo_ep: 1|act_loss: -0.0175933837890625|cri_loss: 0.052215576171875|unsuper_loss: 0.0
average reward score: 4.59765625
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.42s (71.75%) |Training time=0.76s (22.61%) |Others=0.19 (5.64%)|CurSamplesPerSec=9.49 |AvgSamplesPerSec=8.98
epoch: 0|step: 911|ppo_ep: 1|act_loss: -0.0084991455078125|cri_loss: 0.031707763671875|unsuper_loss: 0.0
average reward score: 4.56640625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.64%) |Training time=0.78s (22.81%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.41 |AvgSamplesPerSec=8.98
epoch: 0|step: 912|ppo_ep: 1|act_loss: 0.051025390625|cri_loss: 0.027679443359375|unsuper_loss: 0.0
average reward score: 4.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.53%) |Training time=0.78s (22.93%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.98
epoch: 0|step: 913|ppo_ep: 1|act_loss: -0.02215576171875|cri_loss: 0.037200927734375|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.60s (72.68%) |Training time=0.79s (22.03%) |Others=0.19 (5.28%)|CurSamplesPerSec=8.94 |AvgSamplesPerSec=8.98
epoch: 0|step: 914|ppo_ep: 1|act_loss: 0.05047607421875|cri_loss: 0.045623779296875|unsuper_loss: 0.0
average reward score: 3.904296875
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.76%) |Training time=0.77s (22.60%) |Others=0.19 (5.64%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.98
epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.0152435302734375|cri_loss: 0.040069580078125|unsuper_loss: 0.0
average reward score: 4.640625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.45s (71.96%) |Training time=0.77s (22.51%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.41 |AvgSamplesPerSec=8.98
epoch: 0|step: 916|ppo_ep: 1|act_loss: 0.007228851318359375|cri_loss: 0.041900634765625|unsuper_loss: 0.0
average reward score: 4.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.41s (71.41%) |Training time=0.78s (22.97%) |Others=0.19 (5.62%)|CurSamplesPerSec=9.47 |AvgSamplesPerSec=8.98
epoch: 0|step: 917|ppo_ep: 1|act_loss: 0.0219573974609375|cri_loss: 0.030670166015625|unsuper_loss: 0.0
average reward score: 4.70703125
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.41s (71.60%) |Training time=0.77s (22.81%) |Others=0.19 (5.60%)|CurSamplesPerSec=9.49 |AvgSamplesPerSec=8.98
epoch: 0|step: 918|ppo_ep: 1|act_loss: 0.042633056640625|cri_loss: 0.06256103515625|unsuper_loss: 0.0
average reward score: 4.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.42s (71.47%) |Training time=0.78s (22.94%) |Others=0.19 (5.59%)|CurSamplesPerSec=9.47 |AvgSamplesPerSec=8.98
[2023-06-30 06:26:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=16, lr=[8.13891623382061e-08, 8.13891623382061e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:26:08,573] [INFO] [timer.py:215:stop] epoch=0/micro_step=920/global_step=920, RunningAvgSamplesPerSec=48.07057191009628, CurrSamplesPerSec=71.97394265367517, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:26:08,730] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=14, lr=[3.887297523242184e-08, 3.887297523242184e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 919|ppo_ep: 1|act_loss: -0.0163421630859375|cri_loss: 0.029571533203125|unsuper_loss: 0.0
average reward score: 4.1328125
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.43s (71.58%) |Training time=0.77s (22.87%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.44 |AvgSamplesPerSec=8.98
epoch: 0|step: 920|ppo_ep: 1|act_loss: -0.0034351348876953125|cri_loss: 0.05181884765625|unsuper_loss: 0.0
average reward score: 4.56640625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.74%) |Training time=0.77s (22.71%) |Others=0.19 (5.55%)|CurSamplesPerSec=9.43 |AvgSamplesPerSec=8.98
epoch: 0|step: 921|ppo_ep: 1|act_loss: 0.0161285400390625|cri_loss: 0.03857421875|unsuper_loss: 0.0
average reward score: 4.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.78%) |Training time=0.77s (22.68%) |Others=0.19 (5.54%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.98
epoch: 0|step: 922|ppo_ep: 1|act_loss: -0.01849365234375|cri_loss: 0.040924072265625|unsuper_loss: 0.0
average reward score: 4.36328125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.55s (72.78%) |Training time=0.76s (21.80%) |Others=0.19 (5.42%)|CurSamplesPerSec=9.13 |AvgSamplesPerSec=8.98
epoch: 0|step: 923|ppo_ep: 1|act_loss: 0.043731689453125|cri_loss: 0.021453857421875|unsuper_loss: 0.0
average reward score: 3.912109375
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.79%) |Training time=0.77s (22.54%) |Others=0.19 (5.67%)|CurSamplesPerSec=9.42 |AvgSamplesPerSec=8.98
epoch: 0|step: 924|ppo_ep: 1|act_loss: 0.044525146484375|cri_loss: 0.025115966796875|unsuper_loss: 0.0
average reward score: 4.265625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.05%) |Training time=0.92s (25.68%) |Others=0.19 (5.27%)|CurSamplesPerSec=8.89 |AvgSamplesPerSec=8.98
epoch: 0|step: 925|ppo_ep: 1|act_loss: 0.027252197265625|cri_loss: 0.0177154541015625|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.79%) |Training time=0.87s (24.81%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.16 |AvgSamplesPerSec=8.98
epoch: 0|step: 926|ppo_ep: 1|act_loss: 0.078125|cri_loss: 0.05279541015625|unsuper_loss: 0.0
average reward score: 4.6640625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.75%) |Training time=0.87s (24.86%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.98
epoch: 0|step: 927|ppo_ep: 1|act_loss: -0.0205841064453125|cri_loss: 0.044158935546875|unsuper_loss: 0.0
average reward score: 5.0625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.71%) |Training time=0.87s (24.88%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.98
[2023-06-30 06:26:39,807] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 928|ppo_ep: 1|act_loss: -0.0001552104949951172|cri_loss: 0.054534912109375|unsuper_loss: 0.0
average reward score: 4.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.45s (70.92%) |Training time=0.82s (23.62%) |Others=0.19 (5.46%)|CurSamplesPerSec=9.27 |AvgSamplesPerSec=8.98
[2023-06-30 06:26:43,289] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=17, lr=[5.4776665295035125e-08, 5.4776665295035125e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:26:43,322] [INFO] [timer.py:215:stop] epoch=0/micro_step=930/global_step=930, RunningAvgSamplesPerSec=48.20201941950485, CurrSamplesPerSec=59.93157807311937, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:26:43,480] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=14, lr=[2.4386747156034395e-08, 2.4386747156034395e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.038421630859375|cri_loss: 0.052032470703125|unsuper_loss: 0.0
average reward score: 4.31640625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.46s (70.04%) |Training time=0.86s (24.58%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.98
epoch: 0|step: 930|ppo_ep: 1|act_loss: 0.03533935546875|cri_loss: 0.0172119140625|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.67%) |Training time=0.88s (24.99%) |Others=0.19 (5.34%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.98
epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.09173583984375|cri_loss: 0.05902099609375|unsuper_loss: 0.0
average reward score: 4.4375
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.50s (69.94%) |Training time=0.88s (24.76%) |Others=0.19 (5.30%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.98
epoch: 0|step: 932|ppo_ep: 1|act_loss: -0.0110321044921875|cri_loss: 0.0272369384765625|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.60%) |Training time=0.88s (25.01%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.10 |AvgSamplesPerSec=8.98
epoch: 0|step: 933|ppo_ep: 1|act_loss: 0.0012340545654296875|cri_loss: 0.043304443359375|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.48s (69.64%) |Training time=0.89s (25.03%) |Others=0.19 (5.33%)|CurSamplesPerSec=8.97 |AvgSamplesPerSec=8.98
epoch: 0|step: 934|ppo_ep: 1|act_loss: 0.09326171875|cri_loss: 0.07501220703125|unsuper_loss: 0.0
average reward score: 3.720703125
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.44s (70.13%) |Training time=0.85s (24.48%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.18 |AvgSamplesPerSec=8.98
epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.007343292236328125|cri_loss: 0.049346923828125|unsuper_loss: 0.0
average reward score: 4.45703125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.67%) |Training time=0.87s (24.94%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.98
epoch: 0|step: 936|ppo_ep: 1|act_loss: 0.01200103759765625|cri_loss: 0.031280517578125|unsuper_loss: 0.0
average reward score: 4.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.74%) |Training time=0.87s (24.87%) |Others=0.19 (5.39%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.98
epoch: 0|step: 937|ppo_ep: 1|act_loss: -0.0076751708984375|cri_loss: 0.048583984375|unsuper_loss: 0.0
average reward score: 4.765625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.51%) |Training time=0.88s (25.11%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.98
epoch: 0|step: 938|ppo_ep: 1|act_loss: -0.036529541015625|cri_loss: 0.0242462158203125|unsuper_loss: 0.0
average reward score: 4.21875
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.93%) |Training time=0.87s (24.67%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.05 |AvgSamplesPerSec=8.98
[2023-06-30 06:27:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=17, lr=[3.1340361330398695e-08, 3.1340361330398695e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:27:18,531] [INFO] [timer.py:215:stop] epoch=0/micro_step=940/global_step=940, RunningAvgSamplesPerSec=48.29489044026186, CurrSamplesPerSec=59.56823967590636, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:27:18,689] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=14, lr=[1.3250310963527358e-08, 1.3250310963527358e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 939|ppo_ep: 1|act_loss: 0.0036716461181640625|cri_loss: 0.07196044921875|unsuper_loss: 0.0
average reward score: 4.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.47s (69.99%) |Training time=0.87s (24.65%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.98
epoch: 0|step: 940|ppo_ep: 1|act_loss: 0.0261077880859375|cri_loss: 0.044921875|unsuper_loss: 0.0
average reward score: 4.66015625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.73%) |Training time=0.89s (24.90%) |Others=0.19 (5.36%)|CurSamplesPerSec=8.98 |AvgSamplesPerSec=8.98
epoch: 0|step: 941|ppo_ep: 1|act_loss: 0.039947509765625|cri_loss: 0.0256805419921875|unsuper_loss: 0.0
average reward score: 4.02734375
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.73%) |Training time=0.88s (24.90%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.09 |AvgSamplesPerSec=8.98
epoch: 0|step: 942|ppo_ep: 1|act_loss: -0.0247344970703125|cri_loss: 0.036834716796875|unsuper_loss: 0.0
average reward score: 4.375
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.67%) |Training time=0.88s (24.97%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.08 |AvgSamplesPerSec=8.98
epoch: 0|step: 943|ppo_ep: 1|act_loss: -0.07061767578125|cri_loss: 0.07623291015625|unsuper_loss: 0.0
average reward score: 4.109375
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.49s (70.17%) |Training time=0.87s (24.45%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.01 |AvgSamplesPerSec=8.98
epoch: 0|step: 944|ppo_ep: 1|act_loss: -0.0146331787109375|cri_loss: 0.0400390625|unsuper_loss: 0.0
average reward score: 4.31640625
-------------------------------------------------------------------------------------
|E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.67%) |Training time=0.87s (24.95%) |Others=0.19 (5.38%)|CurSamplesPerSec=9.17 |AvgSamplesPerSec=8.98
epoch: 0|step: 945|ppo_ep: 1|act_loss: 0.0227203369140625|cri_loss: 0.057342529296875|unsuper_loss: 0.0
average reward score: 4.7265625
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.68%) |Training time=0.88s (24.95%) |Others=0.19 (5.37%)|CurSamplesPerSec=9.12 |AvgSamplesPerSec=8.98
epoch: 0|step: 946|ppo_ep: 1|act_loss: -0.054107666015625|cri_loss: 0.06451416015625|unsuper_loss: 0.0
average reward score: 4.64453125
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.72%) |Training time=0.87s (24.89%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.14 |AvgSamplesPerSec=8.98
epoch: 0|step: 947|ppo_ep: 1|act_loss: -0.034393310546875|cri_loss: 0.05120849609375|unsuper_loss: 0.0
average reward score: 4.47265625
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.43s (69.50%) |Training time=0.88s (25.10%) |Others=0.19 (5.40%)|CurSamplesPerSec=9.15 |AvgSamplesPerSec=8.98
epoch: 0|step: 948|ppo_ep: 1|act_loss: -0.00797271728515625|cri_loss: 0.0257415771484375|unsuper_loss: 0.0
average reward score: 4.046875
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.45s (69.80%) |Training time=0.87s (24.84%) |Others=0.19 (5.36%)|CurSamplesPerSec=9.11 |AvgSamplesPerSec=8.98
[2023-06-30 06:27:53,744] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=17, lr=[1.4390434885858659e-08, 1.4390434885858659e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-30 06:27:53,777] [INFO] [timer.py:215:stop] epoch=0/micro_step=950/global_step=950, RunningAvgSamplesPerSec=48.38602011967408, CurrSamplesPerSec=59.395944261923646, MemAllocated=12.1GB, MaxMemAllocated=21.88GB
[2023-06-30 06:27:53,937] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=14, lr=[5.478735544813263e-09, 5.478735544813263e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 949|ppo_ep: 1|act_loss: -0.037078857421875|cri_loss: 0.019500732421875|unsuper_loss: 0.0
average reward score: 4.71484375
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.51s (70.31%) |Training time=0.87s (24.35%) |Others=0.19 (5.34%)|CurSamplesPerSec=8.96 |AvgSamplesPerSec=8.98
epoch: 0|step: 950|ppo_ep: 1|act_loss: 0.028961181640625|cri_loss: 0.0264129638671875|unsuper_loss: 0.0
average reward score: 3.994140625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.38s (70.46%) |Training time=0.81s (23.94%) |Others=0.19 (5.60%)|CurSamplesPerSec=9.48 |AvgSamplesPerSec=8.98
epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.01248931884765625|cri_loss: 0.0277099609375|unsuper_loss: 0.0
average reward score: 4.53515625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.39s (70.70%) |Training time=0.80s (23.72%) |Others=0.19 (5.58%)|CurSamplesPerSec=9.48 |AvgSamplesPerSec=8.98
epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.061248779296875|cri_loss: 0.050933837890625|unsuper_loss: 0.0
average reward score: 4.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.38s (68.91%) |Training time=0.88s (25.59%) |Others=0.19 (5.50%)|CurSamplesPerSec=9.28 |AvgSamplesPerSec=8.99
[2023-06-30 06:28:07,307] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 953|ppo_ep: 1|act_loss: -0.211181640625|cri_loss: 0.11517333984375|unsuper_loss: 0.0
average reward score: 4.26953125
-------------------------------------------------------------------------------------
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
[2023-06-30 06:28:10,144] [INFO] [launch.py:347:main] Process 1346894 exits successfully.
[2023-06-30 06:28:11,146] [INFO] [launch.py:347:main] Process 1346893 exits successfully.
[2023-06-30 06:28:11,146] [INFO] [launch.py:347:main] Process 1346889 exits successfully.
[2023-06-30 06:28:11,146] [INFO] [launch.py:347:main] Process 1346891 exits successfully.
[2023-06-30 06:28:11,146] [INFO] [launch.py:347:main] Process 1346890 exits successfully.
[2023-06-30 06:28:11,146] [INFO] [launch.py:347:main] Process 1346888 exits successfully.
[2023-06-30 06:28:12,147] [INFO] [launch.py:347:main] Process 1346892 exits successfully.
[2023-06-30 06:28:19,155] [INFO] [launch.py:347:main] Process 1346887 exits successfully.