Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:29,655] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7: setting --include=localhost:0,1,2,3,4,5,6,7
[2023-07-01 08:03:29,723] [INFO] [runner.py:555:main] cmd = /home/zhaiyuanzhao/anaconda3/envs/RLHF/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path /home/zhaiyuanzhao/llm/dataset/rm-static/data --data_split 2,4,4 --actor_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples-4datasets/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/output-1.3b --critic_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/output-RM --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 1 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output-1.3b-RM_350m
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:32,114] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-01 08:03:32,114] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-01 08:03:32,114] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-01 08:03:32,114] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-01 08:03:32,114] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
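The `--world_info` argument passed to `deepspeed.launcher.launch` in the `cmd =` line above is base64-encoded JSON. Decoding it with the standard-library base64 decoder reproduces the `WORLD INFO DICT` that the launcher logs. Here is a minimal check that assumes only the Python standard library:

```python
import base64
import json

# Value of --world_info copied verbatim from the launcher command line in this log.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119"


def decode_world_info(encoded: str) -> dict:
    """Decode the base64-encoded JSON mapping of host -> GPU slots."""
    return json.loads(base64.urlsafe_b64decode(encoded))


world = decode_world_info(WORLD_INFO)
print(world)  # {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
print(sum(len(slots) for slots in world.values()))  # 8, matching dist_world_size
```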
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:58,950] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:58,950] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,205] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,205] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,236] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,236] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,402] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,402] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,452] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,452] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,498] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,498] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,515] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,515] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,515] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:59,515] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:59,515] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Found cached dataset parquet (/home/zhaiyuanzhao/.cache/huggingface/datasets/parquet/default-d09980a08a1dbd7c/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|██████████| 2/2 [00:00<00:00,  3.11it/s]
************************[start] Initializing Actor Model [start] *************************
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
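The warning names its own fix. The sketch below takes the environment-variable route. It assumes the lines go at the very top of the training script (here `main.py`), before any tokenizer is used and before any worker process is forked. Either value silences the warning. Choosing `"false"` is an assumption: it keeps the behaviour the library already falls back to after the fork.

```python
import os

# Must run before `tokenizers` is first used and before any fork
# (e.g. DataLoader workers); otherwise the warning has already been emitted.
# setdefault keeps any value the user already exported in the shell.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```

Exporting the same variable in the shell before running `deepspeed` should have the same effect, since the launcher copies its environment into the worker processes.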
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Detected CUDA files, patching ldflags
Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
ninja: no work to do.
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 0.8546912670135498 seconds
Time to load fused_adam op: 0.8312933444976807 seconds
Time to load fused_adam op: 0.8547146320343018 seconds
Time to load fused_adam op: 0.8446323871612549 seconds
Time to load fused_adam op: 0.8547112941741943 seconds
Time to load fused_adam op: 0.8448314666748047 seconds
Time to load fused_adam op: 0.8456475734710693 seconds
Time to load fused_adam op: 0.8537054061889648 seconds
[2023-07-01 08:05:17,061] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
[2023-07-01 08:05:28,601] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:05:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-07-01 08:05:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
[2023-07-01 08:05:28,624] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-07-01 08:05:28,624] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-07-01 08:05:28,624] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-07-01 08:05:28,624] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.5373373031616211 seconds
Time to load utils op: 0.5420136451721191 seconds
Time to load utils op: 0.542755126953125 seconds
Time to load utils op: 0.5428259372711182 seconds
Time to load utils op: 0.535088300704956 seconds
Time to load utils op: 0.5429253578186035 seconds
Time to load utils op: 0.5427916049957275 seconds
Time to load utils op: 0.5428504943847656 seconds
Rank: 1 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 7 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 2 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 5 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 0 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 3 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 6 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 4 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
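Under ZeRO stage 2, each rank owns an equal slice of every optimizer parameter group. The per-rank sizes printed above can therefore be multiplied back out to the model size. The two entries correspond to two parameter groups, which the log does not name. Up to any alignment padding DeepSpeed may add, the product comes to about 1.32 billion parameters, consistent with the 1.3b actor named on the command line:

```python
# Per-rank fp32 partition sizes for the two parameter groups, copied from the log.
PARTITION_SIZES = [164_401_920, 67_840]
WORLD_SIZE = 8  # "partition count [8, 8]" and dist_world_size=8

per_rank = sum(PARTITION_SIZES)
total = per_rank * WORLD_SIZE
print(f"{per_rank:,} params per rank, {total:,} in total")
# 164,469,760 params per rank, 1,315,758,080 in total
```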
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0010123252868652344 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load utils op: 0.001024007797241211 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.000904083251953125 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009882450103759766 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008978843688964844 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0014503002166748047 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0020041465759277344 seconds
[2023-07-01 08:05:40,006] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-01 08:05:40,008] [INFO] [utils.py:786:see_memory_usage] MA 3.06 GB         Max_MA 3.06 GB         CA 3.07 GB         Max_CA 3 GB 
[2023-07-01 08:05:40,008] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 36.9 GB, percent = 3.7%
[2023-07-01 08:05:40,152] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-01 08:05:40,153] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.91 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-07-01 08:05:40,153] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 36.9 GB, percent = 3.7%
[2023-07-01 08:05:40,153] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-07-01 08:05:40,294] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-01 08:05:40,294] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.29 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-07-01 08:05:40,295] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 36.9 GB, percent = 3.7%
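The `MA` (memory allocated) figures can be reproduced from the partition sizes, under two assumptions. First, `see_memory_usage` reports GiB (bytes / 1024³). Second, the run uses the usual ZeRO-2 mixed-precision layout. In that layout, before optimizer states exist, every rank holds the full fp16 weights plus its own fp32 master slice. Adam then adds two fp32 moment tensors for that slice. This is a back-of-the-envelope check, not DeepSpeed's own accounting:

```python
GIB = 1024 ** 3                   # assumption: MA is reported in GiB
WORLD_SIZE = 8
PER_RANK = 164_401_920 + 67_840   # fp32 partition owned by each rank (from the log)
TOTAL = PER_RANK * WORLD_SIZE     # full parameter count

fp16_weights = TOTAL * 2 / GIB         # full fp16 copy on every rank (ZeRO-2 does not shard params)
fp32_master = PER_RANK * 4 / GIB       # this rank's fp32 master slice
adam_moments = PER_RANK * 2 * 4 / GIB  # exp_avg + exp_avg_sq, both fp32

before = fp16_weights + fp32_master
after = before + adam_moments
print(f"before ~ {before:.2f} GiB, after ~ {after:.2f} GiB")
# before ~ 3.06 GiB, after ~ 4.29 GiB  (log: MA 3.06 GB -> 4.29 GB)
```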
[2023-07-01 08:05:40,296] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x2b16b6509760>
[2023-07-01 08:05:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
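The value `lr=[0.0, 0.0]` at step 0 (one entry per parameter group) is what a linear-warmup-then-cosine schedule produces. The sketch below reproduces that multiplier shape. It assumes the client `LambdaLR` behaves like Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle. `num_warmup_steps=100` and the actor peak LR `9.65e-6` come from the command line. `TOTAL_STEPS` is a hypothetical placeholder, because the log does not report the real value:

```python
import math

PEAK_LR = 9.65e-6    # --actor_learning_rate
WARMUP = 100         # --num_warmup_steps
TOTAL_STEPS = 1000   # hypothetical; the real value depends on the dataset size


def lr_multiplier(step: int, warmup: int = WARMUP, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0.0 to 1.0, then half-cosine decay back to 0.0."""
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, WARMUP, TOTAL_STEPS):
    print(step, PEAK_LR * lr_multiplier(step))
```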
[2023-07-01 08:05:40,297] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:05:40,297] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b16c07b3af0>
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:05:40,298] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:05:40,299] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:05:40,300] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-07-01 08:05:40,300] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": true, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
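Two values in the user config follow from the command-line flags:

- `train_batch_size` is DeepSpeed's required product of micro-batch size, gradient-accumulation steps and data-parallel world size.
- `hybrid_engine.max_out_tokens` equals prompt length plus answer length. This relation is inferred from the numbers in this log, not read from the DeepSpeed-Chat source.

A quick consistency check:

```python
# Flags taken from the launcher command line in this log.
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
world_size = 8
max_prompt_seq_len = 256
max_answer_seq_len = 256

# DeepSpeed checks train_batch_size == micro_batch * grad_accum * world_size.
train_batch_size = per_device_train_batch_size * gradient_accumulation_steps * world_size
# The generation buffer has to hold the prompt plus the generated answer.
max_out_tokens = max_prompt_seq_len + max_answer_seq_len

print(train_batch_size, max_out_tokens)  # 32 512
```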
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0010378360748291016 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5358097553253174 seconds
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5629968643188477 seconds
Time to load transformer_inference op: 0.5577342510223389 seconds
Time to load transformer_inference op: 0.5631918907165527 seconds
Time to load transformer_inference op: 0.5636200904846191 seconds
Time to load transformer_inference op: 0.5582687854766846 seconds
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5590822696685791 seconds
[2023-07-01 08:05:41,008] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.ReLU: 2>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True}
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5609304904937744 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05214333534240723 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Time to load transformer_inference op: 0.051145315170288086 seconds
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.04891061782836914 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.052359580993652344 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05263805389404297 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.054830074310302734 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05211639404296875 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.056571245193481445 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05420184135437012 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05135607719421387 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05627012252807617 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05015301704406738 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.051249027252197266 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0525355339050293 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05635857582092285 seconds
******************[end] Initialized Actor Model [end] (duration: 49.96s)******************
*************************[start] Initializing Ref Model [start] **************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05696511268615723 seconds
model loaded
[2023-07-01 08:05:58,473] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:06:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:06:08,840] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b16c0ff48b0>
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:06:08,841] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:06:08,842] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:06:08,843] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:06:08,844] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001287221908569336 seconds
*******************[end] Initialized Ref Model [end] (duration: 27.48s)*******************
*************************[start] Initializing EMA Model [start] **************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0016167163848876953 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013256072998046875 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0018966197967529297 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0015711784362792969 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0012118816375732422 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001928091049194336 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0018939971923828125 seconds
model loaded
[2023-07-01 08:06:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:06:35,566] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:06:35,581] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:06:35,582] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b16b67ac130>
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:06:35,583] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:06:35,584] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:06:35,585] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:06:35,586] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:06:35,586] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.01014089584350586 seconds
*******************[end] Initialized EMA Model [end] (duration: 26.75s)*******************
************************[start] Initializing Critic Model [start] ************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0024161338806152344 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001081705093383789 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011301040649414062 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009920597076416016 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001585245132446289 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011470317840576172 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0016450881958007812 seconds
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.015161752700805664 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Loading extension module fused_adam...
Time to load fused_adam op: 0.010671377182006836 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.003968000411987305 seconds
[2023-07-01 08:06:52,465] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.023074865341186523 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.005225181579589844 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.023493051528930664 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0221707820892334 seconds
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.07556438446044922 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.030516624450683594 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.005091190338134766 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.004791975021362305 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013506412506103516 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
Time to load utils op: 0.002511739730834961 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009350776672363281 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001844644546508789 seconds
[2023-07-01 08:07:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:07:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-07-01 08:07:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-07-01 08:07:02,001] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-07-01 08:07:02,001] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-07-01 08:07:02,001] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-07-01 08:07:02,002] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002156496047973633 seconds
Rank: 0 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 1 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 3 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 2 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 7 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 4 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 5 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 6 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Time to load utils op: 0.0013456344604492188 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.00107574462890625 seconds
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0005922317504882812 seconds
Time to load utils op: 0.0008847713470458984 seconds
Time to load utils op: 0.0013260841369628906 seconds
Time to load utils op: 0.0009319782257080078 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009717941284179688 seconds
[2023-07-01 08:07:11,072] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-01 08:07:11,072] [INFO] [utils.py:786:see_memory_usage] MA 10.58 GB         Max_MA 10.58 GB         CA 10.97 GB         Max_CA 11 GB 
[2023-07-01 08:07:11,073] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 59.75 GB, percent = 5.9%
[2023-07-01 08:07:11,396] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-01 08:07:11,398] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 11.05 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-07-01 08:07:11,398] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 60.85 GB, percent = 6.0%
[2023-07-01 08:07:11,399] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-07-01 08:07:11,651] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-01 08:07:11,652] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 10.89 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-07-01 08:07:11,652] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 61.73 GB, percent = 6.1%
[2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x2b16c07b3a90>
[2023-07-01 08:07:11,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:07:11,655] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:07:11,655] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b16c35c8460>
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:07:11,656] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:07:11,657] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-07-01 08:07:11,658] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": false, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0016736984252929688 seconds
*****************[end] Initialized Critic Model [end] (duration: 36.06s)******************
************************[start] Initializing Reward Model [start] ************************
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:07:25,026] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.01121068000793457 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001344919204711914 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0012459754943847656 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0020864009857177734 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013394355773925781 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0023109912872314453 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002219676971435547 seconds
[2023-07-01 08:07:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:07:33,043] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:07:33,043] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b16c366bfa0>
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:07:33,044] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:07:33,045] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:07:33,046] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:07:33,046] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013327598571777344 seconds
*****************[end] Initialized Reward Model [end] (duration: 21.39s)******************
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 954
------------------------------------------------------
Free memory : 65.963745 (GigaBytes)  
Total memory: 79.096497 (GigaBytes)  
Requested memory: 1.031250 (GigaBytes) 
Setting maximum total tokens (input + output) to 512 
WorkSpace: 0x2b1c0c000000 
------------------------------------------------------
[2023-07-01 08:07:36,761] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-07-01 08:07:36,919] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.00479888916015625|cri_loss: 0.201416015625|unsuper_loss: 0.0
average reward score: -1.482421875
-------------------------------------------------------------------------------------
|E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.86s (74.28%) |Training time=0.81s (21.07%) |Others=0.18 (4.65%)|CurSamplesPerSec=8.31 |AvgSamplesPerSec=8.31
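The throughput fields in these step summaries are consistent with a global batch of 32 samples per generation step (4 prompts per GPU × 8 GPUs). CurSamplesPerSec is about 32 / E2E latency, and AvgSamplesPerSec is about total samples / total latency so far. This relationship is inferred from the logged numbers, not taken from the DeepSpeed-Chat source. Because latencies are printed rounded to 0.01 s, recomputed values can differ from the log in the second decimal (for example 13.97 versus the logged 14.00 at step 1). The following is a sketch under that assumption:

```python
def samples_per_sec(latencies_s, global_batch=32):
    """Return (current, running-average) throughput after each step.

    Assumes every generation step processes `global_batch` samples
    (4 prompts per GPU x 8 GPUs in this run).
    """
    current, average = [], []
    total_time = 0.0
    for i, latency in enumerate(latencies_s, start=1):
        total_time += latency
        current.append(global_batch / latency)
        average.append(global_batch * i / total_time)
    return current, average


# E2E latencies of steps 0-2 as printed in this log (rounded to 0.01 s).
cur, avg = samples_per_sec([3.85, 2.29, 2.27])
print([round(x, 2) for x in cur])  # compare with logged CurSamplesPerSec 8.31, 14.00, 14.07
print([round(x, 2) for x in avg])  # compare with logged AvgSamplesPerSec 8.31, 10.43, 11.41
```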
[2023-07-01 08:07:39,051] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
[2023-07-01 08:07:39,207] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 1|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: 1.76953125|unsuper_loss: 0.0
average reward score: -3.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.29s |Gather latency=0.00s (0.00%) |Generate time=1.51s (66.00%) |Training time=0.60s (26.25%) |Others=0.18 (7.75%)|CurSamplesPerSec=14.00 |AvgSamplesPerSec=10.43
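Each PPO step prints a pipe-delimited metrics line followed by the average reward. The minimal parsing sketch below collects these into dictionaries, for example to plot reward against step. The helper name parse_step_metrics is hypothetical, and the patterns are written against the lines in this log rather than a format documented by DeepSpeed-Chat.

```python
import re

REWARD_RE = re.compile(r"^average reward score: (\S+)$")


def parse_step_metrics(lines):
    """Collect one dict per PPO step from DeepSpeed-Chat step-3 console output."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("epoch: ") and "|step: " in line:
            fields = {}
            for part in line.split("|"):
                key, value = part.split(": ", 1)
                # Integer counters stay ints; losses become floats.
                fields[key] = int(value) if key in ("epoch", "step", "ppo_ep") else float(value)
            records.append(fields)
        else:
            match = REWARD_RE.match(line)
            if match and records:
                # Attach the reward to the step line printed just before it.
                records[-1]["reward"] = float(match.group(1))
    return records


sample = """\
epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.00479888916015625|cri_loss: 0.201416015625|unsuper_loss: 0.0
average reward score: -1.482421875
epoch: 0|step: 1|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: 1.76953125|unsuper_loss: 0.0
average reward score: -3.720703125
""".splitlines()

for record in parse_step_metrics(sample):
    print(record["step"], record["act_loss"], record["reward"])
```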
[2023-07-01 08:07:41,323] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-07-01 08:07:41,480] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 2|ppo_ep: 1|act_loss: -0.11614990234375|cri_loss: 0.6455078125|unsuper_loss: 0.0
average reward score: -1.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.27s |Gather latency=0.00s (0.00%) |Generate time=1.50s (65.88%) |Training time=0.60s (26.37%) |Others=0.18 (7.75%)|CurSamplesPerSec=14.07 |AvgSamplesPerSec=11.41
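The OVERFLOW! lines show fp16 dynamic loss scaling with hysteresis. The first overflow at scale 65536 only spends one unit of hysteresis (2 → 1). Later overflows halve the scale (65536 → 32768 → 16384), and every overflowing step is skipped. In these first steps each event appears twice, presumably once for the actor engine and once for the critic engine, since both log from rank 0.

The class below is a simplified illustration of that rule and is not DeepSpeed's loss_scaler.py. Its growth rule is an assumption: it doubles the scale after loss_scale_window clean steps (100, as in the printed critic config) and restores hysteresis when it grows.

```python
class DynamicLossScaler:
    """Simplified fp16 loss scaler with hysteresis (illustrative sketch)."""

    def __init__(self, init_scale=65536.0, scale_factor=2.0, scale_window=100,
                 hysteresis=2, min_scale=1.0):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.hysteresis = hysteresis        # configured tolerance
        self.cur_hysteresis = hysteresis    # remaining tolerance
        self.min_scale = min_scale
        self.clean_steps = 0                # consecutive steps without overflow

    def update(self, overflow: bool) -> bool:
        """Update the scale after a step; return True if the step should be skipped."""
        if overflow:
            self.clean_steps = 0
            if self.cur_hysteresis > 1:
                # Tolerate this overflow: keep the scale, spend one unit of hysteresis.
                self.cur_hysteresis -= 1
            else:
                self.scale = max(self.scale / self.scale_factor, self.min_scale)
            return True  # gradients contain inf/nan, so the optimizer step is skipped
        self.clean_steps += 1
        if self.clean_steps % self.scale_window == 0:
            # A full window without overflow: try a larger scale again.
            self.scale *= self.scale_factor
            self.cur_hysteresis = self.hysteresis
        return False


scaler = DynamicLossScaler()
for overflow in (True, True, True):  # steps 0-2 of this log overflowed
    skipped = scaler.update(overflow)
    print(skipped, scaler.scale, scaler.cur_hysteresis)
```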
[2023-07-01 08:07:43,945] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 3|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.2236328125|unsuper_loss: 0.0
average reward score: 0.70947265625
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.84%) |Training time=0.79s (32.02%) |Others=0.18 (7.14%)|CurSamplesPerSec=12.99 |AvgSamplesPerSec=11.77
[2023-07-01 08:07:46,067] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 4|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.200439453125|unsuper_loss: 0.0
average reward score: -0.22509765625
-------------------------------------------------------------------------------------
|E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.50%) |Training time=0.60s (26.01%) |Others=0.22 (9.49%)|CurSamplesPerSec=13.78 |AvgSamplesPerSec=12.12
epoch: 0|step: 5|ppo_ep: 1|act_loss: -0.345458984375|cri_loss: 1.0078125|unsuper_loss: 0.0
average reward score: -0.45458984375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.22
[2023-07-01 08:07:51,234] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.09051513671875|cri_loss: 0.2021484375|unsuper_loss: 0.0
average reward score: 0.46240234375
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.84%) |Training time=0.79s (32.02%) |Others=0.18 (7.14%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.33
epoch: 0|step: 7|ppo_ep: 1|act_loss: 0.128662109375|cri_loss: 0.1058349609375|unsuper_loss: 0.0
average reward score: -1.6728515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.13%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.39
[2023-07-01 08:07:55,843] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.116455078125|cri_loss: 0.52734375|unsuper_loss: 0.0
average reward score: -0.121826171875
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.78%) |Training time=0.60s (25.74%) |Others=0.22 (9.48%)|CurSamplesPerSec=13.84 |AvgSamplesPerSec=12.54
[2023-07-01 08:07:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[4.825e-07, 4.825e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:07:58,344] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=57.79174715187453, CurrSamplesPerSec=51.394899251349514, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:07:58,506] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[2.5000000000000004e-07, 2.5000000000000004e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 9|ppo_ep: 1|act_loss: 0.11492919921875|cri_loss: 0.1263427734375|unsuper_loss: 0.0
average reward score: -0.98779296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.79s (31.39%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.56
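The lr values in the step=10/20/30/40 reports are consistent with linear warmup over --num_warmup_steps 100, counted only over non-skipped optimizer steps. For the actor (base 9.65e-6) at step 10 with 5 skipped steps, the lr is 9.65e-6 × (10 − 5) / 100 = 4.825e-7. For the critic (base 5e-6) it is 5e-6 × 5 / 100 = 2.5e-7.

The sketch below reproduces these numbers with the linear-warmup-then-cosine multiplier used by Hugging Face's cosine schedule with warmup. Two points are assumptions. First, the skipped-step accounting is inferred from the logged values. Second, TRAINING_STEPS is a hypothetical total, since it does not affect the warmup phase and is not printed in this excerpt.

```python
import math


def cosine_with_warmup_lr(base_lr, step, num_warmup_steps, num_training_steps):
    """Learning rate after `step` scheduler steps: linear warmup, then cosine decay to 0."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


WARMUP = 100
TRAINING_STEPS = 954  # hypothetical: only matters after warmup

# (global step, skipped steps) from the log_dist reports: first line actor, second critic.
actor = [(10, 5), (20, 6), (30, 6), (40, 7)]
critic = [(10, 5), (20, 5), (30, 5), (40, 6)]

for (a_step, a_skip), (c_step, c_skip) in zip(actor, critic):
    # The scheduler only advances on steps that were not skipped for overflow.
    a_lr = cosine_with_warmup_lr(9.65e-6, a_step - a_skip, WARMUP, TRAINING_STEPS)
    c_lr = cosine_with_warmup_lr(5e-6, c_step - c_skip, WARMUP, TRAINING_STEPS)
    print(f"step {a_step}: actor lr {a_lr:.4e}, critic lr {c_lr:.4e}")
```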
epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.10296630859375|cri_loss: 0.1690673828125|unsuper_loss: 0.0
average reward score: 0.268798828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.31%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.58
epoch: 0|step: 11|ppo_ep: 1|act_loss: -0.005687713623046875|cri_loss: 0.12103271484375|unsuper_loss: 0.0
average reward score: -1.0966796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.60
epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.173583984375|cri_loss: 0.69140625|unsuper_loss: 0.0
average reward score: -2.2421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.37%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.61
epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.25|cri_loss: 0.22216796875|unsuper_loss: 0.0
average reward score: 0.267578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.63
epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.3173828125|cri_loss: 0.396728515625|unsuper_loss: 0.0
average reward score: -1.21875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.38%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.64
epoch: 0|step: 15|ppo_ep: 1|act_loss: -0.07391357421875|cri_loss: 0.1260986328125|unsuper_loss: 0.0
average reward score: 1.1376953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.34%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.65
epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.184814453125|cri_loss: 0.1656494140625|unsuper_loss: 0.0
average reward score: -0.33447265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.37%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.65
epoch: 0|step: 17|ppo_ep: 1|act_loss: -0.11932373046875|cri_loss: 0.05841064453125|unsuper_loss: 0.0
average reward score: 1.076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.66
[2023-07-01 08:08:20,667] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 18|ppo_ep: 1|act_loss: -0.076171875|cri_loss: 0.2410888671875|unsuper_loss: 0.0
average reward score: -0.114990234375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.77%) |Training time=0.59s (25.71%) |Others=0.22 (9.53%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.72
[2023-07-01 08:08:22,994] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[1.3510000000000003e-06, 1.3510000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:23,174] [INFO] [timer.py:215:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=55.12417224553301, CurrSamplesPerSec=51.5406103261145, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:23,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=5, lr=[7.5e-07, 7.5e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 19|ppo_ep: 1|act_loss: -0.1875|cri_loss: 0.1397705078125|unsuper_loss: 0.0
average reward score: 0.066162109375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.26%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.72
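The skipped= counters can be reconciled with the OVERFLOW! lines. In this log, every overflow (including the one absorbed by hysteresis) counts as a skipped step. As a result, the number of OVERFLOW! lines printed before a step=N report equals the actor's plus the critic's skipped count: 10 = 5 + 5 at step 10, and 11 = 6 + 5 at step 20.

The sketch below checks this relationship. The helper name reconcile_skips is hypothetical, and the sample lines are abbreviated from this log.

```python
import re

SKIPPED_RE = re.compile(r"\[Rank 0\] step=(\d+), skipped=(\d+),")


def reconcile_skips(lines):
    """Yield (step, overflow_lines_so_far, sum_of_skipped) once both engines report a step."""
    overflows = 0
    pending = {}  # step -> skipped counts reported so far for that step
    for line in lines:
        if "OVERFLOW!" in line:
            overflows += 1
            continue
        match = SKIPPED_RE.search(line)
        if match:
            step, skipped = int(match.group(1)), int(match.group(2))
            pending.setdefault(step, []).append(skipped)
            if len(pending[step]) == 2:  # actor and critic have both reported
                yield step, overflows, sum(pending.pop(step))


overflow = "[deepspeed] OVERFLOW! Rank 0 Skipping step."
sample = (
    [overflow] * 10
    + ["[Rank 0] step=10, skipped=5, lr=[4.825e-07]", "[Rank 0] step=10, skipped=5, lr=[2.5e-07]"]
    + [overflow]
    + ["[Rank 0] step=20, skipped=6, lr=[1.351e-06]", "[Rank 0] step=20, skipped=5, lr=[7.5e-07]"]
)
for step, n_overflow, n_skipped in reconcile_skips(sample):
    print(step, n_overflow, n_skipped)
```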
epoch: 0|step: 20|ppo_ep: 1|act_loss: -0.11083984375|cri_loss: 0.08123779296875|unsuper_loss: 0.0
average reward score: -0.359619140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.08%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.73
epoch: 0|step: 21|ppo_ep: 1|act_loss: -0.12493896484375|cri_loss: 1.5810546875|unsuper_loss: 0.0
average reward score: -0.47119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.26%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.73
epoch: 0|step: 22|ppo_ep: 1|act_loss: 0.006500244140625|cri_loss: 0.096923828125|unsuper_loss: 0.0
average reward score: 0.015869140625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.27%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.73
epoch: 0|step: 23|ppo_ep: 1|act_loss: -0.264404296875|cri_loss: 0.2071533203125|unsuper_loss: 0.0
average reward score: 0.58056640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.74
epoch: 0|step: 24|ppo_ep: 1|act_loss: -0.0160980224609375|cri_loss: 0.040435791015625|unsuper_loss: 0.0
average reward score: 0.701171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.74
epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.0157470703125|cri_loss: 0.09912109375|unsuper_loss: 0.0
average reward score: 0.88232421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.74
epoch: 0|step: 26|ppo_ep: 1|act_loss: -0.1885986328125|cri_loss: 0.146728515625|unsuper_loss: 0.0
average reward score: 0.328857421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.78s (31.13%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.75
epoch: 0|step: 27|ppo_ep: 1|act_loss: -0.050872802734375|cri_loss: 0.25634765625|unsuper_loss: 0.0
average reward score: 1.8916015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.25%) |Training time=0.77s (30.92%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.75
epoch: 0|step: 28|ppo_ep: 1|act_loss: -0.050018310546875|cri_loss: 0.1875|unsuper_loss: 0.0
average reward score: 0.7919921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.75
[2023-07-01 08:08:47,979] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[2.316e-06, 2.316e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:48,155] [INFO] [timer.py:215:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=53.93328586330883, CurrSamplesPerSec=51.76409642894195, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=5, lr=[1.25e-06, 1.25e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 29|ppo_ep: 1|act_loss: -0.143310546875|cri_loss: 0.155517578125|unsuper_loss: 0.0
average reward score: -0.705078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.23%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.75
[2023-07-01 08:08:50,477] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 30|ppo_ep: 1|act_loss: -0.201171875|cri_loss: 0.245361328125|unsuper_loss: 0.0
average reward score: 1.01171875
-------------------------------------------------------------------------------------
|E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.69%) |Training time=0.60s (25.78%) |Others=0.22 (9.52%)|CurSamplesPerSec=13.78 |AvgSamplesPerSec=12.78
epoch: 0|step: 31|ppo_ep: 1|act_loss: -0.1300048828125|cri_loss: 0.458740234375|unsuper_loss: 0.0
average reward score: 1.4248046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.16%) |Training time=0.78s (31.06%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.78
epoch: 0|step: 32|ppo_ep: 1|act_loss: -0.033416748046875|cri_loss: 0.2181396484375|unsuper_loss: 0.0
average reward score: 2.3203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.12%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.78
epoch: 0|step: 33|ppo_ep: 1|act_loss: 0.0909423828125|cri_loss: 0.2308349609375|unsuper_loss: 0.0
average reward score: 2.40625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.78
epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.16015625|cri_loss: 0.380859375|unsuper_loss: 0.0
average reward score: 0.155517578125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.44%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.78
[2023-07-01 08:09:03,152] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 35|ppo_ep: 1|act_loss: -0.5283203125|cri_loss: 0.5498046875|unsuper_loss: 0.0
average reward score: 0.50830078125
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.50s (61.05%) |Training time=0.78s (31.90%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.02 |AvgSamplesPerSec=12.79
epoch: 0|step: 36|ppo_ep: 1|act_loss: -0.3828125|cri_loss: 0.293701171875|unsuper_loss: 0.0
average reward score: 2.45703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.09698486328125|cri_loss: 1.140625|unsuper_loss: 0.0
average reward score: 1.140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.13%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 38|ppo_ep: 1|act_loss: -0.0311279296875|cri_loss: 0.875|unsuper_loss: 0.0
average reward score: 1.3427734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.77%) |Training time=0.79s (31.45%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:12,781] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=7, lr=[3.1845000000000006e-06, 3.1845000000000006e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:12,956] [INFO] [timer.py:215:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=53.75452823610218, CurrSamplesPerSec=51.27585440192057, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:13,117] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[1.7000000000000002e-06, 1.7000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 39|ppo_ep: 1|act_loss: -0.1375732421875|cri_loss: 0.295654296875|unsuper_loss: 0.0
average reward score: 0.8251953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.47%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
epoch: 0|step: 40|ppo_ep: 1|act_loss: 0.08258056640625|cri_loss: 0.638671875|unsuper_loss: 0.0
average reward score: 1.732421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.79
epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.0699462890625|cri_loss: 0.9794921875|unsuper_loss: 0.0
average reward score: 0.386474609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.79s (31.44%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 42|ppo_ep: 1|act_loss: 0.322509765625|cri_loss: 0.9208984375|unsuper_loss: 0.0
average reward score: 1.583984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 43|ppo_ep: 1|act_loss: -0.00948333740234375|cri_loss: 0.356689453125|unsuper_loss: 0.0
average reward score: 2.111328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.97%) |Training time=0.78s (31.24%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 44|ppo_ep: 1|act_loss: 0.040313720703125|cri_loss: 1.0302734375|unsuper_loss: 0.0
average reward score: 0.6337890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.20%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.79
epoch: 0|step: 45|ppo_ep: 1|act_loss: -0.0197906494140625|cri_loss: 0.369140625|unsuper_loss: 0.0
average reward score: 2.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.18%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.79
epoch: 0|step: 46|ppo_ep: 1|act_loss: 0.023040771484375|cri_loss: 0.261474609375|unsuper_loss: 0.0
average reward score: -0.0382080078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training time=0.78s (31.07%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 47|ppo_ep: 1|act_loss: 0.08148193359375|cri_loss: 0.2861328125|unsuper_loss: 0.0
average reward score: 2.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.24%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.79
epoch: 0|step: 48|ppo_ep: 1|act_loss: -0.533203125|cri_loss: 0.65380859375|unsuper_loss: 0.0
average reward score: 2.21484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.79s (31.36%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:37,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=7, lr=[4.149500000000001e-06, 4.149500000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:37,950] [INFO] [timer.py:215:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=53.31678833276616, CurrSamplesPerSec=51.09898873914861, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[2.2e-06, 2.2e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 49|ppo_ep: 1|act_loss: 0.08367919921875|cri_loss: 0.22802734375|unsuper_loss: 0.0
average reward score: 2.3828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.53%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.79
epoch: 0|step: 50|ppo_ep: 1|act_loss: 0.10125732421875|cri_loss: 0.251708984375|unsuper_loss: 0.0
average reward score: 0.8916015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.40%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 51|ppo_ep: 1|act_loss: -0.1507568359375|cri_loss: 0.33447265625|unsuper_loss: 0.0
average reward score: 0.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.79
epoch: 0|step: 52|ppo_ep: 1|act_loss: 0.10089111328125|cri_loss: 0.299560546875|unsuper_loss: 0.0
average reward score: 2.8203125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.65%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.79
epoch: 0|step: 53|ppo_ep: 1|act_loss: -0.151123046875|cri_loss: 0.250244140625|unsuper_loss: 0.0
average reward score: 2.5703125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.91%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.79
[2023-07-01 08:09:50,278] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1024, reducing to 512
epoch: 0|step: 54|ppo_ep: 1|act_loss: -0.341796875|cri_loss: 0.6640625|unsuper_loss: 0.0
average reward score: 1.484375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.68%) |Training time=0.60s (25.76%) |Others=0.22 (9.56%)|CurSamplesPerSec=13.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 55|ppo_ep: 1|act_loss: 0.1827392578125|cri_loss: 0.251220703125|unsuper_loss: 0.0
average reward score: 1.4091796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 56|ppo_ep: 1|act_loss: 0.2410888671875|cri_loss: 0.34375|unsuper_loss: 0.0
average reward score: 1.634765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.10%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 57|ppo_ep: 1|act_loss: 0.279541015625|cri_loss: 0.3466796875|unsuper_loss: 0.0
average reward score: 1.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.77s (31.06%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 58|ppo_ep: 1|act_loss: -0.273193359375|cri_loss: 0.72021484375|unsuper_loss: 0.0
average reward score: 1.748046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.21%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:10:02,583] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=8, lr=[5.018000000000001e-06, 5.018000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:02,758] [INFO] [timer.py:215:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=53.29622440894831, CurrSamplesPerSec=51.95278262916032, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:02,917] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[2.7000000000000004e-06, 2.7000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 59|ppo_ep: 1|act_loss: 0.0276947021484375|cri_loss: 0.2529296875|unsuper_loss: 0.0
average reward score: 1.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.25%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 60|ppo_ep: 1|act_loss: -0.6240234375|cri_loss: 0.80859375|unsuper_loss: 0.0
average reward score: 2.216796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.04522705078125|cri_loss: 0.372802734375|unsuper_loss: 0.0
average reward score: 1.447265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.06988525390625|cri_loss: 0.3798828125|unsuper_loss: 0.0
average reward score: 1.2255859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.40%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.26513671875|cri_loss: 0.4345703125|unsuper_loss: 0.0
average reward score: 2.41015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.43%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 64|ppo_ep: 1|act_loss: 0.57861328125|cri_loss: 0.51220703125|unsuper_loss: 0.0
average reward score: 1.8310546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 65|ppo_ep: 1|act_loss: -0.12744140625|cri_loss: 0.471923828125|unsuper_loss: 0.0
average reward score: 0.8876953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.18%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 66|ppo_ep: 1|act_loss: 0.11785888671875|cri_loss: 0.328369140625|unsuper_loss: 0.0
average reward score: 0.77880859375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.73%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 67|ppo_ep: 1|act_loss: -0.155517578125|cri_loss: 0.271484375|unsuper_loss: 0.0
average reward score: 0.8349609375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.41%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 68|ppo_ep: 1|act_loss: -0.2369384765625|cri_loss: 0.470458984375|unsuper_loss: 0.0
average reward score: 1.556640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.25%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:10:27,579] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=8, lr=[5.983e-06, 5.983e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:27,758] [INFO] [timer.py:215:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=53.04370475961942, CurrSamplesPerSec=52.004153583395585, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:27,918] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[3.2000000000000003e-06, 3.2000000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 69|ppo_ep: 1|act_loss: -0.050323486328125|cri_loss: 0.59521484375|unsuper_loss: 0.0
average reward score: 2.0703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.1898193359375|cri_loss: 0.451416015625|unsuper_loss: 0.0
average reward score: 1.25
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 71|ppo_ep: 1|act_loss: -0.22265625|cri_loss: 0.4140625|unsuper_loss: 0.0
average reward score: 2.359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.15%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 72|ppo_ep: 1|act_loss: 0.25927734375|cri_loss: 0.6513671875|unsuper_loss: 0.0
average reward score: 1.634765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 73|ppo_ep: 1|act_loss: 0.2115478515625|cri_loss: 0.343994140625|unsuper_loss: 0.0
average reward score: 1.193359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.79s (31.41%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 74|ppo_ep: 1|act_loss: -0.1343994140625|cri_loss: 0.5712890625|unsuper_loss: 0.0
average reward score: 1.62109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.29%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 75|ppo_ep: 1|act_loss: -0.423095703125|cri_loss: 0.384033203125|unsuper_loss: 0.0
average reward score: 0.6923828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.36%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 76|ppo_ep: 1|act_loss: -0.04229736328125|cri_loss: 0.2203369140625|unsuper_loss: 0.0
average reward score: 1.1923828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.22%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 77|ppo_ep: 1|act_loss: 0.01357269287109375|cri_loss: 0.176513671875|unsuper_loss: 0.0
average reward score: 1.23828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.25%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 78|ppo_ep: 1|act_loss: 0.3056640625|cri_loss: 0.2081298828125|unsuper_loss: 0.0
average reward score: 1.6806640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.29%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:10:52,583] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=8, lr=[6.948e-06, 6.948e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:52,761] [INFO] [timer.py:215:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=52.87346162859914, CurrSamplesPerSec=51.559023442812, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:52,921] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[3.7e-06, 3.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 79|ppo_ep: 1|act_loss: 0.20654296875|cri_loss: 0.1474609375|unsuper_loss: 0.0
average reward score: 1.4677734375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 80|ppo_ep: 1|act_loss: 0.152587890625|cri_loss: 0.2091064453125|unsuper_loss: 0.0
average reward score: 0.423583984375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 81|ppo_ep: 1|act_loss: -0.0245208740234375|cri_loss: 0.1005859375|unsuper_loss: 0.0
average reward score: 1.4228515625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.41%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 82|ppo_ep: 1|act_loss: 0.0177459716796875|cri_loss: 0.6953125|unsuper_loss: 0.0
average reward score: -0.423828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.69%) |Training time=0.79s (31.53%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 83|ppo_ep: 1|act_loss: -0.61376953125|cri_loss: 0.54638671875|unsuper_loss: 0.0
average reward score: 1.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.23%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 84|ppo_ep: 1|act_loss: 0.044830322265625|cri_loss: 0.4111328125|unsuper_loss: 0.0
average reward score: 0.93115234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.25%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.51171875|cri_loss: 0.59375|unsuper_loss: 0.0
average reward score: 0.29296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.51%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.87841796875|cri_loss: 0.859375|unsuper_loss: 0.0
average reward score: -0.36474609375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.63%) |Training time=0.79s (31.62%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.80
epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.26025390625|cri_loss: 0.1788330078125|unsuper_loss: 0.0
average reward score: 0.8642578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.51%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 88|ppo_ep: 1|act_loss: -0.36328125|cri_loss: 0.3447265625|unsuper_loss: 0.0
average reward score: 0.7294921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.34%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
[2023-07-01 08:11:17,629] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=8, lr=[7.913e-06, 7.913e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:11:17,804] [INFO] [timer.py:215:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=52.705695811425215, CurrSamplesPerSec=52.0945200465606, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:11:17,965] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[4.2000000000000004e-06, 4.2000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 89|ppo_ep: 1|act_loss: -0.42333984375|cri_loss: 0.440185546875|unsuper_loss: 0.0
average reward score: -0.0455322265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.16%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 90|ppo_ep: 1|act_loss: -0.424072265625|cri_loss: 0.2314453125|unsuper_loss: 0.0
average reward score: 2.3984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.09%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 91|ppo_ep: 1|act_loss: 0.7021484375|cri_loss: 0.5908203125|unsuper_loss: 0.0
average reward score: 2.578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.78s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 92|ppo_ep: 1|act_loss: 0.76513671875|cri_loss: 0.6279296875|unsuper_loss: 0.0
average reward score: 1.6533203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.12%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 93|ppo_ep: 1|act_loss: 0.31298828125|cri_loss: 0.272705078125|unsuper_loss: 0.0
average reward score: 1.1591796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.20%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 94|ppo_ep: 1|act_loss: 0.306396484375|cri_loss: 0.343994140625|unsuper_loss: 0.0
average reward score: 0.095458984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.34%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 95|ppo_ep: 1|act_loss: -1.2255859375|cri_loss: 1.7861328125|unsuper_loss: 0.0
average reward score: 0.2744140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.24%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.08453369140625|cri_loss: 0.12261962890625|unsuper_loss: 0.0
average reward score: 1.5029296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.00%) |Training time=0.78s (31.18%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 97|ppo_ep: 1|act_loss: 0.412109375|cri_loss: 0.2176513671875|unsuper_loss: 0.0
average reward score: 1.33984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.18%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 98|ppo_ep: 1|act_loss: 0.35400390625|cri_loss: 0.208251953125|unsuper_loss: 0.0
average reward score: 0.7197265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
[2023-07-01 08:11:42,650] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=8, lr=[8.878e-06, 8.878e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:11:42,830] [INFO] [timer.py:215:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=52.601879430986315, CurrSamplesPerSec=49.972830583204946, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:11:42,991] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=6, lr=[4.7e-06, 4.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 99|ppo_ep: 1|act_loss: 0.2303466796875|cri_loss: 0.10552978515625|unsuper_loss: 0.0
average reward score: 1.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.52s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.93%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.69 |AvgSamplesPerSec=12.80
epoch: 0|step: 100|ppo_ep: 1|act_loss: 0.24755859375|cri_loss: 0.1744384765625|unsuper_loss: 0.0
average reward score: -0.3828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.84%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.80
epoch: 0|step: 101|ppo_ep: 1|act_loss: -0.00725555419921875|cri_loss: 0.1685791015625|unsuper_loss: 0.0
average reward score: -1.8544921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.79s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 102|ppo_ep: 1|act_loss: 0.0902099609375|cri_loss: 0.1353759765625|unsuper_loss: 0.0
average reward score: 0.08642578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.00%) |Training time=0.78s (31.16%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.06756591796875|cri_loss: 0.167724609375|unsuper_loss: 0.0
average reward score: -0.189453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 104|ppo_ep: 1|act_loss: -0.0633544921875|cri_loss: 0.1798095703125|unsuper_loss: 0.0
average reward score: -0.07470703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 105|ppo_ep: 1|act_loss: 0.1318359375|cri_loss: 0.27294921875|unsuper_loss: 0.0
average reward score: -0.595703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.43%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 106|ppo_ep: 1|act_loss: -0.0809326171875|cri_loss: 0.1402587890625|unsuper_loss: 0.0
average reward score: -0.389404296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.23%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 107|ppo_ep: 1|act_loss: -0.419189453125|cri_loss: 0.461669921875|unsuper_loss: 0.0
average reward score: 0.203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.31%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 108|ppo_ep: 1|act_loss: -0.402099609375|cri_loss: 0.374755859375|unsuper_loss: 0.0
average reward score: -1.35546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
[2023-07-01 08:12:07,676] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=8, lr=[9.649869410169466e-06, 9.649869410169466e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:12:07,852] [INFO] [timer.py:215:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=52.5035659286346, CurrSamplesPerSec=52.02652925000989, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:12:08,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=6, lr=[4.999729351164122e-06, 4.999729351164122e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 109|ppo_ep: 1|act_loss: -0.46630859375|cri_loss: 0.311767578125|unsuper_loss: 0.0
average reward score: -1.654296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.23%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.2249755859375|cri_loss: 0.2509765625|unsuper_loss: 0.0
average reward score: -1.0
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.78s (31.12%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 111|ppo_ep: 1|act_loss: 0.1627197265625|cri_loss: 0.2108154296875|unsuper_loss: 0.0
average reward score: -0.64990234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.32%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 112|ppo_ep: 1|act_loss: -1.0615234375|cri_loss: 1.146484375|unsuper_loss: 0.0
average reward score: 0.5087890625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.54%) |Training time=0.80s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.80
epoch: 0|step: 113|ppo_ep: 1|act_loss: -0.794921875|cri_loss: 0.68359375|unsuper_loss: 0.0
average reward score: 1.3603515625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.70%) |Training time=0.79s (31.42%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80
epoch: 0|step: 114|ppo_ep: 1|act_loss: 0.352294921875|cri_loss: 0.537109375|unsuper_loss: 0.0
average reward score: 1.6796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.61%) |Training time=0.79s (31.62%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.80
epoch: 0|step: 115|ppo_ep: 1|act_loss: 0.54638671875|cri_loss: 0.63623046875|unsuper_loss: 0.0
average reward score: 1.669921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.09%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 116|ppo_ep: 1|act_loss: 0.54638671875|cri_loss: 0.7724609375|unsuper_loss: 0.0
average reward score: 0.455810546875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.16%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80
epoch: 0|step: 117|ppo_ep: 1|act_loss: -0.0537109375|cri_loss: 0.304931640625|unsuper_loss: 0.0
average reward score: 1.400390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 118|ppo_ep: 1|act_loss: -0.0611572265625|cri_loss: 0.1751708984375|unsuper_loss: 0.0
average reward score: 0.521484375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.16%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
[2023-07-01 08:12:32,709] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=8, lr=[9.64529950829165e-06, 9.64529950829165e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:12:32,886] [INFO] [timer.py:215:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=52.431066956321224, CurrSamplesPerSec=52.50632887623131, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:12:33,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=6, lr=[4.996685224712077e-06, 4.996685224712077e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 119|ppo_ep: 1|act_loss: 0.335205078125|cri_loss: 0.404296875|unsuper_loss: 0.0
average reward score: 1.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.20%) |Training time=0.77s (31.05%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 120|ppo_ep: 1|act_loss: 0.072265625|cri_loss: 0.296142578125|unsuper_loss: 0.0
average reward score: 1.1806640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.18%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.07342529296875|cri_loss: 0.31103515625|unsuper_loss: 0.0
average reward score: 2.5390625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.76%) |Training time=0.79s (31.44%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 122|ppo_ep: 1|act_loss: -0.37158203125|cri_loss: 0.28759765625|unsuper_loss: 0.0
average reward score: 0.6318359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 123|ppo_ep: 1|act_loss: -0.25927734375|cri_loss: 0.37158203125|unsuper_loss: 0.0
average reward score: 1.94921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 124|ppo_ep: 1|act_loss: 0.1942138671875|cri_loss: 0.273193359375|unsuper_loss: 0.0
average reward score: 1.0634765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 125|ppo_ep: 1|act_loss: 0.2216796875|cri_loss: 0.20849609375|unsuper_loss: 0.0
average reward score: -0.217041015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 126|ppo_ep: 1|act_loss: -0.400390625|cri_loss: 0.434814453125|unsuper_loss: 0.0
average reward score: 0.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.79s (31.34%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 127|ppo_ep: 1|act_loss: -0.3203125|cri_loss: 0.254150390625|unsuper_loss: 0.0
average reward score: 1.5029296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.50%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 128|ppo_ep: 1|act_loss: -0.27294921875|cri_loss: 0.43896484375|unsuper_loss: 0.0
average reward score: 1.4375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
[2023-07-01 08:12:57,705] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=8, lr=[9.63420718206011e-06, 9.63420718206011e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:12:57,886] [INFO] [timer.py:215:stop] epoch=0/micro_step=130/global_step=130, RunningAvgSamplesPerSec=52.37184465997516, CurrSamplesPerSec=52.1022248826302, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:12:58,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=6, lr=[4.99026279355402e-06, 4.99026279355402e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 129|ppo_ep: 1|act_loss: 0.49365234375|cri_loss: 0.46728515625|unsuper_loss: 0.0
average reward score: 0.03564453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 130|ppo_ep: 1|act_loss: -0.1641845703125|cri_loss: 0.4501953125|unsuper_loss: 0.0
average reward score: 0.18115234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 131|ppo_ep: 1|act_loss: -0.006988525390625|cri_loss: 0.187744140625|unsuper_loss: 0.0
average reward score: 2.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.78s (31.34%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 132|ppo_ep: 1|act_loss: 0.260986328125|cri_loss: 0.1494140625|unsuper_loss: 0.0
average reward score: 0.223388671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.31%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 133|ppo_ep: 1|act_loss: -0.228759765625|cri_loss: 0.194580078125|unsuper_loss: 0.0
average reward score: 0.6591796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.22%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 134|ppo_ep: 1|act_loss: 0.1409912109375|cri_loss: 0.2529296875|unsuper_loss: 0.0
average reward score: 1.7900390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.33%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 135|ppo_ep: 1|act_loss: -0.2266845703125|cri_loss: 0.11761474609375|unsuper_loss: 0.0
average reward score: 0.896484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.23%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 136|ppo_ep: 1|act_loss: -0.005107879638671875|cri_loss: 0.1849365234375|unsuper_loss: 0.0
average reward score: 2.248046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.77s (31.10%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 137|ppo_ep: 1|act_loss: 0.080322265625|cri_loss: 0.0968017578125|unsuper_loss: 0.0
average reward score: 2.5390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.99%) |Training time=0.78s (31.19%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 138|ppo_ep: 1|act_loss: -0.0014581680297851562|cri_loss: 0.0823974609375|unsuper_loss: 0.0
average reward score: 0.923828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.25%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
[2023-07-01 08:13:22,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=8, lr=[9.616607440678868e-06, 9.616607440678868e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:13:22,873] [INFO] [timer.py:215:stop] epoch=0/micro_step=140/global_step=140, RunningAvgSamplesPerSec=52.34261795657865, CurrSamplesPerSec=52.49745683026082, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:13:23,033] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=6, lr=[4.980470747984265e-06, 4.980470747984265e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 139|ppo_ep: 1|act_loss: -0.26171875|cri_loss: 0.2237548828125|unsuper_loss: 0.0
average reward score: 0.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.77s (31.06%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 140|ppo_ep: 1|act_loss: 0.01806640625|cri_loss: 0.11468505859375|unsuper_loss: 0.0
average reward score: 1.009765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 141|ppo_ep: 1|act_loss: 0.301513671875|cri_loss: 0.2069091796875|unsuper_loss: 0.0
average reward score: 2.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 142|ppo_ep: 1|act_loss: 0.48779296875|cri_loss: 0.289306640625|unsuper_loss: 0.0
average reward score: 2.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.46%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 143|ppo_ep: 1|act_loss: 0.0810546875|cri_loss: 0.1422119140625|unsuper_loss: 0.0
average reward score: 0.8125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 144|ppo_ep: 1|act_loss: 0.0169525146484375|cri_loss: 0.061187744140625|unsuper_loss: 0.0
average reward score: 2.017578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.30%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 145|ppo_ep: 1|act_loss: 0.12054443359375|cri_loss: 0.11749267578125|unsuper_loss: 0.0
average reward score: 2.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.48%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 146|ppo_ep: 1|act_loss: -0.0943603515625|cri_loss: 0.10076904296875|unsuper_loss: 0.0
average reward score: 0.73291015625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.78s (31.32%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.12384033203125|cri_loss: 0.09320068359375|unsuper_loss: 0.0
average reward score: 1.529296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 148|ppo_ep: 1|act_loss: -0.0909423828125|cri_loss: 0.06854248046875|unsuper_loss: 0.0
average reward score: 1.4853515625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
[2023-07-01 08:13:47,708] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=8, lr=[9.592524098639447e-06, 9.592524098639447e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:13:47,888] [INFO] [timer.py:215:stop] epoch=0/micro_step=150/global_step=150, RunningAvgSamplesPerSec=52.280948186958256, CurrSamplesPerSec=51.466181368635546, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:13:48,048] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=6, lr=[4.967322337776272e-06, 4.967322337776272e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 149|ppo_ep: 1|act_loss: -0.265625|cri_loss: 0.1473388671875|unsuper_loss: 0.0
average reward score: 1.2587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.44%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 150|ppo_ep: 1|act_loss: -0.297607421875|cri_loss: 0.1822509765625|unsuper_loss: 0.0
average reward score: 1.873046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.21%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 151|ppo_ep: 1|act_loss: -0.14453125|cri_loss: 0.129638671875|unsuper_loss: 0.0
average reward score: 1.263671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.22%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 152|ppo_ep: 1|act_loss: -0.0584716796875|cri_loss: 0.07110595703125|unsuper_loss: 0.0
average reward score: 1.4951171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training time=0.77s (31.10%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 153|ppo_ep: 1|act_loss: 0.021087646484375|cri_loss: 0.1566162109375|unsuper_loss: 0.0
average reward score: 1.486328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.22%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 154|ppo_ep: 1|act_loss: 0.802734375|cri_loss: 0.6201171875|unsuper_loss: 0.0
average reward score: 1.3837890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 155|ppo_ep: 1|act_loss: 0.04766845703125|cri_loss: 0.1353759765625|unsuper_loss: 0.0
average reward score: 0.409912109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.35%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 156|ppo_ep: 1|act_loss: 0.26171875|cri_loss: 0.1607666015625|unsuper_loss: 0.0
average reward score: 0.5810546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.18%) |Training time=0.77s (31.04%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 157|ppo_ep: 1|act_loss: 0.2440185546875|cri_loss: 0.1148681640625|unsuper_loss: 0.0
average reward score: 2.048828125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.24%) |Training time=0.77s (30.96%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.26708984375|cri_loss: 0.10986328125|unsuper_loss: 0.0
average reward score: 2.015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.13%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
[2023-07-01 08:14:12,671] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=8, lr=[9.561989743497123e-06, 9.561989743497123e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:14:12,849] [INFO] [timer.py:215:stop] epoch=0/micro_step=160/global_step=160, RunningAvgSamplesPerSec=52.269319390366206, CurrSamplesPerSec=52.08991036828757, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:14:13,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=6, lr=[4.950835354254168e-06, 4.950835354254168e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 159|ppo_ep: 1|act_loss: -0.1343994140625|cri_loss: 0.101806640625|unsuper_loss: 0.0
average reward score: 2.173828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.23%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 160|ppo_ep: 1|act_loss: -0.26025390625|cri_loss: 0.2020263671875|unsuper_loss: 0.0
average reward score: 1.400390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.32%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 161|ppo_ep: 1|act_loss: -0.0947265625|cri_loss: 0.152099609375|unsuper_loss: 0.0
average reward score: 0.2568359375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 162|ppo_ep: 1|act_loss: -0.0096282958984375|cri_loss: 0.0770263671875|unsuper_loss: 0.0
average reward score: 2.259765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 163|ppo_ep: 1|act_loss: -0.1458740234375|cri_loss: 0.1275634765625|unsuper_loss: 0.0
average reward score: 2.134765625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.40%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80
epoch: 0|step: 164|ppo_ep: 1|act_loss: 0.0100250244140625|cri_loss: 0.12158203125|unsuper_loss: 0.0
average reward score: 1.6982421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.43%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 165|ppo_ep: 1|act_loss: 0.291015625|cri_loss: 0.1094970703125|unsuper_loss: 0.0
average reward score: 0.74658203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.28%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 166|ppo_ep: 1|act_loss: 0.18505859375|cri_loss: 0.1199951171875|unsuper_loss: 0.0
average reward score: 1.6787109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 167|ppo_ep: 1|act_loss: 0.0205841064453125|cri_loss: 0.121337890625|unsuper_loss: 0.0
average reward score: 1.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.49%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 168|ppo_ep: 1|act_loss: 0.200927734375|cri_loss: 0.09912109375|unsuper_loss: 0.0
average reward score: 2.150390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
[2023-07-01 08:14:37,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=8, lr=[9.525045691776156e-06, 9.525045691776156e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:14:37,862] [INFO] [timer.py:215:stop] epoch=0/micro_step=170/global_step=170, RunningAvgSamplesPerSec=52.22736749070443, CurrSamplesPerSec=51.84219343081623, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:14:38,022] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=6, lr=[4.931032106219029e-06, 4.931032106219029e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 169|ppo_ep: 1|act_loss: -0.19775390625|cri_loss: 0.05303955078125|unsuper_loss: 0.0
average reward score: 1.37109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.28%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 170|ppo_ep: 1|act_loss: -0.5263671875|cri_loss: 0.384521484375|unsuper_loss: 0.0
average reward score: 2.90234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.78s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 171|ppo_ep: 1|act_loss: 0.102294921875|cri_loss: 0.1634521484375|unsuper_loss: 0.0
average reward score: 2.15234375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80
epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.14697265625|cri_loss: 0.12646484375|unsuper_loss: 0.0
average reward score: 1.748046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 173|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: 0.08270263671875|unsuper_loss: 0.0
average reward score: 0.697265625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 174|ppo_ep: 1|act_loss: -0.31689453125|cri_loss: 0.195556640625|unsuper_loss: 0.0
average reward score: 0.99169921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 175|ppo_ep: 1|act_loss: 0.1104736328125|cri_loss: 0.09356689453125|unsuper_loss: 0.0
average reward score: 2.2109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.45%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 176|ppo_ep: 1|act_loss: 0.201904296875|cri_loss: 0.1785888671875|unsuper_loss: 0.0
average reward score: 3.423828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 177|ppo_ep: 1|act_loss: -0.07379150390625|cri_loss: 0.1068115234375|unsuper_loss: 0.0
average reward score: 1.94140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.43%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.10406494140625|cri_loss: 0.2044677734375|unsuper_loss: 0.0
average reward score: 3.119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.26%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
[2023-07-01 08:15:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=8, lr=[9.481741933063763e-06, 9.481741933063763e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:15:02,838] [INFO] [timer.py:215:stop] epoch=0/micro_step=180/global_step=180, RunningAvgSamplesPerSec=52.20197983152347, CurrSamplesPerSec=51.571048563751575, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:15:02,997] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=6, lr=[4.907939389762475e-06, 4.907939389762475e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 179|ppo_ep: 1|act_loss: 0.365478515625|cri_loss: 0.300048828125|unsuper_loss: 0.0
average reward score: 1.03125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.47%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.0006165504455566406|cri_loss: 0.10247802734375|unsuper_loss: 0.0
average reward score: 2.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.13%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 181|ppo_ep: 1|act_loss: 0.0224761962890625|cri_loss: 0.07757568359375|unsuper_loss: 0.0
average reward score: 2.96484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.34%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 182|ppo_ep: 1|act_loss: -0.11004638671875|cri_loss: 0.086669921875|unsuper_loss: 0.0
average reward score: 3.24609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.31%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 183|ppo_ep: 1|act_loss: -0.07958984375|cri_loss: 0.081298828125|unsuper_loss: 0.0
average reward score: 3.369140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.21%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.051361083984375|cri_loss: 0.08209228515625|unsuper_loss: 0.0
average reward score: 0.373046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.16%) |Training time=0.77s (31.07%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 185|ppo_ep: 1|act_loss: -0.0115966796875|cri_loss: 0.1260986328125|unsuper_loss: 0.0
average reward score: 2.322265625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.15%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 186|ppo_ep: 1|act_loss: -0.1728515625|cri_loss: 0.16259765625|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.26%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 187|ppo_ep: 1|act_loss: 0.286865234375|cri_loss: 0.2060546875|unsuper_loss: 0.0
average reward score: 2.9921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.79s (31.41%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 188|ppo_ep: 1|act_loss: -0.10748291015625|cri_loss: 0.06787109375|unsuper_loss: 0.0
average reward score: 1.9609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.51%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
[2023-07-01 08:15:27,638] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=8, lr=[9.432137062368396e-06, 9.432137062368396e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:15:27,813] [INFO] [timer.py:215:stop] epoch=0/micro_step=190/global_step=190, RunningAvgSamplesPerSec=52.18664239862417, CurrSamplesPerSec=51.60189971407404, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:15:27,973] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=6, lr=[4.881588452008457e-06, 4.881588452008457e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 189|ppo_ep: 1|act_loss: -0.1380615234375|cri_loss: 0.06768798828125|unsuper_loss: 0.0
average reward score: 1.51171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.42%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.0645751953125|cri_loss: 0.1287841796875|unsuper_loss: 0.0
average reward score: 2.787109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.04%) |Training time=0.78s (31.15%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80
epoch: 0|step: 191|ppo_ep: 1|act_loss: 0.2034912109375|cri_loss: 0.1402587890625|unsuper_loss: 0.0
average reward score: 0.9404296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 192|ppo_ep: 1|act_loss: -0.046600341796875|cri_loss: 0.12109375|unsuper_loss: 0.0
average reward score: 1.8515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.27%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 193|ppo_ep: 1|act_loss: 0.00504302978515625|cri_loss: 0.1109619140625|unsuper_loss: 0.0
average reward score: 1.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.11%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 194|ppo_ep: 1|act_loss: -0.228271484375|cri_loss: 0.13134765625|unsuper_loss: 0.0
average reward score: 3.765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.27%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 195|ppo_ep: 1|act_loss: -0.26318359375|cri_loss: 0.289794921875|unsuper_loss: 0.0
average reward score: 3.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.24%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.08001708984375|cri_loss: 0.1861572265625|unsuper_loss: 0.0
average reward score: 3.36328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.57%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 197|ppo_ep: 1|act_loss: -0.1329345703125|cri_loss: 0.218994140625|unsuper_loss: 0.0
average reward score: 2.84375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.33%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 198|ppo_ep: 1|act_loss: -0.1678466796875|cri_loss: 0.3349609375|unsuper_loss: 0.0
average reward score: 1.615234375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.77s (31.05%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
[2023-07-01 08:15:52,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=8, lr=[9.376298200833905e-06, 9.376298200833905e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:15:52,786] [INFO] [timer.py:215:stop] epoch=0/micro_step=200/global_step=200, RunningAvgSamplesPerSec=52.17385596484633, CurrSamplesPerSec=52.39363895635663, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:15:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=6, lr=[4.852014948832268e-06, 4.852014948832268e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.0509033203125|cri_loss: 0.178955078125|unsuper_loss: 0.0
average reward score: 3.533203125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.11%) |Training time=0.78s (31.12%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80
epoch: 0|step: 200|ppo_ep: 1|act_loss: 0.1173095703125|cri_loss: 0.261962890625|unsuper_loss: 0.0
average reward score: 3.337890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.23%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 201|ppo_ep: 1|act_loss: 0.177734375|cri_loss: 0.1182861328125|unsuper_loss: 0.0
average reward score: 3.576171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.42%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 202|ppo_ep: 1|act_loss: -0.267822265625|cri_loss: 0.175048828125|unsuper_loss: 0.0
average reward score: 3.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.37%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 203|ppo_ep: 1|act_loss: -0.051361083984375|cri_loss: 0.1474609375|unsuper_loss: 0.0
average reward score: 2.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.55%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 204|ppo_ep: 1|act_loss: 0.1959228515625|cri_loss: 0.34814453125|unsuper_loss: 0.0
average reward score: 2.306640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.77s (31.02%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.80
epoch: 0|step: 205|ppo_ep: 1|act_loss: 0.280517578125|cri_loss: 0.2239990234375|unsuper_loss: 0.0
average reward score: 2.890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.23%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 206|ppo_ep: 1|act_loss: -0.043243408203125|cri_loss: 0.2216796875|unsuper_loss: 0.0
average reward score: 2.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.20%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 207|ppo_ep: 1|act_loss: -0.389892578125|cri_loss: 0.219482421875|unsuper_loss: 0.0
average reward score: 3.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 208|ppo_ep: 1|act_loss: 0.26611328125|cri_loss: 0.286865234375|unsuper_loss: 0.0
average reward score: 2.693359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.24%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
[2023-07-01 08:16:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=8, lr=[9.31430090491684e-06, 9.31430090491684e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:16:17,764] [INFO] [timer.py:215:stop] epoch=0/micro_step=210/global_step=210, RunningAvgSamplesPerSec=52.15907709400849, CurrSamplesPerSec=51.50153102806808, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:16:17,925] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=6, lr=[4.819258896614014e-06, 4.819258896614014e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 209|ppo_ep: 1|act_loss: 0.2457275390625|cri_loss: 0.334716796875|unsuper_loss: 0.0
average reward score: 2.59375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.41%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 210|ppo_ep: 1|act_loss: 0.0133819580078125|cri_loss: 0.45166015625|unsuper_loss: 0.0
average reward score: 2.435546875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.51%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 211|ppo_ep: 1|act_loss: -0.0655517578125|cri_loss: 0.1932373046875|unsuper_loss: 0.0
average reward score: 1.869140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 212|ppo_ep: 1|act_loss: 0.024932861328125|cri_loss: 0.2200927734375|unsuper_loss: 0.0
average reward score: 0.412109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 213|ppo_ep: 1|act_loss: 0.1929931640625|cri_loss: 0.30908203125|unsuper_loss: 0.0
average reward score: 0.529296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.48%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 214|ppo_ep: 1|act_loss: -0.1488037109375|cri_loss: 0.29833984375|unsuper_loss: 0.0
average reward score: -1.19140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 215|ppo_ep: 1|act_loss: -0.480712890625|cri_loss: 0.51318359375|unsuper_loss: 0.0
average reward score: 0.87890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 216|ppo_ep: 1|act_loss: 0.2427978515625|cri_loss: 0.29248046875|unsuper_loss: 0.0
average reward score: -0.30810546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.36%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 217|ppo_ep: 1|act_loss: 0.2440185546875|cri_loss: 0.210205078125|unsuper_loss: 0.0
average reward score: 1.08203125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.09%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.80
epoch: 0|step: 218|ppo_ep: 1|act_loss: 0.50927734375|cri_loss: 0.491455078125|unsuper_loss: 0.0
average reward score: -1.513671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.09%) |Training time=0.77s (31.05%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.80
[2023-07-01 08:16:42,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=8, lr=[9.246229064149799e-06, 9.246229064149799e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:16:42,742] [INFO] [timer.py:215:stop] epoch=0/micro_step=220/global_step=220, RunningAvgSamplesPerSec=52.137330015874944, CurrSamplesPerSec=52.36796315530317, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:16:42,902] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=6, lr=[4.783364618091804e-06, 4.783364618091804e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 219|ppo_ep: 1|act_loss: 0.1173095703125|cri_loss: 0.07891845703125|unsuper_loss: 0.0
average reward score: 1.5478515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.10%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 220|ppo_ep: 1|act_loss: -0.0670166015625|cri_loss: 0.103271484375|unsuper_loss: 0.0
average reward score: 1.1064453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.04%) |Training time=0.78s (31.14%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 221|ppo_ep: 1|act_loss: -0.178955078125|cri_loss: 0.227783203125|unsuper_loss: 0.0
average reward score: -0.8388671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.20%) |Training time=0.77s (31.02%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.13623046875|cri_loss: 0.166748046875|unsuper_loss: 0.0
average reward score: 0.187744140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.35%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.06787109375|cri_loss: 0.13232421875|unsuper_loss: 0.0
average reward score: -1.75
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.42%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.80
epoch: 0|step: 224|ppo_ep: 1|act_loss: -0.12060546875|cri_loss: 0.16064453125|unsuper_loss: 0.0
average reward score: -0.8623046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.27%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 225|ppo_ep: 1|act_loss: -0.2724609375|cri_loss: 0.267822265625|unsuper_loss: 0.0
average reward score: -1.15625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 226|ppo_ep: 1|act_loss: -0.481689453125|cri_loss: 0.322265625|unsuper_loss: 0.0
average reward score: -0.32275390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.326416015625|cri_loss: 0.164794921875|unsuper_loss: 0.0
average reward score: 0.388671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.91%) |Training time=0.78s (31.28%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.01126861572265625|cri_loss: 0.0947265625|unsuper_loss: 0.0
average reward score: 1.173828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.47%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
[2023-07-01 08:17:07,550] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=8, lr=[9.172174787629172e-06, 9.172174787629172e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:17:07,729] [INFO] [timer.py:215:stop] epoch=0/micro_step=230/global_step=230, RunningAvgSamplesPerSec=52.12167218935985, CurrSamplesPerSec=51.53437660663692, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:17:07,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=6, lr=[4.74438068238795e-06, 4.74438068238795e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 229|ppo_ep: 1|act_loss: 0.2076416015625|cri_loss: 0.1134033203125|unsuper_loss: 0.0
average reward score: -0.1201171875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.80%) |Training time=0.79s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.80
epoch: 0|step: 230|ppo_ep: 1|act_loss: 0.2239990234375|cri_loss: 0.1163330078125|unsuper_loss: 0.0
average reward score: 0.81884765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.97%) |Training time=0.78s (31.25%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 231|ppo_ep: 1|act_loss: -0.10308837890625|cri_loss: 0.0853271484375|unsuper_loss: 0.0
average reward score: -0.75634765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.15%) |Training time=0.77s (31.07%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 232|ppo_ep: 1|act_loss: -0.4169921875|cri_loss: 0.28955078125|unsuper_loss: 0.0
average reward score: -0.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.14%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 233|ppo_ep: 1|act_loss: -0.12939453125|cri_loss: 0.1280517578125|unsuper_loss: 0.0
average reward score: -0.681640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.39%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 234|ppo_ep: 1|act_loss: 0.334716796875|cri_loss: 0.263916015625|unsuper_loss: 0.0
average reward score: -0.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 235|ppo_ep: 1|act_loss: 0.29736328125|cri_loss: 0.178466796875|unsuper_loss: 0.0
average reward score: -0.213623046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.95%) |Training time=0.78s (31.26%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.30322265625|cri_loss: 0.199462890625|unsuper_loss: 0.0
average reward score: -1.373046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.33%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.0158843994140625|cri_loss: 0.138916015625|unsuper_loss: 0.0
average reward score: 0.0645751953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.47%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.80
epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.207763671875|cri_loss: 0.1807861328125|unsuper_loss: 0.0
average reward score: -1.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.30%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
[2023-07-01 08:17:32,518] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=8, lr=[9.09223827938084e-06, 9.09223827938084e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:17:32,696] [INFO] [timer.py:215:stop] epoch=0/micro_step=240/global_step=240, RunningAvgSamplesPerSec=52.11019433075464, CurrSamplesPerSec=52.049711379426014, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:17:32,855] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=6, lr=[4.702359839289306e-06, 4.702359839289306e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 239|ppo_ep: 1|act_loss: 0.1796875|cri_loss: 0.173583984375|unsuper_loss: 0.0
average reward score: -2.109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.30%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 240|ppo_ep: 1|act_loss: 0.102294921875|cri_loss: 0.153564453125|unsuper_loss: 0.0
average reward score: -1.3837890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.20%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 241|ppo_ep: 1|act_loss: 0.254638671875|cri_loss: 0.268310546875|unsuper_loss: 0.0
average reward score: -1.421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.12%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 242|ppo_ep: 1|act_loss: 0.320068359375|cri_loss: 0.2890625|unsuper_loss: 0.0
average reward score: -1.111328125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.45%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80
epoch: 0|step: 243|ppo_ep: 1|act_loss: 0.02081298828125|cri_loss: 0.0860595703125|unsuper_loss: 0.0
average reward score: -0.7314453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.19%) |Others=0.22 (8.93%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 244|ppo_ep: 1|act_loss: 0.2783203125|cri_loss: 0.1820068359375|unsuper_loss: 0.0
average reward score: -1.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.94%) |Training time=0.78s (31.20%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 245|ppo_ep: 1|act_loss: -0.2364501953125|cri_loss: 0.1331787109375|unsuper_loss: 0.0
average reward score: -1.1396484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.94%) |Training time=0.78s (31.23%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.419677734375|cri_loss: 0.204345703125|unsuper_loss: 0.0
average reward score: -2.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.11%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 247|ppo_ep: 1|act_loss: -0.388427734375|cri_loss: 0.2030029296875|unsuper_loss: 0.0
average reward score: -1.853515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.98%) |Training time=0.78s (31.22%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.226806640625|cri_loss: 0.197509765625|unsuper_loss: 0.0
average reward score: -2.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.32%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.80
[2023-07-01 08:17:57,482] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=8, lr=[9.006527702772504e-06, 9.006527702772504e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:17:57,658] [INFO] [timer.py:215:stop] epoch=0/micro_step=250/global_step=250, RunningAvgSamplesPerSec=52.10667841485632, CurrSamplesPerSec=52.08291653747903, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:17:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=6, lr=[4.657358947870691e-06, 4.657358947870691e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 249|ppo_ep: 1|act_loss: 0.3671875|cri_loss: 0.2061767578125|unsuper_loss: 0.0
average reward score: -2.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.24%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 250|ppo_ep: 1|act_loss: 0.449951171875|cri_loss: 0.2022705078125|unsuper_loss: 0.0
average reward score: -3.322265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.13%) |Training time=0.78s (31.07%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.80
epoch: 0|step: 251|ppo_ep: 1|act_loss: 0.2705078125|cri_loss: 0.1883544921875|unsuper_loss: 0.0
average reward score: -3.009765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.43%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
epoch: 0|step: 252|ppo_ep: 1|act_loss: 0.11505126953125|cri_loss: 0.1763916015625|unsuper_loss: 0.0
average reward score: -0.480224609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 253|ppo_ep: 1|act_loss: -0.032073974609375|cri_loss: 0.09918212890625|unsuper_loss: 0.0
average reward score: -2.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.21%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.80
epoch: 0|step: 254|ppo_ep: 1|act_loss: -0.188232421875|cri_loss: 0.1361083984375|unsuper_loss: 0.0
average reward score: -0.43310546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.12%) |Training time=0.77s (31.11%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 255|ppo_ep: 1|act_loss: 0.0109405517578125|cri_loss: 0.13916015625|unsuper_loss: 0.0
average reward score: -0.94873046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.08%) |Training time=0.78s (31.06%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 256|ppo_ep: 1|act_loss: -0.18408203125|cri_loss: 0.10284423828125|unsuper_loss: 0.0
average reward score: -1.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.79s (31.35%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.80
epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.223876953125|cri_loss: 0.0982666015625|unsuper_loss: 0.0
average reward score: -2.5703125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.68%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.80
epoch: 0|step: 258|ppo_ep: 1|act_loss: -0.052978515625|cri_loss: 0.153076171875|unsuper_loss: 0.0
average reward score: -2.15625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.42%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.80
[2023-07-01 08:18:22,461] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, but hysteresis is 2. Reducing hysteresis to 1
[2023-07-01 08:18:22,461] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=9, lr=[8.924547050894679e-06, 8.924547050894679e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:18:22,462] [INFO] [timer.py:215:stop] epoch=0/micro_step=260/global_step=260, RunningAvgSamplesPerSec=52.15651524122892, CurrSamplesPerSec=74.34278909266142, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:18:22,620] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=6, lr=[4.609438899557964e-06, 4.609438899557964e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 259|ppo_ep: 1|act_loss: -0.35400390625|cri_loss: 0.207763671875|unsuper_loss: 0.0
average reward score: -2.005859375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.78%) |Training time=0.59s (25.71%) |Others=0.22 (9.51%)|CurSamplesPerSec=13.84 |AvgSamplesPerSec=12.81
[2023-07-01 08:18:24,766] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 260|ppo_ep: 1|act_loss: -0.1966552734375|cri_loss: 0.1453857421875|unsuper_loss: 0.0
average reward score: -2.146484375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.80%) |Training time=0.59s (25.60%) |Others=0.22 (9.60%)|CurSamplesPerSec=13.87 |AvgSamplesPerSec=12.81
epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.19091796875|cri_loss: 0.292724609375|unsuper_loss: 0.0
average reward score: -2.21484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.42%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.1279296875|cri_loss: 0.092041015625|unsuper_loss: 0.0
average reward score: -3.62109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.021759033203125|cri_loss: 0.151611328125|unsuper_loss: 0.0
average reward score: -2.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.35%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 264|ppo_ep: 1|act_loss: -0.017181396484375|cri_loss: 0.1300048828125|unsuper_loss: 0.0
average reward score: -3.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.80%) |Training time=0.79s (31.43%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.30712890625|cri_loss: 0.19140625|unsuper_loss: 0.0
average reward score: -3.1484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.24%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 266|ppo_ep: 1|act_loss: -0.01415252685546875|cri_loss: 0.1092529296875|unsuper_loss: 0.0
average reward score: -2.630859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.05%) |Training time=0.78s (31.12%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.1328125|cri_loss: 0.0802001953125|unsuper_loss: 0.0
average reward score: -2.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 268|ppo_ep: 1|act_loss: -0.07440185546875|cri_loss: 0.1796875|unsuper_loss: 0.0
average reward score: -1.6611328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.34%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:18:47,070] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=10, lr=[8.838073100970824e-06, 8.838073100970824e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:18:47,246] [INFO] [timer.py:215:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=52.20487912951929, CurrSamplesPerSec=52.52854267419502, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:18:47,406] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=6, lr=[4.558664535734864e-06, 4.558664535734864e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.275390625|cri_loss: 0.201904296875|unsuper_loss: 0.0
average reward score: -2.205078125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.10%) |Training time=0.77s (31.09%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.048828125|cri_loss: 0.1572265625|unsuper_loss: 0.0
average reward score: -3.140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.88%) |Training time=0.78s (31.32%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 271|ppo_ep: 1|act_loss: 0.036224365234375|cri_loss: 0.1575927734375|unsuper_loss: 0.0
average reward score: -1.5205078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.85%) |Training time=0.78s (31.34%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 272|ppo_ep: 1|act_loss: 0.05078125|cri_loss: 0.0865478515625|unsuper_loss: 0.0
average reward score: -4.41015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.19%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 273|ppo_ep: 1|act_loss: 0.10662841796875|cri_loss: 0.09228515625|unsuper_loss: 0.0
average reward score: -3.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.23%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 274|ppo_ep: 1|act_loss: -0.181396484375|cri_loss: 0.1827392578125|unsuper_loss: 0.0
average reward score: -4.5546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.38%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.27099609375|cri_loss: 0.1676025390625|unsuper_loss: 0.0
average reward score: -2.134765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.77%) |Training time=0.79s (31.38%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.31396484375|cri_loss: 0.1676025390625|unsuper_loss: 0.0
average reward score: -2.30859375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.79s (31.34%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.3427734375|cri_loss: 0.438720703125|unsuper_loss: 0.0
average reward score: -1.958984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 278|ppo_ep: 1|act_loss: 0.108154296875|cri_loss: 0.1785888671875|unsuper_loss: 0.0
average reward score: -2.982421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.30%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:19:12,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=10, lr=[8.736836458736355e-06, 8.736836458736355e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:19:12,253] [INFO] [timer.py:215:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=52.18662218968872, CurrSamplesPerSec=51.58343614068504, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:19:12,414] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=6, lr=[4.5051045600050906e-06, 4.5051045600050906e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 279|ppo_ep: 1|act_loss: 0.006816864013671875|cri_loss: 0.06549072265625|unsuper_loss: 0.0
average reward score: -2.412109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.38%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 280|ppo_ep: 1|act_loss: 0.0237884521484375|cri_loss: 0.265625|unsuper_loss: 0.0
average reward score: -2.21875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.75%) |Training time=0.79s (31.41%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 281|ppo_ep: 1|act_loss: 0.0087738037109375|cri_loss: 0.1356201171875|unsuper_loss: 0.0
average reward score: -2.646484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.36%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 282|ppo_ep: 1|act_loss: 0.0026302337646484375|cri_loss: 0.09356689453125|unsuper_loss: 0.0
average reward score: -2.587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.37%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.056884765625|cri_loss: 0.11456298828125|unsuper_loss: 0.0
average reward score: -2.431640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.31%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.126953125|cri_loss: 0.216552734375|unsuper_loss: 0.0
average reward score: -1.4580078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.41%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 285|ppo_ep: 1|act_loss: -0.2120361328125|cri_loss: 0.249267578125|unsuper_loss: 0.0
average reward score: -4.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 286|ppo_ep: 1|act_loss: 0.10711669921875|cri_loss: 0.1737060546875|unsuper_loss: 0.0
average reward score: -2.416015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.82%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 287|ppo_ep: 1|act_loss: 0.06396484375|cri_loss: 0.1331787109375|unsuper_loss: 0.0
average reward score: -3.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.37%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 288|ppo_ep: 1|act_loss: -0.08319091796875|cri_loss: 0.1009521484375|unsuper_loss: 0.0
average reward score: -1.4404296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.91%) |Training time=0.78s (31.20%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:19:37,055] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=10, lr=[8.630306648029188e-06, 8.630306648029188e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:19:37,235] [INFO] [timer.py:215:stop] epoch=0/micro_step=290/global_step=290, RunningAvgSamplesPerSec=52.16971898264238, CurrSamplesPerSec=51.749367195003416, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:19:37,396] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=6, lr=[4.448831445228368e-06, 4.448831445228368e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 289|ppo_ep: 1|act_loss: 0.2198486328125|cri_loss: 0.09527587890625|unsuper_loss: 0.0
average reward score: -2.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 290|ppo_ep: 1|act_loss: -0.0577392578125|cri_loss: 0.12420654296875|unsuper_loss: 0.0
average reward score: -2.8671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.92%) |Training time=0.78s (31.20%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 291|ppo_ep: 1|act_loss: -0.21630859375|cri_loss: 0.1055908203125|unsuper_loss: 0.0
average reward score: -2.427734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.28%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 292|ppo_ep: 1|act_loss: -0.09027099609375|cri_loss: 0.1099853515625|unsuper_loss: 0.0
average reward score: -4.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.79%) |Training time=0.79s (31.38%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.005260467529296875|cri_loss: 0.088134765625|unsuper_loss: 0.0
average reward score: -3.82421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.36%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 294|ppo_ep: 1|act_loss: 0.316162109375|cri_loss: 0.095703125|unsuper_loss: 0.0
average reward score: -2.01171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.42%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 295|ppo_ep: 1|act_loss: -0.09100341796875|cri_loss: 0.0877685546875|unsuper_loss: 0.0
average reward score: -2.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.32%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 296|ppo_ep: 1|act_loss: -0.2142333984375|cri_loss: 0.107666015625|unsuper_loss: 0.0
average reward score: -2.798828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.1929931640625|cri_loss: 0.12384033203125|unsuper_loss: 0.0
average reward score: -2.666015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.36%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 298|ppo_ep: 1|act_loss: 0.409912109375|cri_loss: 0.257080078125|unsuper_loss: 0.0
average reward score: -1.650390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.36%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:20:02,053] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=10, lr=[8.518627816039882e-06, 8.518627816039882e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:20:02,229] [INFO] [timer.py:215:stop] epoch=0/micro_step=300/global_step=300, RunningAvgSamplesPerSec=52.151096640881924, CurrSamplesPerSec=51.24603745201091, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:20:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=6, lr=[4.389921335456253e-06, 4.389921335456253e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 299|ppo_ep: 1|act_loss: 0.009185791015625|cri_loss: 0.052886962890625|unsuper_loss: 0.0
average reward score: -2.890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.1358642578125|cri_loss: 0.07867431640625|unsuper_loss: 0.0
average reward score: -2.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s (31.24%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.2626953125|cri_loss: 0.099609375|unsuper_loss: 0.0
average reward score: -1.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.01%) |Training time=0.78s (31.23%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.1563720703125|cri_loss: 0.18017578125|unsuper_loss: 0.0
average reward score: -1.7333984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.39%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 303|ppo_ep: 1|act_loss: 0.296875|cri_loss: 0.194091796875|unsuper_loss: 0.0
average reward score: -4.99609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.25%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 304|ppo_ep: 1|act_loss: 0.0010328292846679688|cri_loss: 0.12188720703125|unsuper_loss: 0.0
average reward score: -2.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.89%) |Training time=0.78s (31.29%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 305|ppo_ep: 1|act_loss: -0.1708984375|cri_loss: 0.233642578125|unsuper_loss: 0.0
average reward score: -1.4580078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.53%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 306|ppo_ep: 1|act_loss: -0.1097412109375|cri_loss: 0.2275390625|unsuper_loss: 0.0
average reward score: -2.587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.40%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 307|ppo_ep: 1|act_loss: -0.113525390625|cri_loss: 0.10150146484375|unsuper_loss: 0.0
average reward score: -2.595703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 308|ppo_ep: 1|act_loss: -0.1412353515625|cri_loss: 0.28759765625|unsuper_loss: 0.0
average reward score: -2.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.48%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:20:27,026] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=10, lr=[8.401951077182031e-06, 8.401951077182031e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:20:27,205] [INFO] [timer.py:215:stop] epoch=0/micro_step=310/global_step=310, RunningAvgSamplesPerSec=52.13350628077688, CurrSamplesPerSec=51.58214755541754, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:20:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=6, lr=[4.328453942900402e-06, 4.328453942900402e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.07696533203125|cri_loss: 0.1412353515625|unsuper_loss: 0.0
average reward score: -1.78125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.42%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 310|ppo_ep: 1|act_loss: -0.26171875|cri_loss: 0.12054443359375|unsuper_loss: 0.0
average reward score: -1.0166015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.27%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 311|ppo_ep: 1|act_loss: 0.0108642578125|cri_loss: 0.0836181640625|unsuper_loss: 0.0
average reward score: -1.6484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.78s (31.33%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 312|ppo_ep: 1|act_loss: 0.167724609375|cri_loss: 0.0809326171875|unsuper_loss: 0.0
average reward score: -1.09375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.07%) |Training time=0.78s (31.07%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 313|ppo_ep: 1|act_loss: 0.0885009765625|cri_loss: 0.0457763671875|unsuper_loss: 0.0
average reward score: -1.857421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.19%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 314|ppo_ep: 1|act_loss: 0.438720703125|cri_loss: 0.1385498046875|unsuper_loss: 0.0
average reward score: -2.98828125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.03%) |Training time=0.78s (31.18%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.58837890625|cri_loss: 0.387939453125|unsuper_loss: 0.0
average reward score: 1.1044921875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.06%) |Training time=0.78s (31.14%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 316|ppo_ep: 1|act_loss: -0.2413330078125|cri_loss: 0.05047607421875|unsuper_loss: 0.0
average reward score: 1.2255859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.96%) |Training time=0.78s (31.24%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 317|ppo_ep: 1|act_loss: -0.110107421875|cri_loss: 0.201171875|unsuper_loss: 0.0
average reward score: 4.46875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.37%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 318|ppo_ep: 1|act_loss: -0.237548828125|cri_loss: 0.17626953125|unsuper_loss: 0.0
average reward score: 3.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.31%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:20:51,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=10, lr=[8.280434308616948e-06, 8.280434308616948e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:20:52,171] [INFO] [timer.py:215:stop] epoch=0/micro_step=320/global_step=320, RunningAvgSamplesPerSec=52.127762109783276, CurrSamplesPerSec=51.654149181493786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:20:52,330] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=6, lr=[4.264512440072707e-06, 4.264512440072707e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 319|ppo_ep: 1|act_loss: -0.054931640625|cri_loss: 0.1356201171875|unsuper_loss: 0.0
average reward score: 3.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.79s (31.47%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 320|ppo_ep: 1|act_loss: -0.06903076171875|cri_loss: 0.10064697265625|unsuper_loss: 0.0
average reward score: 3.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.87%) |Training time=0.78s (31.22%) |Others=0.22 (8.91%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 321|ppo_ep: 1|act_loss: -0.1444091796875|cri_loss: 0.041595458984375|unsuper_loss: 0.0
average reward score: 3.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.32%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 322|ppo_ep: 1|act_loss: -0.04400634765625|cri_loss: 0.057281494140625|unsuper_loss: 0.0
average reward score: 1.736328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.83%) |Training time=0.78s (31.32%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 323|ppo_ep: 1|act_loss: -0.10302734375|cri_loss: 0.11444091796875|unsuper_loss: 0.0
average reward score: 2.623046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.30%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 324|ppo_ep: 1|act_loss: 0.07342529296875|cri_loss: 0.12744140625|unsuper_loss: 0.0
average reward score: 2.306640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.57%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 325|ppo_ep: 1|act_loss: -0.017364501953125|cri_loss: 0.0731201171875|unsuper_loss: 0.0
average reward score: 4.0
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.93%) |Training time=0.78s (31.24%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 326|ppo_ep: 1|act_loss: -0.176513671875|cri_loss: 0.1510009765625|unsuper_loss: 0.0
average reward score: 2.859375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.72%) |Training time=0.79s (31.50%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 327|ppo_ep: 1|act_loss: -0.005863189697265625|cri_loss: 0.17626953125|unsuper_loss: 0.0
average reward score: 2.05078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.78%) |Training time=0.79s (31.44%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.07830810546875|cri_loss: 0.06134033203125|unsuper_loss: 0.0
average reward score: 2.84375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.91%) |Training time=0.78s (31.31%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:21:17,001] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=10, lr=[8.154241936627547e-06, 8.154241936627547e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:21:17,177] [INFO] [timer.py:215:stop] epoch=0/micro_step=330/global_step=330, RunningAvgSamplesPerSec=52.11398943675025, CurrSamplesPerSec=51.958836384189105, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:21:17,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=6, lr=[4.198183347243233e-06, 4.198183347243233e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.07696533203125|cri_loss: 0.068603515625|unsuper_loss: 0.0
average reward score: 4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 330|ppo_ep: 1|act_loss: -0.039886474609375|cri_loss: 0.044189453125|unsuper_loss: 0.0
average reward score: 3.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.92%) |Training time=0.78s (31.25%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 331|ppo_ep: 1|act_loss: -0.052337646484375|cri_loss: 0.04412841796875|unsuper_loss: 0.0
average reward score: 3.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 332|ppo_ep: 1|act_loss: 0.12115478515625|cri_loss: 0.1629638671875|unsuper_loss: 0.0
average reward score: 3.171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.0147247314453125|cri_loss: 0.047698974609375|unsuper_loss: 0.0
average reward score: 4.046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.56%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 334|ppo_ep: 1|act_loss: -0.08074951171875|cri_loss: 0.032257080078125|unsuper_loss: 0.0
average reward score: 5.09375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s (31.15%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 335|ppo_ep: 1|act_loss: -0.057952880859375|cri_loss: 0.042724609375|unsuper_loss: 0.0
average reward score: 3.677734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.45s (58.11%) |Training time=0.82s (33.01%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 336|ppo_ep: 1|act_loss: -0.1143798828125|cri_loss: 0.045684814453125|unsuper_loss: 0.0
average reward score: 3.123046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.35%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 337|ppo_ep: 1|act_loss: -0.1322021484375|cri_loss: 0.031494140625|unsuper_loss: 0.0
average reward score: 4.28125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 338|ppo_ep: 1|act_loss: -0.144287109375|cri_loss: 0.035675048828125|unsuper_loss: 0.0
average reward score: 3.578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:21:41,960] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=10, lr=[8.023544714130509e-06, 8.023544714130509e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:21:42,140] [INFO] [timer.py:215:stop] epoch=0/micro_step=340/global_step=340, RunningAvgSamplesPerSec=52.08920957291715, CurrSamplesPerSec=51.04680739069268, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:21:42,300] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=6, lr=[4.129556415368261e-06, 4.129556415368261e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 339|ppo_ep: 1|act_loss: -0.1048583984375|cri_loss: 0.0208587646484375|unsuper_loss: 0.0
average reward score: 3.08203125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 340|ppo_ep: 1|act_loss: -0.061767578125|cri_loss: 0.035980224609375|unsuper_loss: 0.0
average reward score: 3.326171875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.58%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 341|ppo_ep: 1|act_loss: -0.05889892578125|cri_loss: 0.013214111328125|unsuper_loss: 0.0
average reward score: 3.626953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 342|ppo_ep: 1|act_loss: -0.050262451171875|cri_loss: 0.043212890625|unsuper_loss: 0.0
average reward score: 3.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 343|ppo_ep: 1|act_loss: -0.06964111328125|cri_loss: 0.1083984375|unsuper_loss: 0.0
average reward score: 4.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 344|ppo_ep: 1|act_loss: 0.08087158203125|cri_loss: 0.10015869140625|unsuper_loss: 0.0
average reward score: 2.064453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 345|ppo_ep: 1|act_loss: 0.041839599609375|cri_loss: 0.079345703125|unsuper_loss: 0.0
average reward score: 3.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.077392578125|cri_loss: 0.08160400390625|unsuper_loss: 0.0
average reward score: 2.216796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.74%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 347|ppo_ep: 1|act_loss: 0.0736083984375|cri_loss: 0.08294677734375|unsuper_loss: 0.0
average reward score: 4.203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.0135650634765625|cri_loss: 0.09674072265625|unsuper_loss: 0.0
average reward score: 3.330078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.71%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=10, lr=[7.888519489627777e-06, 7.888519489627777e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:22:07,151] [INFO] [timer.py:215:stop] epoch=0/micro_step=350/global_step=350, RunningAvgSamplesPerSec=52.05623551844447, CurrSamplesPerSec=50.85811852847716, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:22:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=6, lr=[4.058724504646834e-06, 4.058724504646834e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.10675048828125|cri_loss: 0.12646484375|unsuper_loss: 0.0
average reward score: 1.84765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 350|ppo_ep: 1|act_loss: 0.11016845703125|cri_loss: 0.070068359375|unsuper_loss: 0.0
average reward score: 1.279296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.64%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 351|ppo_ep: 1|act_loss: 0.2406005859375|cri_loss: 0.1600341796875|unsuper_loss: 0.0
average reward score: 0.479248046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.89%) |Training time=0.78s (31.38%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:14,799] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.2386474609375|cri_loss: 0.1103515625|unsuper_loss: 0.0
average reward score: 1.291015625
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.63%) |Training time=0.79s (32.32%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:17,261] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.481201171875|cri_loss: 0.308349609375|unsuper_loss: 0.0
average reward score: -0.63427734375
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.49%) |Training time=0.80s (32.48%) |Others=0.17 (7.03%)|CurSamplesPerSec=13.00 |AvgSamplesPerSec=12.81
epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.296142578125|cri_loss: 0.20068359375|unsuper_loss: 0.0
average reward score: -0.2060546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 355|ppo_ep: 1|act_loss: 0.456787109375|cri_loss: 0.278564453125|unsuper_loss: 0.0
average reward score: -0.403564453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 356|ppo_ep: 1|act_loss: 0.443603515625|cri_loss: 0.2313232421875|unsuper_loss: 0.0
average reward score: -0.427734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.66%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 357|ppo_ep: 1|act_loss: 0.359375|cri_loss: 0.1785888671875|unsuper_loss: 0.0
average reward score: 0.2083740234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.58%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 358|ppo_ep: 1|act_loss: 0.0963134765625|cri_loss: 0.2235107421875|unsuper_loss: 0.0
average reward score: -0.01922607421875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.53%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:31,884] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=10, lr=[7.749348967910034e-06, 7.749348967910034e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:22:32,063] [INFO] [timer.py:215:stop] epoch=0/micro_step=360/global_step=360, RunningAvgSamplesPerSec=52.0301029292908, CurrSamplesPerSec=51.29909768320761, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:22:32,223] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=8, lr=[4.0005357013709215e-06, 4.0005357013709215e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 359|ppo_ep: 1|act_loss: 0.2340087890625|cri_loss: 0.140380859375|unsuper_loss: 0.0
average reward score: -0.38818359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.60%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 360|ppo_ep: 1|act_loss: 0.3876953125|cri_loss: 0.318115234375|unsuper_loss: 0.0
average reward score: -2.419921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.70%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:37,221] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 361|ppo_ep: 1|act_loss: 0.129150390625|cri_loss: 0.09161376953125|unsuper_loss: 0.0
average reward score: -1.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.64%) |Training time=0.79s (32.34%) |Others=0.17 (7.02%)|CurSamplesPerSec=13.04 |AvgSamplesPerSec=12.81
epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.0443115234375|cri_loss: 0.25|unsuper_loss: 0.0
average reward score: -2.34765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.20361328125|cri_loss: 0.2042236328125|unsuper_loss: 0.0
average reward score: -2.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 364|ppo_ep: 1|act_loss: 0.02618408203125|cri_loss: 0.276123046875|unsuper_loss: 0.0
average reward score: -0.67919921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 365|ppo_ep: 1|act_loss: 0.2371826171875|cri_loss: 0.1641845703125|unsuper_loss: 0.0
average reward score: -2.16015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.75%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.006702423095703125|cri_loss: 0.2052001953125|unsuper_loss: 0.0
average reward score: -0.93212890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 367|ppo_ep: 1|act_loss: -0.04510498046875|cri_loss: 0.1639404296875|unsuper_loss: 0.0
average reward score: -2.287109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 368|ppo_ep: 1|act_loss: -0.252197265625|cri_loss: 0.254638671875|unsuper_loss: 0.0
average reward score: -1.240234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:22:56,833] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=10, lr=[7.606221462835909e-06, 7.606221462835909e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:22:57,010] [INFO] [timer.py:215:stop] epoch=0/micro_step=370/global_step=370, RunningAvgSamplesPerSec=52.00473590207312, CurrSamplesPerSec=51.09381442985007, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:22:57,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=9, lr=[3.933522533409623e-06, 3.933522533409623e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 369|ppo_ep: 1|act_loss: 0.0250396728515625|cri_loss: 0.1590576171875|unsuper_loss: 0.0
average reward score: -2.888671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 370|ppo_ep: 1|act_loss: -0.1568603515625|cri_loss: 0.397705078125|unsuper_loss: 0.0
average reward score: -2.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 371|ppo_ep: 1|act_loss: 0.263916015625|cri_loss: 0.1297607421875|unsuper_loss: 0.0
average reward score: -3.072265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.23%) |Training time=0.80s (31.94%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 372|ppo_ep: 1|act_loss: -0.193359375|cri_loss: 0.11859130859375|unsuper_loss: 0.0
average reward score: -2.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.08880615234375|cri_loss: 0.061126708984375|unsuper_loss: 0.0
average reward score: -1.970703125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 374|ppo_ep: 1|act_loss: 0.0213165283203125|cri_loss: 0.137451171875|unsuper_loss: 0.0
average reward score: -2.658203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.70%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 375|ppo_ep: 1|act_loss: 0.08203125|cri_loss: 0.1251220703125|unsuper_loss: 0.0
average reward score: -2.24609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.57%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 376|ppo_ep: 1|act_loss: -0.047088623046875|cri_loss: 0.1439208984375|unsuper_loss: 0.0
average reward score: -2.392578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.22%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 377|ppo_ep: 1|act_loss: -0.1260986328125|cri_loss: 0.2059326171875|unsuper_loss: 0.0
average reward score: -1.533203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.68%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 378|ppo_ep: 1|act_loss: -0.1795654296875|cri_loss: 0.10009765625|unsuper_loss: 0.0
average reward score: -3.173828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:23:21,858] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=10, lr=[7.459330642521499e-06, 7.459330642521499e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:23:22,035] [INFO] [timer.py:215:stop] epoch=0/micro_step=380/global_step=380, RunningAvgSamplesPerSec=51.976800306125085, CurrSamplesPerSec=51.10830902963015, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:23:22,193] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=9, lr=[3.8572239314745966e-06, 3.8572239314745966e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 379|ppo_ep: 1|act_loss: -0.103515625|cri_loss: 0.037200927734375|unsuper_loss: 0.0
average reward score: -2.57421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.63%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 380|ppo_ep: 1|act_loss: -0.11346435546875|cri_loss: 0.11126708984375|unsuper_loss: 0.0
average reward score: -3.271484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.51%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 381|ppo_ep: 1|act_loss: -0.10687255859375|cri_loss: 0.1171875|unsuper_loss: 0.0
average reward score: -4.25390625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.69%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 382|ppo_ep: 1|act_loss: 0.185791015625|cri_loss: 0.1903076171875|unsuper_loss: 0.0
average reward score: -1.146484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.42%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 383|ppo_ep: 1|act_loss: 0.053070068359375|cri_loss: 0.094970703125|unsuper_loss: 0.0
average reward score: -3.900390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 384|ppo_ep: 1|act_loss: 0.1495361328125|cri_loss: 0.08660888671875|unsuper_loss: 0.0
average reward score: -1.6904296875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.52%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 385|ppo_ep: 1|act_loss: 0.138671875|cri_loss: 0.09027099609375|unsuper_loss: 0.0
average reward score: -5.4375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.43%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 386|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.0672607421875|unsuper_loss: 0.0
average reward score: -4.953125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 387|ppo_ep: 1|act_loss: 0.12481689453125|cri_loss: 0.08941650390625|unsuper_loss: 0.0
average reward score: -4.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.71%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 388|ppo_ep: 1|act_loss: -0.0234375|cri_loss: 0.04107666015625|unsuper_loss: 0.0
average reward score: -2.8984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:23:46,862] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=10, lr=[7.308875267284935e-06, 7.308875267284935e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:23:47,043] [INFO] [timer.py:215:stop] epoch=0/micro_step=390/global_step=390, RunningAvgSamplesPerSec=51.956256933038894, CurrSamplesPerSec=50.59014466794897, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:23:47,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=9, lr=[3.779088848132372e-06, 3.779088848132372e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 389|ppo_ep: 1|act_loss: 0.15185546875|cri_loss: 0.13330078125|unsuper_loss: 0.0
average reward score: -1.75
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 390|ppo_ep: 1|act_loss: 0.064697265625|cri_loss: 0.06304931640625|unsuper_loss: 0.0
average reward score: -3.287109375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 391|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: 0.05303955078125|unsuper_loss: 0.0
average reward score: -2.60546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.60%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 392|ppo_ep: 1|act_loss: -0.005657196044921875|cri_loss: 0.1802978515625|unsuper_loss: 0.0
average reward score: -2.046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 393|ppo_ep: 1|act_loss: -0.1446533203125|cri_loss: 0.07208251953125|unsuper_loss: 0.0
average reward score: -2.337890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.75%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 394|ppo_ep: 1|act_loss: -0.10504150390625|cri_loss: 0.179931640625|unsuper_loss: 0.0
average reward score: -2.212890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 395|ppo_ep: 1|act_loss: 0.1029052734375|cri_loss: 0.09686279296875|unsuper_loss: 0.0
average reward score: -4.046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 396|ppo_ep: 1|act_loss: 0.0245361328125|cri_loss: 0.1693115234375|unsuper_loss: 0.0
average reward score: -2.314453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.36%) |Training time=0.79s (31.81%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 397|ppo_ep: 1|act_loss: -0.043304443359375|cri_loss: 0.04229736328125|unsuper_loss: 0.0
average reward score: -3.359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.41%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 398|ppo_ep: 1|act_loss: -0.2254638671875|cri_loss: 0.07806396484375|unsuper_loss: 0.0
average reward score: -3.142578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:24:11,869] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=10, lr=[7.155058920700617e-06, 7.155058920700617e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:24:12,045] [INFO] [timer.py:215:stop] epoch=0/micro_step=400/global_step=400, RunningAvgSamplesPerSec=51.93466840082398, CurrSamplesPerSec=51.04929257384149, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:24:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=9, lr=[3.6992230092138004e-06, 3.6992230092138004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 399|ppo_ep: 1|act_loss: -0.0390625|cri_loss: 0.0706787109375|unsuper_loss: 0.0
average reward score: -4.3515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 400|ppo_ep: 1|act_loss: 0.222900390625|cri_loss: 0.07318115234375|unsuper_loss: 0.0
average reward score: -2.384765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.67%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.051666259765625|cri_loss: 0.04010009765625|unsuper_loss: 0.0
average reward score: -4.3203125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.85%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 402|ppo_ep: 1|act_loss: -0.08135986328125|cri_loss: 0.0285491943359375|unsuper_loss: 0.0
average reward score: -3.83984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.78%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 403|ppo_ep: 1|act_loss: -0.12310791015625|cri_loss: 0.06390380859375|unsuper_loss: 0.0
average reward score: -2.3125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 404|ppo_ep: 1|act_loss: -0.0479736328125|cri_loss: 0.042877197265625|unsuper_loss: 0.0
average reward score: -3.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.66%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 405|ppo_ep: 1|act_loss: 0.060333251953125|cri_loss: 0.060272216796875|unsuper_loss: 0.0
average reward score: -3.236328125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.81%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 406|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.048065185546875|unsuper_loss: 0.0
average reward score: -3.92578125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.99%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 407|ppo_ep: 1|act_loss: 0.02398681640625|cri_loss: 0.0236053466796875|unsuper_loss: 0.0
average reward score: -1.84765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.79%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 408|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: 0.05072021484375|unsuper_loss: 0.0
average reward score: -2.701171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.81%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
[2023-07-01 08:24:36,897] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=10, lr=[6.998089734127033e-06, 6.998089734127033e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:24:37,076] [INFO] [timer.py:215:stop] epoch=0/micro_step=410/global_step=410, RunningAvgSamplesPerSec=51.90624478389156, CurrSamplesPerSec=51.19460994249568, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:24:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=9, lr=[3.6177344824627854e-06, 3.6177344824627854e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 409|ppo_ep: 1|act_loss: -0.09442138671875|cri_loss: 0.05120849609375|unsuper_loss: 0.0
average reward score: -1.87890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.62%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 410|ppo_ep: 1|act_loss: -0.09503173828125|cri_loss: 0.10931396484375|unsuper_loss: 0.0
average reward score: -0.666015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 411|ppo_ep: 1|act_loss: -0.183837890625|cri_loss: 0.2255859375|unsuper_loss: 0.0
average reward score: -0.67041015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.62%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 412|ppo_ep: 1|act_loss: -0.12164306640625|cri_loss: 0.1085205078125|unsuper_loss: 0.0
average reward score: 0.165771484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.49%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 413|ppo_ep: 1|act_loss: -0.1180419921875|cri_loss: 0.1715087890625|unsuper_loss: 0.0
average reward score: 0.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 414|ppo_ep: 1|act_loss: 0.0251922607421875|cri_loss: 0.12139892578125|unsuper_loss: 0.0
average reward score: 0.818359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 415|ppo_ep: 1|act_loss: 0.307861328125|cri_loss: 0.1944580078125|unsuper_loss: 0.0
average reward score: 0.89453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 416|ppo_ep: 1|act_loss: 0.1326904296875|cri_loss: 0.1396484375|unsuper_loss: 0.0
average reward score: 1.078125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.55%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 417|ppo_ep: 1|act_loss: -0.10302734375|cri_loss: 0.1336669921875|unsuper_loss: 0.0
average reward score: 1.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 418|ppo_ep: 1|act_loss: -0.364990234375|cri_loss: 0.22509765625|unsuper_loss: 0.0
average reward score: 2.28125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
[2023-07-01 08:25:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=10, lr=[6.838180105080878e-06, 6.838180105080878e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:25:02,071] [INFO] [timer.py:215:stop] epoch=0/micro_step=420/global_step=420, RunningAvgSamplesPerSec=51.88667175359361, CurrSamplesPerSec=50.3423646308708, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:25:02,232] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=9, lr=[3.534733531308085e-06, 3.534733531308085e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 419|ppo_ep: 1|act_loss: -0.08856201171875|cri_loss: 0.1396484375|unsuper_loss: 0.0
average reward score: 1.4580078125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.24%) |Training time=0.80s (31.94%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 420|ppo_ep: 1|act_loss: -0.0119171142578125|cri_loss: 0.044158935546875|unsuper_loss: 0.0
average reward score: 0.61083984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 421|ppo_ep: 1|act_loss: -0.01837158203125|cri_loss: 0.08209228515625|unsuper_loss: 0.0
average reward score: 2.408203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 422|ppo_ep: 1|act_loss: -0.265625|cri_loss: 0.1708984375|unsuper_loss: 0.0
average reward score: 2.84765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 423|ppo_ep: 1|act_loss: 0.432373046875|cri_loss: 0.287109375|unsuper_loss: 0.0
average reward score: 3.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 424|ppo_ep: 1|act_loss: -0.404296875|cri_loss: 0.175048828125|unsuper_loss: 0.0
average reward score: 4.3046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 425|ppo_ep: 1|act_loss: -0.05853271484375|cri_loss: 0.045562744140625|unsuper_loss: 0.0
average reward score: 4.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.69%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 426|ppo_ep: 1|act_loss: 0.7734375|cri_loss: 0.381591796875|unsuper_loss: 0.0
average reward score: 3.810546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.53%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 427|ppo_ep: 1|act_loss: 0.50244140625|cri_loss: 0.3427734375|unsuper_loss: 0.0
average reward score: 4.34375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 428|ppo_ep: 1|act_loss: -0.017791748046875|cri_loss: 0.09185791015625|unsuper_loss: 0.0
average reward score: 2.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:25:26,907] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=10, lr=[6.675546409838583e-06, 6.675546409838583e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:25:27,083] [INFO] [timer.py:215:stop] epoch=0/micro_step=430/global_step=430, RunningAvgSamplesPerSec=51.865142424222554, CurrSamplesPerSec=50.65506157249227, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:25:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=9, lr=[3.4503324656641074e-06, 3.4503324656641074e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 429|ppo_ep: 1|act_loss: -0.09375|cri_loss: 0.09967041015625|unsuper_loss: 0.0
average reward score: 4.0703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.80s (31.87%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 430|ppo_ep: 1|act_loss: 0.040008544921875|cri_loss: 0.062286376953125|unsuper_loss: 0.0
average reward score: 3.318359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.59%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 431|ppo_ep: 1|act_loss: -0.0281524658203125|cri_loss: 0.11102294921875|unsuper_loss: 0.0
average reward score: 3.7890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 432|ppo_ep: 1|act_loss: 0.1917724609375|cri_loss: 0.1549072265625|unsuper_loss: 0.0
average reward score: 3.83203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 433|ppo_ep: 1|act_loss: -0.1365966796875|cri_loss: 0.047943115234375|unsuper_loss: 0.0
average reward score: 4.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.63%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 434|ppo_ep: 1|act_loss: -0.0265960693359375|cri_loss: 0.070068359375|unsuper_loss: 0.0
average reward score: 2.775390625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 435|ppo_ep: 1|act_loss: 0.0946044921875|cri_loss: 0.064453125|unsuper_loss: 0.0
average reward score: 2.84375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.44%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 436|ppo_ep: 1|act_loss: 0.1785888671875|cri_loss: 0.09979248046875|unsuper_loss: 0.0
average reward score: 2.943359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.60%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.49755859375|cri_loss: 0.343505859375|unsuper_loss: 0.0
average reward score: 2.65234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.321533203125|cri_loss: 0.1927490234375|unsuper_loss: 0.0
average reward score: 3.462890625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.28%) |Training time=0.80s (31.85%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
[2023-07-01 08:25:51,911] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=10, lr=[6.5104087106541136e-06, 6.5104087106541136e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:25:52,092] [INFO] [timer.py:215:stop] epoch=0/micro_step=440/global_step=440, RunningAvgSamplesPerSec=51.84449417609737, CurrSamplesPerSec=50.286176724696695, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:25:52,251] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=9, lr=[3.364645489962566e-06, 3.364645489962566e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.1375732421875|cri_loss: 0.085693359375|unsuper_loss: 0.0
average reward score: 1.3857421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.26%) |Training time=0.80s (32.00%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 440|ppo_ep: 1|act_loss: -0.1767578125|cri_loss: 0.04156494140625|unsuper_loss: 0.0
average reward score: 4.0
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.80s (31.82%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 441|ppo_ep: 1|act_loss: -0.217529296875|cri_loss: 0.065673828125|unsuper_loss: 0.0
average reward score: 2.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.23%) |Training time=0.80s (32.04%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.1175537109375|cri_loss: 0.041351318359375|unsuper_loss: 0.0
average reward score: 3.23828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.82%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 443|ppo_ep: 1|act_loss: -0.10382080078125|cri_loss: 0.015106201171875|unsuper_loss: 0.0
average reward score: 3.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 444|ppo_ep: 1|act_loss: -0.0259552001953125|cri_loss: 0.08148193359375|unsuper_loss: 0.0
average reward score: 1.93359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.52%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 445|ppo_ep: 1|act_loss: -0.04638671875|cri_loss: 0.166748046875|unsuper_loss: 0.0
average reward score: 3.08984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.49%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 446|ppo_ep: 1|act_loss: -0.053070068359375|cri_loss: 0.04461669921875|unsuper_loss: 0.0
average reward score: 1.9794921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.57%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 447|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.08343505859375|unsuper_loss: 0.0
average reward score: 3.490234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 448|ppo_ep: 1|act_loss: -0.1307373046875|cri_loss: 0.0931396484375|unsuper_loss: 0.0
average reward score: 3.224609375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.60%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:26:16,899] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=10, lr=[6.342990457989214e-06, 6.342990457989214e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:26:17,075] [INFO] [timer.py:215:stop] epoch=0/micro_step=450/global_step=450, RunningAvgSamplesPerSec=51.82896790981009, CurrSamplesPerSec=51.82415783352665, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:26:17,234] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=9, lr=[3.277788548620639e-06, 3.277788548620639e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 449|ppo_ep: 1|act_loss: 0.042144775390625|cri_loss: 0.0300445556640625|unsuper_loss: 0.0
average reward score: 3.412109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.40%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 450|ppo_ep: 1|act_loss: 0.12384033203125|cri_loss: 0.1572265625|unsuper_loss: 0.0
average reward score: 2.734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.53%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 451|ppo_ep: 1|act_loss: 0.0927734375|cri_loss: 0.030181884765625|unsuper_loss: 0.0
average reward score: 2.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.1341552734375|cri_loss: 0.041839599609375|unsuper_loss: 0.0
average reward score: 3.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 453|ppo_ep: 1|act_loss: -0.00974273681640625|cri_loss: 0.0679931640625|unsuper_loss: 0.0
average reward score: 3.43359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 454|ppo_ep: 1|act_loss: 0.0266571044921875|cri_loss: 0.0303192138671875|unsuper_loss: 0.0
average reward score: 3.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.0196075439453125|cri_loss: 0.0340576171875|unsuper_loss: 0.0
average reward score: 3.009765625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.21%) |Training time=0.80s (32.01%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 456|ppo_ep: 1|act_loss: 0.05120849609375|cri_loss: 0.0267181396484375|unsuper_loss: 0.0
average reward score: 3.541015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.71%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 457|ppo_ep: 1|act_loss: 0.0208892822265625|cri_loss: 0.06402587890625|unsuper_loss: 0.0
average reward score: 2.392578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.90%) |Training time=0.78s (31.33%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 458|ppo_ep: 1|act_loss: 0.037109375|cri_loss: 0.039093017578125|unsuper_loss: 0.0
average reward score: 3.6484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.96%) |Training time=0.78s (31.25%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
[2023-07-01 08:26:41,914] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=10, lr=[6.173518188159017e-06, 6.173518188159017e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:26:42,090] [INFO] [timer.py:215:stop] epoch=0/micro_step=460/global_step=460, RunningAvgSamplesPerSec=51.81293659980854, CurrSamplesPerSec=51.52634425877704, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:26:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, lr=[3.189879169154723e-06, 3.189879169154723e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 459|ppo_ep: 1|act_loss: -0.04974365234375|cri_loss: 0.05517578125|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.46%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.115234375|cri_loss: 0.06146240234375|unsuper_loss: 0.0
average reward score: 3.22265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.50%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 461|ppo_ep: 1|act_loss: -0.0013332366943359375|cri_loss: 0.056243896484375|unsuper_loss: 0.0
average reward score: 2.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.83%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 462|ppo_ep: 1|act_loss: 0.1343994140625|cri_loss: 0.045166015625|unsuper_loss: 0.0
average reward score: 2.552734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.70%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 463|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.205810546875|unsuper_loss: 0.0
average reward score: 2.0859375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.59%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 464|ppo_ep: 1|act_loss: -0.054290771484375|cri_loss: 0.0467529296875|unsuper_loss: 0.0
average reward score: 3.90234375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 465|ppo_ep: 1|act_loss: -0.0231475830078125|cri_loss: 0.037078857421875|unsuper_loss: 0.0
average reward score: 3.13671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 466|ppo_ep: 1|act_loss: -0.1474609375|cri_loss: 0.043487548828125|unsuper_loss: 0.0
average reward score: 3.0859375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.96%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 467|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.0576171875|unsuper_loss: 0.0
average reward score: 3.060546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.86%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.006526947021484375|cri_loss: 0.0452880859375|unsuper_loss: 0.0
average reward score: 4.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:27:06,918] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=10, lr=[6.002221216802128e-06, 6.002221216802128e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:07,099] [INFO] [timer.py:215:stop] epoch=0/micro_step=470/global_step=470, RunningAvgSamplesPerSec=51.79474671061898, CurrSamplesPerSec=51.72822604484345, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=9, lr=[3.101036303152072e-06, 3.101036303152072e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 469|ppo_ep: 1|act_loss: 0.037017822265625|cri_loss: 0.0202484130859375|unsuper_loss: 0.0
average reward score: 3.955078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.81%) |Training time=0.78s (31.36%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.1474609375|cri_loss: 0.055572509765625|unsuper_loss: 0.0
average reward score: 2.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 471|ppo_ep: 1|act_loss: 0.09393310546875|cri_loss: 0.05352783203125|unsuper_loss: 0.0
average reward score: 3.267578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 472|ppo_ep: 1|act_loss: -0.005138397216796875|cri_loss: 0.0299835205078125|unsuper_loss: 0.0
average reward score: 2.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 473|ppo_ep: 1|act_loss: -0.0870361328125|cri_loss: 0.0927734375|unsuper_loss: 0.0
average reward score: 2.216796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 474|ppo_ep: 1|act_loss: -0.0041656494140625|cri_loss: 0.05389404296875|unsuper_loss: 0.0
average reward score: 2.61328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.60%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 475|ppo_ep: 1|act_loss: -0.047088623046875|cri_loss: 0.047393798828125|unsuper_loss: 0.0
average reward score: 2.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 476|ppo_ep: 1|act_loss: -0.0125885009765625|cri_loss: 0.06610107421875|unsuper_loss: 0.0
average reward score: 3.3515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.57%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.0002532005310058594|cri_loss: 0.11529541015625|unsuper_loss: 0.0
average reward score: 1.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.58%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 478|ppo_ep: 1|act_loss: -0.1527099609375|cri_loss: 0.1478271484375|unsuper_loss: 0.0
average reward score: 2.587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:27:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=10, lr=[5.829331328589974e-06, 5.829331328589974e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:32,080] [INFO] [timer.py:215:stop] epoch=0/micro_step=480/global_step=480, RunningAvgSamplesPerSec=51.78436242445469, CurrSamplesPerSec=50.665846255129615, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:32,240] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=9, lr=[3.011380165315503e-06, 3.011380165315503e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 479|ppo_ep: 1|act_loss: -0.259521484375|cri_loss: 0.10260009765625|unsuper_loss: 0.0
average reward score: 2.890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.87%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 480|ppo_ep: 1|act_loss: -0.128662109375|cri_loss: 0.037445068359375|unsuper_loss: 0.0
average reward score: 2.390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.52%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 481|ppo_ep: 1|act_loss: -0.0643310546875|cri_loss: 0.042388916015625|unsuper_loss: 0.0
average reward score: 4.25
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 482|ppo_ep: 1|act_loss: 0.1011962890625|cri_loss: 0.06683349609375|unsuper_loss: 0.0
average reward score: 3.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.47%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 483|ppo_ep: 1|act_loss: 0.1641845703125|cri_loss: 0.1060791015625|unsuper_loss: 0.0
average reward score: 2.236328125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 484|ppo_ep: 1|act_loss: 0.014984130859375|cri_loss: 0.029022216796875|unsuper_loss: 0.0
average reward score: 2.958984375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.80s (31.77%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 485|ppo_ep: 1|act_loss: -0.0292510986328125|cri_loss: 0.02252197265625|unsuper_loss: 0.0
average reward score: 3.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.80%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.004482269287109375|cri_loss: 0.01898193359375|unsuper_loss: 0.0
average reward score: 4.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.66%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 487|ppo_ep: 1|act_loss: -0.1724853515625|cri_loss: 0.03619384765625|unsuper_loss: 0.0
average reward score: 2.236328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.77%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.05059814453125|cri_loss: 0.0300445556640625|unsuper_loss: 0.0
average reward score: 4.03515625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
[2023-07-01 08:27:56,927] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=10, lr=[5.655082463595249e-06, 5.655082463595249e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:57,106] [INFO] [timer.py:215:stop] epoch=0/micro_step=490/global_step=490, RunningAvgSamplesPerSec=51.767584021520136, CurrSamplesPerSec=51.21482851213101, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:57,266] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=9, lr=[2.9210320707989525e-06, 2.9210320707989525e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 489|ppo_ep: 1|act_loss: 0.0249176025390625|cri_loss: 0.0303802490234375|unsuper_loss: 0.0
average reward score: 3.994140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.62%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 490|ppo_ep: 1|act_loss: -0.045806884765625|cri_loss: 0.016265869140625|unsuper_loss: 0.0
average reward score: 4.609375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.46%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.81
epoch: 0|step: 491|ppo_ep: 1|act_loss: -0.007472991943359375|cri_loss: 0.00922393798828125|unsuper_loss: 0.0
average reward score: 3.900390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.48%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 492|ppo_ep: 1|act_loss: 0.054168701171875|cri_loss: 0.0140533447265625|unsuper_loss: 0.0
average reward score: 4.49609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.60%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 493|ppo_ep: 1|act_loss: 0.135498046875|cri_loss: 0.0302581787109375|unsuper_loss: 0.0
average reward score: 3.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.064208984375|cri_loss: 0.029998779296875|unsuper_loss: 0.0
average reward score: 3.58203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.49%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 495|ppo_ep: 1|act_loss: -0.01373291015625|cri_loss: 0.0218353271484375|unsuper_loss: 0.0
average reward score: 3.40625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.39%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 496|ppo_ep: 1|act_loss: -0.0264129638671875|cri_loss: 0.032501220703125|unsuper_loss: 0.0
average reward score: 3.6171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.28%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 497|ppo_ep: 1|act_loss: 0.031707763671875|cri_loss: 0.0582275390625|unsuper_loss: 0.0
average reward score: 3.048828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.57%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 498|ppo_ep: 1|act_loss: 0.043731689453125|cri_loss: 0.0711669921875|unsuper_loss: 0.0
average reward score: 3.072265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.72%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:28:21,878] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=10, lr=[5.479710400743868e-06, 5.479710400743868e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:28:22,058] [INFO] [timer.py:215:stop] epoch=0/micro_step=500/global_step=500, RunningAvgSamplesPerSec=51.761869176640715, CurrSamplesPerSec=51.35148201386001, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:28:22,217] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=9, lr=[2.830114271054013e-06, 2.830114271054013e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 499|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.04779052734375|unsuper_loss: 0.0
average reward score: 3.689453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.59%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 500|ppo_ep: 1|act_loss: -0.16162109375|cri_loss: 0.0325927734375|unsuper_loss: 0.0
average reward score: 3.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 501|ppo_ep: 1|act_loss: -0.043731689453125|cri_loss: 0.054473876953125|unsuper_loss: 0.0
average reward score: 2.8828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.91%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 502|ppo_ep: 1|act_loss: -0.09417724609375|cri_loss: 0.0143280029296875|unsuper_loss: 0.0
average reward score: 3.181640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.79%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 503|ppo_ep: 1|act_loss: -0.12103271484375|cri_loss: 0.0193939208984375|unsuper_loss: 0.0
average reward score: 3.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.89%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 504|ppo_ep: 1|act_loss: -0.06085205078125|cri_loss: 0.01446533203125|unsuper_loss: 0.0
average reward score: 2.8984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.09393310546875|cri_loss: 0.02349853515625|unsuper_loss: 0.0
average reward score: 3.625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.79s (31.49%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.091064453125|cri_loss: 0.04632568359375|unsuper_loss: 0.0
average reward score: 2.390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.041412353515625|cri_loss: 0.016265869140625|unsuper_loss: 0.0
average reward score: 3.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 508|ppo_ep: 1|act_loss: -0.199951171875|cri_loss: 0.0562744140625|unsuper_loss: 0.0
average reward score: 2.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.91%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:28:46,900] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=10, lr=[5.30345243877873e-06, 5.30345243877873e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:28:47,077] [INFO] [timer.py:215:stop] epoch=0/micro_step=510/global_step=510, RunningAvgSamplesPerSec=51.74410620499772, CurrSamplesPerSec=50.86902841618862, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:28:47,236] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=9, lr=[2.7387497884095297e-06, 2.7387497884095297e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 509|ppo_ep: 1|act_loss: -0.10919189453125|cri_loss: 0.10418701171875|unsuper_loss: 0.0
average reward score: 3.232421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.46%) |Training time=0.79s (31.80%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 510|ppo_ep: 1|act_loss: -0.11083984375|cri_loss: 0.10247802734375|unsuper_loss: 0.0
average reward score: 2.951171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.43%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 511|ppo_ep: 1|act_loss: 0.06085205078125|cri_loss: 0.1883544921875|unsuper_loss: 0.0
average reward score: 1.4443359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 512|ppo_ep: 1|act_loss: -0.00598907470703125|cri_loss: 0.038604736328125|unsuper_loss: 0.0
average reward score: 3.337890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.67%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 513|ppo_ep: 1|act_loss: -0.01195526123046875|cri_loss: 0.038055419921875|unsuper_loss: 0.0
average reward score: 2.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 514|ppo_ep: 1|act_loss: 0.06494140625|cri_loss: 0.052398681640625|unsuper_loss: 0.0
average reward score: 2.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.31%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.0782470703125|cri_loss: 0.07891845703125|unsuper_loss: 0.0
average reward score: 1.26953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.82%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.11248779296875|cri_loss: 0.177001953125|unsuper_loss: 0.0
average reward score: 1.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.72%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 517|ppo_ep: 1|act_loss: -0.0275726318359375|cri_loss: 0.10992431640625|unsuper_loss: 0.0
average reward score: 2.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 518|ppo_ep: 1|act_loss: -0.1278076171875|cri_loss: 0.208251953125|unsuper_loss: 0.0
average reward score: 1.6474609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.42%) |Others=0.22 (8.91%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:29:11,882] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=10, lr=[5.126547075166989e-06, 5.126547075166989e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:29:12,062] [INFO] [timer.py:215:stop] epoch=0/micro_step=520/global_step=520, RunningAvgSamplesPerSec=51.73223575240543, CurrSamplesPerSec=51.60632420793602, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:29:12,221] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=9, lr=[2.647062249608123e-06, 2.647062249608123e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 519|ppo_ep: 1|act_loss: 0.0419921875|cri_loss: 0.143798828125|unsuper_loss: 0.0
average reward score: 2.224609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.44%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 520|ppo_ep: 1|act_loss: 0.0538330078125|cri_loss: 0.08062744140625|unsuper_loss: 0.0
average reward score: 0.73583984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.49%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 521|ppo_ep: 1|act_loss: -0.0163726806640625|cri_loss: 0.0753173828125|unsuper_loss: 0.0
average reward score: 2.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 522|ppo_ep: 1|act_loss: -0.0465087890625|cri_loss: 0.22265625|unsuper_loss: 0.0
average reward score: 0.1876220703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.47%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 523|ppo_ep: 1|act_loss: 0.180908203125|cri_loss: 0.226318359375|unsuper_loss: 0.0
average reward score: 0.41064453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.48%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 524|ppo_ep: 1|act_loss: 0.133544921875|cri_loss: 0.11236572265625|unsuper_loss: 0.0
average reward score: 0.63427734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.64%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 525|ppo_ep: 1|act_loss: -0.01505279541015625|cri_loss: 0.07958984375|unsuper_loss: 0.0
average reward score: 0.46533203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.73%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 526|ppo_ep: 1|act_loss: 0.051055908203125|cri_loss: 0.037017822265625|unsuper_loss: 0.0
average reward score: 2.390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.71%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 527|ppo_ep: 1|act_loss: 0.137451171875|cri_loss: 0.061920166015625|unsuper_loss: 0.0
average reward score: 1.20703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.78%) |Training time=0.78s (31.44%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 528|ppo_ep: 1|act_loss: -0.040985107421875|cri_loss: 0.1102294921875|unsuper_loss: 0.0
average reward score: 0.55859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.65%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:29:36,860] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=10, lr=[4.949233683385321e-06, 4.949233683385321e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:29:37,036] [INFO] [timer.py:215:stop] epoch=0/micro_step=530/global_step=530, RunningAvgSamplesPerSec=51.721183361310175, CurrSamplesPerSec=50.70008552148463, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:29:37,196] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=9, lr=[2.5551757185248656e-06, 2.5551757185248656e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 529|ppo_ep: 1|act_loss: -0.09222412109375|cri_loss: 0.051910400390625|unsuper_loss: 0.0
average reward score: 1.794921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.78%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 530|ppo_ep: 1|act_loss: -0.02020263671875|cri_loss: 0.04315185546875|unsuper_loss: 0.0
average reward score: -0.61376953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.53%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.00592803955078125|cri_loss: 0.080078125|unsuper_loss: 0.0
average reward score: 0.5224609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.73%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 532|ppo_ep: 1|act_loss: 0.01611328125|cri_loss: 0.11322021484375|unsuper_loss: 0.0
average reward score: -1.646484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.64%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 533|ppo_ep: 1|act_loss: 0.01515960693359375|cri_loss: 0.08782958984375|unsuper_loss: 0.0
average reward score: -1.4111328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 534|ppo_ep: 1|act_loss: -0.04595947265625|cri_loss: 0.1668701171875|unsuper_loss: 0.0
average reward score: -2.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.88%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 535|ppo_ep: 1|act_loss: 0.07501220703125|cri_loss: 0.095458984375|unsuper_loss: 0.0
average reward score: -1.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.86%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 536|ppo_ep: 1|act_loss: -0.032012939453125|cri_loss: 0.12078857421875|unsuper_loss: 0.0
average reward score: -2.419921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.32%) |Training time=0.80s (31.87%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:29:56,856] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 537|ppo_ep: 1|act_loss: -0.0178680419921875|cri_loss: 0.16845703125|unsuper_loss: 0.0
average reward score: -2.171875
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.48s (64.27%) |Training time=0.61s (26.22%) |Others=0.22 (9.51%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.81
[2023-07-01 08:29:59,164] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.0302734375|cri_loss: 0.128173828125|unsuper_loss: 0.0
average reward score: -0.40087890625
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.48s (64.29%) |Training time=0.60s (26.16%) |Others=0.22 (9.55%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.81
[2023-07-01 08:30:01,484] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=12, lr=[4.807250409408546e-06, 4.807250409408546e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:01,660] [INFO] [timer.py:215:stop] epoch=0/micro_step=540/global_step=540, RunningAvgSamplesPerSec=51.76420263166127, CurrSamplesPerSec=51.31545507585044, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:01,820] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=9, lr=[2.46321452829447e-06, 2.46321452829447e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 539|ppo_ep: 1|act_loss: -0.033111572265625|cri_loss: 0.10748291015625|unsuper_loss: 0.0
average reward score: -1.4599609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 540|ppo_ep: 1|act_loss: -0.1107177734375|cri_loss: 0.1082763671875|unsuper_loss: 0.0
average reward score: -3.384765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.45%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 541|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.0693359375|unsuper_loss: 0.0
average reward score: -3.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.35%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.248779296875|cri_loss: 0.07879638671875|unsuper_loss: 0.0
average reward score: -3.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.49%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 543|ppo_ep: 1|act_loss: 0.19091796875|cri_loss: 0.054412841796875|unsuper_loss: 0.0
average reward score: -1.0712890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.69%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 544|ppo_ep: 1|act_loss: 0.263671875|cri_loss: 0.1318359375|unsuper_loss: 0.0
average reward score: -2.3125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.83%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 545|ppo_ep: 1|act_loss: 0.1492919921875|cri_loss: 0.08697509765625|unsuper_loss: 0.0
average reward score: -3.083984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.76%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 546|ppo_ep: 1|act_loss: 0.1226806640625|cri_loss: 0.06744384765625|unsuper_loss: 0.0
average reward score: -2.255859375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.64%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 547|ppo_ep: 1|act_loss: 0.0078277587890625|cri_loss: 0.0673828125|unsuper_loss: 0.0
average reward score: -1.623046875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.26%) |Training time=0.80s (31.94%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 548|ppo_ep: 1|act_loss: -0.004375457763671875|cri_loss: 0.05511474609375|unsuper_loss: 0.0
average reward score: -0.75244140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.36%) |Training time=0.80s (31.83%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:30:26,468] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=12, lr=[4.629807343170943e-06, 4.629807343170943e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:26,649] [INFO] [timer.py:215:stop] epoch=0/micro_step=550/global_step=550, RunningAvgSamplesPerSec=51.75146350786166, CurrSamplesPerSec=50.7776535196902, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:26,810] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=9, lr=[2.371303113074134e-06, 2.371303113074134e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 549|ppo_ep: 1|act_loss: -0.1824951171875|cri_loss: 0.08197021484375|unsuper_loss: 0.0
average reward score: -2.2578125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 550|ppo_ep: 1|act_loss: 0.0292816162109375|cri_loss: 0.09417724609375|unsuper_loss: 0.0
average reward score: -1.544921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.45%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 551|ppo_ep: 1|act_loss: -0.08197021484375|cri_loss: 0.1343994140625|unsuper_loss: 0.0
average reward score: -1.4375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.33%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 552|ppo_ep: 1|act_loss: -0.07244873046875|cri_loss: 0.03155517578125|unsuper_loss: 0.0
average reward score: -2.021484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.53%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:30:36,466] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 553|ppo_ep: 1|act_loss: -0.174560546875|cri_loss: 0.0902099609375|unsuper_loss: 0.0
average reward score: -1.0419921875
-------------------------------------------------------------------------------------
|E2E latency=2.32s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.23%) |Training time=0.61s (26.24%) |Others=0.22 (9.53%)|CurSamplesPerSec=13.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 554|ppo_ep: 1|act_loss: -0.2220458984375|cri_loss: 0.0888671875|unsuper_loss: 0.0
average reward score: -0.7626953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 555|ppo_ep: 1|act_loss: -0.14794921875|cri_loss: 0.094482421875|unsuper_loss: 0.0
average reward score: -0.76318359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.51%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 556|ppo_ep: 1|act_loss: -0.0026340484619140625|cri_loss: 0.06329345703125|unsuper_loss: 0.0
average reward score: -0.335205078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.77%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 557|ppo_ep: 1|act_loss: 0.051513671875|cri_loss: 0.0293426513671875|unsuper_loss: 0.0
average reward score: -0.184814453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.38%) |Training time=0.80s (31.83%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 558|ppo_ep: 1|act_loss: 0.1036376953125|cri_loss: 0.06988525390625|unsuper_loss: 0.0
average reward score: -0.689453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.79s (31.80%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:30:51,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=13, lr=[4.4703275677370524e-06, 4.4703275677370524e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:51,448] [INFO] [timer.py:215:stop] epoch=0/micro_step=560/global_step=560, RunningAvgSamplesPerSec=51.76853926361346, CurrSamplesPerSec=51.16896413689512, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:51,607] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=9, lr=[2.279565839669693e-06, 2.279565839669693e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 559|ppo_ep: 1|act_loss: 0.09820556640625|cri_loss: 0.020111083984375|unsuper_loss: 0.0
average reward score: -0.248291015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.66%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 560|ppo_ep: 1|act_loss: 0.057952880859375|cri_loss: 0.020294189453125|unsuper_loss: 0.0
average reward score: 0.00335693359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.86%) |Training time=0.78s (31.32%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81
epoch: 0|step: 561|ppo_ep: 1|act_loss: -0.055572509765625|cri_loss: 0.0701904296875|unsuper_loss: 0.0
average reward score: -0.8681640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.78s (31.46%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.043365478515625|cri_loss: 0.030242919921875|unsuper_loss: 0.0
average reward score: -0.0408935546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.50%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.241943359375|cri_loss: 0.09368896484375|unsuper_loss: 0.0
average reward score: -1.150390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.52%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 564|ppo_ep: 1|act_loss: 0.0892333984375|cri_loss: 0.0275421142578125|unsuper_loss: 0.0
average reward score: -1.296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.79s (31.46%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 565|ppo_ep: 1|act_loss: 0.1240234375|cri_loss: 0.0247955322265625|unsuper_loss: 0.0
average reward score: -0.0386962890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 566|ppo_ep: 1|act_loss: 0.11968994140625|cri_loss: 0.033172607421875|unsuper_loss: 0.0
average reward score: 0.352783203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 567|ppo_ep: 1|act_loss: 0.0838623046875|cri_loss: 0.0297698974609375|unsuper_loss: 0.0
average reward score: 0.21728515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.67%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 568|ppo_ep: 1|act_loss: 0.11962890625|cri_loss: 0.04913330078125|unsuper_loss: 0.0
average reward score: -0.8125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:31:16,233] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=13, lr=[4.293591324008047e-06, 4.293591324008047e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:31:16,412] [INFO] [timer.py:215:stop] epoch=0/micro_step=570/global_step=570, RunningAvgSamplesPerSec=51.7616068490536, CurrSamplesPerSec=51.40755675533235, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:31:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=9, lr=[2.1881268392529074e-06, 2.1881268392529074e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 569|ppo_ep: 1|act_loss: 0.006191253662109375|cri_loss: 0.026611328125|unsuper_loss: 0.0
average reward score: 0.263916015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.48%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 570|ppo_ep: 1|act_loss: -0.060089111328125|cri_loss: 0.077392578125|unsuper_loss: 0.0
average reward score: 0.0289306640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 571|ppo_ep: 1|act_loss: -0.1455078125|cri_loss: 0.08050537109375|unsuper_loss: 0.0
average reward score: 0.332275390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.87%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 572|ppo_ep: 1|act_loss: -0.07379150390625|cri_loss: 0.061767578125|unsuper_loss: 0.0
average reward score: 1.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 573|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.0406494140625|unsuper_loss: 0.0
average reward score: -0.324462890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.0238494873046875|cri_loss: 0.0222015380859375|unsuper_loss: 0.0
average reward score: -1.2275390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.06%) |Training time=0.77s (31.13%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.81
epoch: 0|step: 575|ppo_ep: 1|act_loss: 0.0130462646484375|cri_loss: 0.0172271728515625|unsuper_loss: 0.0
average reward score: -1.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.31%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.06475830078125|cri_loss: 0.1422119140625|unsuper_loss: 0.0
average reward score: 1.521484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 577|ppo_ep: 1|act_loss: -0.08251953125|cri_loss: 0.0236663818359375|unsuper_loss: 0.0
average reward score: 0.89404296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.78%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.1185302734375|cri_loss: 0.053955078125|unsuper_loss: 0.0
average reward score: 1.5
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:31:41,208] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=13, lr=[4.117574137857126e-06, 4.117574137857126e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:31:41,384] [INFO] [timer.py:215:stop] epoch=0/micro_step=580/global_step=580, RunningAvgSamplesPerSec=51.75262658340682, CurrSamplesPerSec=51.67767707566083, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:31:41,545] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=9, lr=[2.097109839397588e-06, 2.097109839397588e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 579|ppo_ep: 1|act_loss: -0.0074310302734375|cri_loss: 0.007106781005859375|unsuper_loss: 0.0
average reward score: 0.39892578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.78s (31.47%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 580|ppo_ep: 1|act_loss: -0.05609130859375|cri_loss: 0.061798095703125|unsuper_loss: 0.0
average reward score: 1.068359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.73%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 581|ppo_ep: 1|act_loss: 0.018035888671875|cri_loss: 0.020660400390625|unsuper_loss: 0.0
average reward score: -0.53759765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 582|ppo_ep: 1|act_loss: 0.048309326171875|cri_loss: 0.009765625|unsuper_loss: 0.0
average reward score: 1.193359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.71%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 583|ppo_ep: 1|act_loss: 0.061309814453125|cri_loss: 0.02166748046875|unsuper_loss: 0.0
average reward score: 1.8447265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.86%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 584|ppo_ep: 1|act_loss: -0.0028705596923828125|cri_loss: 0.01517486572265625|unsuper_loss: 0.0
average reward score: 0.37451171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.53%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.009521484375|cri_loss: 0.024169921875|unsuper_loss: 0.0
average reward score: 0.9423828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.63%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 586|ppo_ep: 1|act_loss: -0.07073974609375|cri_loss: 0.024169921875|unsuper_loss: 0.0
average reward score: 1.3076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.54%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 587|ppo_ep: 1|act_loss: -0.00460052490234375|cri_loss: 0.023193359375|unsuper_loss: 0.0
average reward score: -1.7431640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.98%) |Training time=0.78s (31.22%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 588|ppo_ep: 1|act_loss: 0.040557861328125|cri_loss: 0.0224761962890625|unsuper_loss: 0.0
average reward score: -0.30126953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.37%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81
[2023-07-01 08:32:06,189] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=13, lr=[3.94251418095384e-06, 3.94251418095384e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:06,364] [INFO] [timer.py:215:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=51.74402132404825, CurrSamplesPerSec=51.91287524531589, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=9, lr=[2.0066379966618336e-06, 2.0066379966618336e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 589|ppo_ep: 1|act_loss: 0.0209197998046875|cri_loss: 0.01690673828125|unsuper_loss: 0.0
average reward score: 1.0419921875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.36%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.031829833984375|cri_loss: 0.0240020751953125|unsuper_loss: 0.0
average reward score: 1.7119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 591|ppo_ep: 1|act_loss: -0.004337310791015625|cri_loss: 0.0107879638671875|unsuper_loss: 0.0
average reward score: -0.9765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.74%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 592|ppo_ep: 1|act_loss: -0.050384521484375|cri_loss: 0.0309600830078125|unsuper_loss: 0.0
average reward score: -0.552734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.45%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 593|ppo_ep: 1|act_loss: 0.0184478759765625|cri_loss: 0.04168701171875|unsuper_loss: 0.0
average reward score: 0.400390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.54%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.06683349609375|cri_loss: 0.040008544921875|unsuper_loss: 0.0
average reward score: 1.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.63%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 595|ppo_ep: 1|act_loss: -0.0767822265625|cri_loss: 0.11517333984375|unsuper_loss: 0.0
average reward score: 0.401123046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 596|ppo_ep: 1|act_loss: -0.2257080078125|cri_loss: 0.1441650390625|unsuper_loss: 0.0
average reward score: 0.04931640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.84%) |Training time=0.78s (31.32%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:32:26,493] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 597|ppo_ep: 1|act_loss: -0.38134765625|cri_loss: 0.15234375|unsuper_loss: 0.0
average reward score: 0.12054443359375
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.90%) |Training time=0.79s (32.05%) |Others=0.17 (7.05%)|CurSamplesPerSec=13.05 |AvgSamplesPerSec=12.81
[2023-07-01 08:32:28,949] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 598|ppo_ep: 1|act_loss: -0.5107421875|cri_loss: 0.173828125|unsuper_loss: 0.0
average reward score: 0.1925048828125
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.78%) |Training time=0.79s (32.16%) |Others=0.17 (7.06%)|CurSamplesPerSec=13.03 |AvgSamplesPerSec=12.81
[2023-07-01 08:32:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=13, lr=[3.7686483297255346e-06, 3.7686483297255346e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:31,255] [INFO] [timer.py:215:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=51.73574276791265, CurrSamplesPerSec=50.58682692507015, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:31,416] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=11, lr=[1.9347353301195425e-06, 1.9347353301195425e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 599|ppo_ep: 1|act_loss: -0.2230224609375|cri_loss: 0.0523681640625|unsuper_loss: 0.0
average reward score: 0.7177734375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.79%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 600|ppo_ep: 1|act_loss: 0.050506591796875|cri_loss: 0.035491943359375|unsuper_loss: 0.0
average reward score: 1.2216796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.79s (31.75%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 601|ppo_ep: 1|act_loss: 0.11126708984375|cri_loss: 0.03668212890625|unsuper_loss: 0.0
average reward score: 0.359619140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 602|ppo_ep: 1|act_loss: 0.14208984375|cri_loss: 0.0264892578125|unsuper_loss: 0.0
average reward score: 0.19873046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.08056640625|cri_loss: 0.0195159912109375|unsuper_loss: 0.0
average reward score: -1.771484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 604|ppo_ep: 1|act_loss: 0.0020618438720703125|cri_loss: 0.038360595703125|unsuper_loss: 0.0
average reward score: 0.88525390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.88%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 605|ppo_ep: 1|act_loss: 0.029327392578125|cri_loss: 0.018585205078125|unsuper_loss: 0.0
average reward score: -0.71630859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.75%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 606|ppo_ep: 1|act_loss: -0.181396484375|cri_loss: 0.08489990234375|unsuper_loss: 0.0
average reward score: 1.130859375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.67%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 607|ppo_ep: 1|act_loss: -0.1104736328125|cri_loss: 0.025054931640625|unsuper_loss: 0.0
average reward score: 0.64111328125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.57%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 608|ppo_ep: 1|act_loss: -0.138916015625|cri_loss: 0.032257080078125|unsuper_loss: 0.0
average reward score: -0.103759765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:32:56,060] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=13, lr=[3.596211844836072e-06, 3.596211844836072e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:56,236] [INFO] [timer.py:215:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=51.72506665688101, CurrSamplesPerSec=51.55987512019124, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:56,397] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=11, lr=[1.8455526643329995e-06, 1.8455526643329995e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 609|ppo_ep: 1|act_loss: -0.1690673828125|cri_loss: 0.05035400390625|unsuper_loss: 0.0
average reward score: 0.223876953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.48%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 610|ppo_ep: 1|act_loss: -0.2095947265625|cri_loss: 0.0433349609375|unsuper_loss: 0.0
average reward score: 1.4287109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 611|ppo_ep: 1|act_loss: -0.07135009765625|cri_loss: 0.0251007080078125|unsuper_loss: 0.0
average reward score: 0.0029296875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.35%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 612|ppo_ep: 1|act_loss: 0.01320648193359375|cri_loss: 0.02313232421875|unsuper_loss: 0.0
average reward score: 1.3203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.79s (31.74%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.10357666015625|cri_loss: 0.057403564453125|unsuper_loss: 0.0
average reward score: 1.4208984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.62%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.1279296875|cri_loss: 0.058258056640625|unsuper_loss: 0.0
average reward score: 2.5390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.70%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 615|ppo_ep: 1|act_loss: -0.004261016845703125|cri_loss: 0.016510009765625|unsuper_loss: 0.0
average reward score: 1.955078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 616|ppo_ep: 1|act_loss: -0.0277099609375|cri_loss: 0.05364990234375|unsuper_loss: 0.0
average reward score: 2.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.70%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 617|ppo_ep: 1|act_loss: 0.038665771484375|cri_loss: 0.0241241455078125|unsuper_loss: 0.0
average reward score: 1.7587890625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.83%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 618|ppo_ep: 1|act_loss: -0.037811279296875|cri_loss: 0.0217742919921875|unsuper_loss: 0.0
average reward score: 2.966796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.90%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
[2023-07-01 08:33:21,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=13, lr=[3.4254380528508618e-06, 3.4254380528508618e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:33:21,253] [INFO] [timer.py:215:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=51.71367358422307, CurrSamplesPerSec=50.988746347777095, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:33:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=11, lr=[1.7572555417026524e-06, 1.7572555417026524e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 619|ppo_ep: 1|act_loss: -0.12939453125|cri_loss: 0.0193328857421875|unsuper_loss: 0.0
average reward score: 2.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 620|ppo_ep: 1|act_loss: -0.05157470703125|cri_loss: 0.01513671875|unsuper_loss: 0.0
average reward score: 3.564453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.02%) |Training time=0.78s (31.17%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 621|ppo_ep: 1|act_loss: 0.005176544189453125|cri_loss: 0.01505279541015625|unsuper_loss: 0.0
average reward score: 2.03515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.48%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 622|ppo_ep: 1|act_loss: -0.038787841796875|cri_loss: 0.0270538330078125|unsuper_loss: 0.0
average reward score: 2.75
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 623|ppo_ep: 1|act_loss: -0.01690673828125|cri_loss: 0.0223388671875|unsuper_loss: 0.0
average reward score: 3.76171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.75%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 624|ppo_ep: 1|act_loss: 0.04852294921875|cri_loss: 0.0247039794921875|unsuper_loss: 0.0
average reward score: 2.546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 625|ppo_ep: 1|act_loss: 0.048858642578125|cri_loss: 0.04608154296875|unsuper_loss: 0.0
average reward score: 3.525390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.72%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 626|ppo_ep: 1|act_loss: 0.019012451171875|cri_loss: 0.038665771484375|unsuper_loss: 0.0
average reward score: 3.388671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.42%) |Training time=0.79s (31.78%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 627|ppo_ep: 1|act_loss: 0.0340576171875|cri_loss: 0.07501220703125|unsuper_loss: 0.0
average reward score: 1.6201171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.56%) |Training time=0.79s (31.61%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.0972900390625|cri_loss: 0.046966552734375|unsuper_loss: 0.0
average reward score: 3.111328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.34%) |Training time=0.80s (31.82%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:33:46,043] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=13, lr=[3.256558030518954e-06, 3.256558030518954e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:33:46,223] [INFO] [timer.py:215:stop] epoch=0/micro_step=630/global_step=630, RunningAvgSamplesPerSec=51.706242726652974, CurrSamplesPerSec=51.18045668138959, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:33:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=11, lr=[1.6699634384772317e-06, 1.6699634384772317e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 629|ppo_ep: 1|act_loss: -0.159423828125|cri_loss: 0.06622314453125|unsuper_loss: 0.0
average reward score: 3.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.68%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 630|ppo_ep: 1|act_loss: -0.1016845703125|cri_loss: 0.04388427734375|unsuper_loss: 0.0
average reward score: 1.5244140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 631|ppo_ep: 1|act_loss: -0.0015020370483398438|cri_loss: 0.0682373046875|unsuper_loss: 0.0
average reward score: 0.505859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.61%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 632|ppo_ep: 1|act_loss: 0.04833984375|cri_loss: 0.06500244140625|unsuper_loss: 0.0
average reward score: 1.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.57%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 633|ppo_ep: 1|act_loss: 0.046966552734375|cri_loss: 0.0240020751953125|unsuper_loss: 0.0
average reward score: 0.021728515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 634|ppo_ep: 1|act_loss: 0.1358642578125|cri_loss: 0.053955078125|unsuper_loss: 0.0
average reward score: 0.98583984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.30%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.0301666259765625|cri_loss: 0.035614013671875|unsuper_loss: 0.0
average reward score: 1.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.30%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 636|ppo_ep: 1|act_loss: 0.164794921875|cri_loss: 0.11541748046875|unsuper_loss: 0.0
average reward score: 1.4052734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 637|ppo_ep: 1|act_loss: 0.07781982421875|cri_loss: 0.08154296875|unsuper_loss: 0.0
average reward score: 2.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.10162353515625|cri_loss: 0.0595703125|unsuper_loss: 0.0
average reward score: 1.634765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.78%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:34:11,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=13, lr=[3.0898002920993932e-06, 3.0898002920993932e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:34:11,221] [INFO] [timer.py:215:stop] epoch=0/micro_step=640/global_step=640, RunningAvgSamplesPerSec=51.69824265860959, CurrSamplesPerSec=50.96394492368189, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:34:11,380] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=11, lr=[1.5837944709976382e-06, 1.5837944709976382e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 639|ppo_ep: 1|act_loss: 0.2626953125|cri_loss: 0.1829833984375|unsuper_loss: 0.0
average reward score: 1.0966796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.75%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 640|ppo_ep: 1|act_loss: -0.033477783203125|cri_loss: 0.1353759765625|unsuper_loss: 0.0
average reward score: -1.376953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 641|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: 0.2049560546875|unsuper_loss: 0.0
average reward score: 1.287109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.63%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 642|ppo_ep: 1|act_loss: -0.2491455078125|cri_loss: 0.13037109375|unsuper_loss: 0.0
average reward score: 1.34375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.48%) |Training time=0.79s (31.70%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 643|ppo_ep: 1|act_loss: -0.038726806640625|cri_loss: 0.0667724609375|unsuper_loss: 0.0
average reward score: 2.1171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.48%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.0672607421875|cri_loss: 0.076416015625|unsuper_loss: 0.0
average reward score: 2.423828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 645|ppo_ep: 1|act_loss: 0.1790771484375|cri_loss: 0.06671142578125|unsuper_loss: 0.0
average reward score: 0.13232421875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 646|ppo_ep: 1|act_loss: 0.0909423828125|cri_loss: 0.0675048828125|unsuper_loss: 0.0
average reward score: 2.146484375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.82%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.74 |AvgSamplesPerSec=12.81
epoch: 0|step: 647|ppo_ep: 1|act_loss: -0.008148193359375|cri_loss: 0.0309906005859375|unsuper_loss: 0.0
average reward score: 1.6806640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 648|ppo_ep: 1|act_loss: -0.0278167724609375|cri_loss: 0.061737060546875|unsuper_loss: 0.0
average reward score: 2.08203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.41%) |Training time=0.79s (31.73%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:34:36,046] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=13, lr=[2.9253904801549233e-06, 2.9253904801549233e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:34:36,226] [INFO] [timer.py:215:stop] epoch=0/micro_step=650/global_step=650, RunningAvgSamplesPerSec=51.68767145837992, CurrSamplesPerSec=50.93491476359568, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:34:36,386] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=11, lr=[1.4988652358718336e-06, 1.4988652358718336e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 649|ppo_ep: 1|act_loss: -0.035491943359375|cri_loss: 0.043975830078125|unsuper_loss: 0.0
average reward score: -0.086669921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.74%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 650|ppo_ep: 1|act_loss: -0.12213134765625|cri_loss: 0.0364990234375|unsuper_loss: 0.0
average reward score: 1.162109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.79s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 651|ppo_ep: 1|act_loss: -0.0167694091796875|cri_loss: 0.028594970703125|unsuper_loss: 0.0
average reward score: 2.908203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.85%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 652|ppo_ep: 1|act_loss: -0.0229644775390625|cri_loss: 0.07305908203125|unsuper_loss: 0.0
average reward score: 1.5966796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.75%) |Others=0.22 (8.73%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 653|ppo_ep: 1|act_loss: 0.06427001953125|cri_loss: 0.035614013671875|unsuper_loss: 0.0
average reward score: 1.6513671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.46%) |Training time=0.79s (31.79%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 654|ppo_ep: 1|act_loss: 0.06268310546875|cri_loss: 0.045928955078125|unsuper_loss: 0.0
average reward score: 2.173828125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.78s (31.47%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81
epoch: 0|step: 655|ppo_ep: 1|act_loss: -0.005828857421875|cri_loss: 0.01739501953125|unsuper_loss: 0.0
average reward score: 1.564453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.64%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 656|ppo_ep: 1|act_loss: 0.08050537109375|cri_loss: 0.0489501953125|unsuper_loss: 0.0
average reward score: 2.177734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.44%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 657|ppo_ep: 1|act_loss: 0.0211639404296875|cri_loss: 0.065185546875|unsuper_loss: 0.0
average reward score: 2.453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.84%) |Training time=0.78s (31.39%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 658|ppo_ep: 1|act_loss: -0.07470703125|cri_loss: 0.0252685546875|unsuper_loss: 0.0
average reward score: 1.8779296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.58%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:35:01,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=13, lr=[2.763551060231423e-06, 2.763551060231423e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:01,200] [INFO] [timer.py:215:stop] epoch=0/micro_step=660/global_step=660, RunningAvgSamplesPerSec=51.68010101183225, CurrSamplesPerSec=51.22921588048604, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:01,359] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=11, lr=[1.415290652206105e-06, 1.415290652206105e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 659|ppo_ep: 1|act_loss: -0.10577392578125|cri_loss: 0.041168212890625|unsuper_loss: 0.0
average reward score: 2.841796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.59%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 660|ppo_ep: 1|act_loss: -0.1065673828125|cri_loss: 0.20361328125|unsuper_loss: 0.0
average reward score: 1.41015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.62%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 661|ppo_ep: 1|act_loss: -0.040008544921875|cri_loss: 0.035797119140625|unsuper_loss: 0.0
average reward score: 2.140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.66%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 662|ppo_ep: 1|act_loss: 0.06243896484375|cri_loss: 0.037078857421875|unsuper_loss: 0.0
average reward score: 2.044921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.85%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 663|ppo_ep: 1|act_loss: -0.16162109375|cri_loss: 0.040679931640625|unsuper_loss: 0.0
average reward score: 2.998046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.66%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 664|ppo_ep: 1|act_loss: -0.10235595703125|cri_loss: 0.041961669921875|unsuper_loss: 0.0
average reward score: 2.595703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 665|ppo_ep: 1|act_loss: -0.05584716796875|cri_loss: 0.036163330078125|unsuper_loss: 0.0
average reward score: 1.3466796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.92%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 666|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.078369140625|unsuper_loss: 0.0
average reward score: 2.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.24%) |Training time=0.80s (31.96%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 667|ppo_ep: 1|act_loss: -0.0160369873046875|cri_loss: 0.03741455078125|unsuper_loss: 0.0
average reward score: 3.44921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.50%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 668|ppo_ep: 1|act_loss: -0.00972747802734375|cri_loss: 0.017547607421875|unsuper_loss: 0.0
average reward score: 1.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.64%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
[2023-07-01 08:35:26,043] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=13, lr=[2.604501019836226e-06, 2.604501019836226e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:26,219] [INFO] [timer.py:215:stop] epoch=0/micro_step=670/global_step=670, RunningAvgSamplesPerSec=51.66980695082694, CurrSamplesPerSec=51.67015695720682, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=11, lr=[1.3331838061061835e-06, 1.3331838061061835e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 669|ppo_ep: 1|act_loss: -0.05511474609375|cri_loss: 0.019744873046875|unsuper_loss: 0.0
average reward score: 3.0
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.79%) |Training time=0.78s (31.43%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 670|ppo_ep: 1|act_loss: -0.00949859619140625|cri_loss: 0.0129547119140625|unsuper_loss: 0.0
average reward score: 2.0
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.79s (31.46%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 671|ppo_ep: 1|act_loss: -0.038421630859375|cri_loss: 0.030487060546875|unsuper_loss: 0.0
average reward score: 2.3203125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.52%) |Training time=0.79s (31.72%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 672|ppo_ep: 1|act_loss: -0.03607177734375|cri_loss: 0.0133514404296875|unsuper_loss: 0.0
average reward score: 2.83984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.66%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 673|ppo_ep: 1|act_loss: -0.012359619140625|cri_loss: 0.035400390625|unsuper_loss: 0.0
average reward score: 3.076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.83%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 674|ppo_ep: 1|act_loss: -0.0247650146484375|cri_loss: 0.0128631591796875|unsuper_loss: 0.0
average reward score: 2.43359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.60%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 675|ppo_ep: 1|act_loss: 0.0275726318359375|cri_loss: 0.01461029052734375|unsuper_loss: 0.0
average reward score: 0.9697265625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.66%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81
epoch: 0|step: 676|ppo_ep: 1|act_loss: -0.0261077880859375|cri_loss: 0.01190185546875|unsuper_loss: 0.0
average reward score: 2.79296875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.68%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 677|ppo_ep: 1|act_loss: 0.0261383056640625|cri_loss: 0.0257110595703125|unsuper_loss: 0.0
average reward score: 3.203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.054046630859375|cri_loss: 0.0160675048828125|unsuper_loss: 0.0
average reward score: 2.572265625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.28%) |Training time=0.80s (31.90%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
[2023-07-01 08:35:51,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=13, lr=[2.4484555721226048e-06, 2.4484555721226048e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:51,205] [INFO] [timer.py:215:stop] epoch=0/micro_step=680/global_step=680, RunningAvgSamplesPerSec=51.66037751112657, CurrSamplesPerSec=50.75463074832772, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:51,365] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=11, lr=[1.2526557976586267e-06, 1.2526557976586267e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 679|ppo_ep: 1|act_loss: 0.01154327392578125|cri_loss: 0.0034656524658203125|unsuper_loss: 0.0
average reward score: 1.4794921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.76%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 680|ppo_ep: 1|act_loss: 0.08856201171875|cri_loss: 0.0238037109375|unsuper_loss: 0.0
average reward score: 3.126953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.46%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 681|ppo_ep: 1|act_loss: -0.01373291015625|cri_loss: 0.03765869140625|unsuper_loss: 0.0
average reward score: 3.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 682|ppo_ep: 1|act_loss: -0.006565093994140625|cri_loss: 0.021728515625|unsuper_loss: 0.0
average reward score: 1.580078125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.82%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: 0.0238189697265625|unsuper_loss: 0.0
average reward score: 2.21484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.74%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 684|ppo_ep: 1|act_loss: -0.07891845703125|cri_loss: 0.0185089111328125|unsuper_loss: 0.0
average reward score: 2.953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.37%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 685|ppo_ep: 1|act_loss: -0.0187530517578125|cri_loss: 0.048583984375|unsuper_loss: 0.0
average reward score: 1.943359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 686|ppo_ep: 1|act_loss: 0.031829833984375|cri_loss: 0.0274200439453125|unsuper_loss: 0.0
average reward score: 2.056640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 687|ppo_ep: 1|act_loss: 0.01371002197265625|cri_loss: 0.021240234375|unsuper_loss: 0.0
average reward score: 3.365234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 688|ppo_ep: 1|act_loss: -0.006954193115234375|cri_loss: 0.0207061767578125|unsuper_loss: 0.0
average reward score: 1.9736328125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.50%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:36:16,005] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=13, lr=[2.295625864681438e-06, 2.295625864681438e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:36:16,181] [INFO] [timer.py:215:stop] epoch=0/micro_step=690/global_step=690, RunningAvgSamplesPerSec=51.65406902884359, CurrSamplesPerSec=51.01762460321171, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:36:16,340] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=11, lr=[1.1738155905995186e-06, 1.1738155905995186e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 689|ppo_ep: 1|act_loss: -0.017364501953125|cri_loss: 0.02587890625|unsuper_loss: 0.0
average reward score: 1.4228515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.74%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 690|ppo_ep: 1|act_loss: -0.09100341796875|cri_loss: 0.024322509765625|unsuper_loss: 0.0
average reward score: 1.134765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.65%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 691|ppo_ep: 1|act_loss: -0.12139892578125|cri_loss: 0.0238189697265625|unsuper_loss: 0.0
average reward score: 3.15625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 692|ppo_ep: 1|act_loss: 0.0005116462707519531|cri_loss: 0.032379150390625|unsuper_loss: 0.0
average reward score: 3.25
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.58%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 693|ppo_ep: 1|act_loss: -0.035400390625|cri_loss: 0.01525115966796875|unsuper_loss: 0.0
average reward score: 2.705078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.72%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 694|ppo_ep: 1|act_loss: 0.0008740425109863281|cri_loss: 0.04547119140625|unsuper_loss: 0.0
average reward score: 1.80078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.72%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 695|ppo_ep: 1|act_loss: 0.0579833984375|cri_loss: 0.0850830078125|unsuper_loss: 0.0
average reward score: 1.5263671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.44%) |Training time=0.79s (31.77%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 696|ppo_ep: 1|act_loss: 0.11370849609375|cri_loss: 0.12408447265625|unsuper_loss: 0.0
average reward score: 2.4375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.20%) |Training time=0.80s (31.97%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 697|ppo_ep: 1|act_loss: -0.0389404296875|cri_loss: 0.097900390625|unsuper_loss: 0.0
average reward score: 1.2939453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.85%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 698|ppo_ep: 1|act_loss: -0.08319091796875|cri_loss: 0.166259765625|unsuper_loss: 0.0
average reward score: 1.3935546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.83%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:36:41,010] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=13, lr=[2.146218693834001e-06, 2.146218693834001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:36:41,189] [INFO] [timer.py:215:stop] epoch=0/micro_step=700/global_step=700, RunningAvgSamplesPerSec=51.64200175761014, CurrSamplesPerSec=50.62376611439234, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:36:41,348] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=11, lr=[1.0967698648738866e-06, 1.0967698648738866e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.171875|cri_loss: 0.11260986328125|unsuper_loss: 0.0
average reward score: 2.306640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.85%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 700|ppo_ep: 1|act_loss: -0.1732177734375|cri_loss: 0.07244873046875|unsuper_loss: 0.0
average reward score: 0.767578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.59%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 701|ppo_ep: 1|act_loss: -0.1396484375|cri_loss: 0.0650634765625|unsuper_loss: 0.0
average reward score: 2.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.66%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 702|ppo_ep: 1|act_loss: -0.036285400390625|cri_loss: 0.0423583984375|unsuper_loss: 0.0
average reward score: 2.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.54%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 703|ppo_ep: 1|act_loss: 0.05657958984375|cri_loss: 0.10009765625|unsuper_loss: 0.0
average reward score: 2.67578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.40%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 704|ppo_ep: 1|act_loss: 0.015777587890625|cri_loss: 0.035736083984375|unsuper_loss: 0.0
average reward score: 1.482421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.41%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 705|ppo_ep: 1|act_loss: 0.036834716796875|cri_loss: 0.046722412109375|unsuper_loss: 0.0
average reward score: 1.908203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.58%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 706|ppo_ep: 1|act_loss: 0.10400390625|cri_loss: 0.1390380859375|unsuper_loss: 0.0
average reward score: 1.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.68%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 707|ppo_ep: 1|act_loss: -0.0270843505859375|cri_loss: 0.06463623046875|unsuper_loss: 0.0
average reward score: 2.8828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 708|ppo_ep: 1|act_loss: -0.0789794921875|cri_loss: 0.06964111328125|unsuper_loss: 0.0
average reward score: 1.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:37:05,976] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=13, lr=[2.0004362248125774e-06, 2.0004362248125774e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:37:06,156] [INFO] [timer.py:215:stop] epoch=0/micro_step=710/global_step=710, RunningAvgSamplesPerSec=51.63689936822974, CurrSamplesPerSec=51.00378221753546, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:37:06,315] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=11, lr=[1.0216228722853735e-06, 1.0216228722853735e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 709|ppo_ep: 1|act_loss: -0.06561279296875|cri_loss: 0.08526611328125|unsuper_loss: 0.0
average reward score: 0.443115234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.76%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.196044921875|cri_loss: 0.1932373046875|unsuper_loss: 0.0
average reward score: 1.380859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.63%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 711|ppo_ep: 1|act_loss: 0.0289306640625|cri_loss: 0.1053466796875|unsuper_loss: 0.0
average reward score: 1.048828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.77%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 712|ppo_ep: 1|act_loss: 0.0615234375|cri_loss: 0.115966796875|unsuper_loss: 0.0
average reward score: 1.66796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 713|ppo_ep: 1|act_loss: 0.0733642578125|cri_loss: 0.12017822265625|unsuper_loss: 0.0
average reward score: 1.048828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.29%) |Training time=0.80s (31.93%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 714|ppo_ep: 1|act_loss: -0.06158447265625|cri_loss: 0.1793212890625|unsuper_loss: 0.0
average reward score: 0.439453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.60%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 715|ppo_ep: 1|act_loss: -0.1697998046875|cri_loss: 0.1650390625|unsuper_loss: 0.0
average reward score: -0.44970703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 716|ppo_ep: 1|act_loss: -0.1690673828125|cri_loss: 0.208984375|unsuper_loss: 0.0
average reward score: 0.8740234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.49%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 717|ppo_ep: 1|act_loss: -0.1729736328125|cri_loss: 0.2359619140625|unsuper_loss: 0.0
average reward score: -0.2587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.72%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 718|ppo_ep: 1|act_loss: 0.0153656005859375|cri_loss: 0.148193359375|unsuper_loss: 0.0
average reward score: -0.103271484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.56%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
[2023-07-01 08:37:30,985] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=13, lr=[1.8584757182074397e-06, 1.8584757182074397e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:37:31,162] [INFO] [timer.py:215:stop] epoch=0/micro_step=720/global_step=720, RunningAvgSamplesPerSec=51.62766889502093, CurrSamplesPerSec=50.76904873941442, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:37:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=11, lr=[9.48476295431443e-07, 9.48476295431443e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 719|ppo_ep: 1|act_loss: 0.033660888671875|cri_loss: 0.10284423828125|unsuper_loss: 0.0
average reward score: -0.1505126953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.80s (31.82%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.29541015625|cri_loss: 0.163818359375|unsuper_loss: 0.0
average reward score: -0.421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.80s (31.81%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 721|ppo_ep: 1|act_loss: 0.0809326171875|cri_loss: 0.1768798828125|unsuper_loss: 0.0
average reward score: 1.0595703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.86%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 722|ppo_ep: 1|act_loss: -0.0162200927734375|cri_loss: 0.301513671875|unsuper_loss: 0.0
average reward score: 0.89404296875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 723|ppo_ep: 1|act_loss: 0.06671142578125|cri_loss: 0.1895751953125|unsuper_loss: 0.0
average reward score: 0.38720703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.81%) |Training time=0.78s (31.39%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.81
epoch: 0|step: 724|ppo_ep: 1|act_loss: 0.2216796875|cri_loss: 0.252197265625|unsuper_loss: 0.0
average reward score: 0.059783935546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.66%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 725|ppo_ep: 1|act_loss: 0.10699462890625|cri_loss: 0.1409912109375|unsuper_loss: 0.0
average reward score: 0.26123046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 726|ppo_ep: 1|act_loss: 0.1195068359375|cri_loss: 0.1507568359375|unsuper_loss: 0.0
average reward score: -1.546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.62%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 727|ppo_ep: 1|act_loss: -0.060546875|cri_loss: 0.366943359375|unsuper_loss: 0.0
average reward score: -1.1201171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.51%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 728|ppo_ep: 1|act_loss: 0.079833984375|cri_loss: 0.1524658203125|unsuper_loss: 0.0
average reward score: 0.2132568359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.66%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
[2023-07-01 08:37:55,964] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=13, lr=[1.7205292630503881e-06, 1.7205292630503881e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:37:56,144] [INFO] [timer.py:215:stop] epoch=0/micro_step=730/global_step=730, RunningAvgSamplesPerSec=51.62026118095024, CurrSamplesPerSec=51.04830235468179, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:37:56,304] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=11, lr=[8.774291101150409e-07, 8.774291101150409e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 729|ppo_ep: 1|act_loss: 0.03680419921875|cri_loss: 0.1708984375|unsuper_loss: 0.0
average reward score: 0.17724609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.68%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 730|ppo_ep: 1|act_loss: -0.13818359375|cri_loss: 0.16748046875|unsuper_loss: 0.0
average reward score: -0.94677734375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.78%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
epoch: 0|step: 731|ppo_ep: 1|act_loss: -0.0872802734375|cri_loss: 0.1845703125|unsuper_loss: 0.0
average reward score: 0.32568359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.55%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 732|ppo_ep: 1|act_loss: 0.098876953125|cri_loss: 0.29296875|unsuper_loss: 0.0
average reward score: -0.9912109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.62%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:38:06,298] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.157470703125|cri_loss: 0.200439453125|unsuper_loss: 0.0
average reward score: 0.067626953125
-------------------------------------------------------------------------------------
|E2E latency=2.46s |Gather latency=0.00s (0.00%) |Generate time=1.49s (60.45%) |Training time=0.80s (32.47%) |Others=0.17 (7.08%)|CurSamplesPerSec=13.01 |AvgSamplesPerSec=12.81
epoch: 0|step: 734|ppo_ep: 1|act_loss: 0.155029296875|cri_loss: 0.2490234375|unsuper_loss: 0.0
average reward score: -0.85009765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.71%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 735|ppo_ep: 1|act_loss: 0.24755859375|cri_loss: 0.218017578125|unsuper_loss: 0.0
average reward score: 0.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.66%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 736|ppo_ep: 1|act_loss: -0.05853271484375|cri_loss: 0.150146484375|unsuper_loss: 0.0
average reward score: -0.090087890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
[2023-07-01 08:38:16,247] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 737|ppo_ep: 1|act_loss: 0.14111328125|cri_loss: 0.1685791015625|unsuper_loss: 0.0
average reward score: 0.525390625
-------------------------------------------------------------------------------------
|E2E latency=2.45s |Gather latency=0.00s (0.00%) |Generate time=1.48s (60.52%) |Training time=0.79s (32.43%) |Others=0.17 (7.05%)|CurSamplesPerSec=13.06 |AvgSamplesPerSec=12.81
epoch: 0|step: 738|ppo_ep: 1|act_loss: 0.07269287109375|cri_loss: 0.1416015625|unsuper_loss: 0.0
average reward score: -1.103515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.60%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
[2023-07-01 08:38:20,868] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=13, lr=[1.5867835168960191e-06, 1.5867835168960191e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:38:21,044] [INFO] [timer.py:215:stop] epoch=0/micro_step=740/global_step=740, RunningAvgSamplesPerSec=51.61111994665265, CurrSamplesPerSec=50.71313119023716, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:38:21,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=13, lr=[8.221676253347249e-07, 8.221676253347249e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.0261383056640625|cri_loss: 0.169677734375|unsuper_loss: 0.0
average reward score: -0.247314453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.75%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 740|ppo_ep: 1|act_loss: -0.0654296875|cri_loss: 0.1591796875|unsuper_loss: 0.0
average reward score: -0.61767578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.51%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 741|ppo_ep: 1|act_loss: -0.16064453125|cri_loss: 0.169677734375|unsuper_loss: 0.0
average reward score: -0.465576171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.80s (31.84%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 742|ppo_ep: 1|act_loss: -0.2086181640625|cri_loss: 0.144775390625|unsuper_loss: 0.0
average reward score: -0.5927734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.69%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 743|ppo_ep: 1|act_loss: 0.037445068359375|cri_loss: 0.10546875|unsuper_loss: 0.0
average reward score: -0.2474365234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.81
epoch: 0|step: 744|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.07501220703125|unsuper_loss: 0.0
average reward score: 0.61865234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.78%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 745|ppo_ep: 1|act_loss: 0.1702880859375|cri_loss: 0.077880859375|unsuper_loss: 0.0
average reward score: -0.82177734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.47%) |Training time=0.79s (31.75%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 746|ppo_ep: 1|act_loss: 0.1856689453125|cri_loss: 0.08074951171875|unsuper_loss: 0.0
average reward score: 0.21044921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 747|ppo_ep: 1|act_loss: 0.1865234375|cri_loss: 0.1158447265625|unsuper_loss: 0.0
average reward score: -1.3857421875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.93%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.81
epoch: 0|step: 748|ppo_ep: 1|act_loss: 0.0037364959716796875|cri_loss: 0.07928466796875|unsuper_loss: 0.0
average reward score: 0.62353515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
[2023-07-01 08:38:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=13, lr=[1.4574194532523914e-06, 1.4574194532523914e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:38:46,043] [INFO] [timer.py:215:stop] epoch=0/micro_step=750/global_step=750, RunningAvgSamplesPerSec=51.601553142551516, CurrSamplesPerSec=51.397851455320605, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:38:46,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=13, lr=[7.551396130841406e-07, 7.551396130841406e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 749|ppo_ep: 1|act_loss: 0.04638671875|cri_loss: 0.10772705078125|unsuper_loss: 0.0
average reward score: -0.44140625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 750|ppo_ep: 1|act_loss: 0.0013256072998046875|cri_loss: 0.052490234375|unsuper_loss: 0.0
average reward score: -0.0712890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.36%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 751|ppo_ep: 1|act_loss: 0.004772186279296875|cri_loss: 0.09405517578125|unsuper_loss: 0.0
average reward score: 0.298583984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.38%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.81
epoch: 0|step: 752|ppo_ep: 1|act_loss: 0.09661865234375|cri_loss: 0.0604248046875|unsuper_loss: 0.0
average reward score: 0.10296630859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.62%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 753|ppo_ep: 1|act_loss: 0.033203125|cri_loss: 0.0548095703125|unsuper_loss: 0.0
average reward score: -0.53076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 754|ppo_ep: 1|act_loss: 0.0653076171875|cri_loss: 0.06781005859375|unsuper_loss: 0.0
average reward score: -0.69140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.64%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 755|ppo_ep: 1|act_loss: -0.1654052734375|cri_loss: 0.11083984375|unsuper_loss: 0.0
average reward score: -0.413818359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.62%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 756|ppo_ep: 1|act_loss: 0.027496337890625|cri_loss: 0.08026123046875|unsuper_loss: 0.0
average reward score: -0.28857421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.31%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 757|ppo_ep: 1|act_loss: -0.052581787109375|cri_loss: 0.059814453125|unsuper_loss: 0.0
average reward score: -0.463623046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.42%) |Training time=0.79s (31.70%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 758|ppo_ep: 1|act_loss: -0.08880615234375|cri_loss: 0.069580078125|unsuper_loss: 0.0
average reward score: -0.54248046875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.84%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.81
[2023-07-01 08:39:10,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=13, lr=[1.3326121167028917e-06, 1.3326121167028917e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:39:11,028] [INFO] [timer.py:215:stop] epoch=0/micro_step=760/global_step=760, RunningAvgSamplesPerSec=51.59554618091631, CurrSamplesPerSec=50.50729342630164, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:39:11,188] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=13, lr=[6.904725993279232e-07, 6.904725993279232e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 759|ppo_ep: 1|act_loss: 0.0384521484375|cri_loss: 0.02911376953125|unsuper_loss: 0.0
average reward score: -1.189453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.85%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.81
epoch: 0|step: 760|ppo_ep: 1|act_loss: 0.06256103515625|cri_loss: 0.05865478515625|unsuper_loss: 0.0
average reward score: -0.84326171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.37%) |Training time=0.80s (31.83%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.81
epoch: 0|step: 761|ppo_ep: 1|act_loss: -0.0215911865234375|cri_loss: 0.037811279296875|unsuper_loss: 0.0
average reward score: -1.1328125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.31%) |Training time=0.80s (31.84%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 762|ppo_ep: 1|act_loss: -0.09063720703125|cri_loss: 0.06414794921875|unsuper_loss: 0.0
average reward score: -0.6279296875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.78%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.81
epoch: 0|step: 763|ppo_ep: 1|act_loss: 0.09454345703125|cri_loss: 0.12274169921875|unsuper_loss: 0.0
average reward score: 0.24560546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.69%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.81
epoch: 0|step: 764|ppo_ep: 1|act_loss: 0.0235748291015625|cri_loss: 0.04290771484375|unsuper_loss: 0.0
average reward score: 0.29736328125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.39%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 765|ppo_ep: 1|act_loss: -0.1361083984375|cri_loss: 0.08929443359375|unsuper_loss: 0.0
average reward score: -0.6435546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.44%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.81
epoch: 0|step: 766|ppo_ep: 1|act_loss: 0.1015625|cri_loss: 0.0264434814453125|unsuper_loss: 0.0
average reward score: 0.68603515625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.60%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.81
epoch: 0|step: 767|ppo_ep: 1|act_loss: -0.0099945068359375|cri_loss: 0.073486328125|unsuper_loss: 0.0
average reward score: -0.603515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.76%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.231689453125|cri_loss: 0.1632080078125|unsuper_loss: 0.0
average reward score: -1.33984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.71%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.81
[2023-07-01 08:39:35,827] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1
[2023-07-01 08:39:35,827] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=14, lr=[1.2243212249131722e-06, 1.2243212249131722e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:39:35,828] [INFO] [timer.py:215:stop] epoch=0/micro_step=770/global_step=770, RunningAvgSamplesPerSec=51.60825437477312, CurrSamplesPerSec=73.02824111153672, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:39:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=13, lr=[6.282540860365757e-07, 6.282540860365757e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 769|ppo_ep: 1|act_loss: -0.097900390625|cri_loss: 0.07208251953125|unsuper_loss: 0.0
average reward score: -0.7880859375
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.46%) |Training time=0.60s (26.09%) |Others=0.22 (9.45%)|CurSamplesPerSec=13.88 |AvgSamplesPerSec=12.81
[2023-07-01 08:39:38,139] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
[2023-07-01 08:39:38,294] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.00699615478515625|cri_loss: 0.0513916015625|unsuper_loss: 0.0
average reward score: -0.94140625
-------------------------------------------------------------------------------------
|E2E latency=2.27s |Gather latency=0.00s (0.00%) |Generate time=1.49s (65.58%) |Training time=0.61s (26.75%) |Others=0.17 (7.67%)|CurSamplesPerSec=14.11 |AvgSamplesPerSec=12.82
epoch: 0|step: 771|ppo_ep: 1|act_loss: -0.2410888671875|cri_loss: 0.2340087890625|unsuper_loss: 0.0
average reward score: 0.391357421875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.49%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 772|ppo_ep: 1|act_loss: -0.033966064453125|cri_loss: 0.11492919921875|unsuper_loss: 0.0
average reward score: -2.4765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.82%) |Training time=0.78s (31.38%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.1956787109375|cri_loss: 0.25927734375|unsuper_loss: 0.0
average reward score: -2.830078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.73%) |Training time=0.79s (31.46%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: 0.041412353515625|unsuper_loss: 0.0
average reward score: 0.247314453125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.78s (31.48%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 775|ppo_ep: 1|act_loss: -0.303466796875|cri_loss: 0.209228515625|unsuper_loss: 0.0
average reward score: -1.1796875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.27%) |Training time=0.80s (31.87%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 776|ppo_ep: 1|act_loss: -0.072509765625|cri_loss: 0.1566162109375|unsuper_loss: 0.0
average reward score: -1.9306640625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.79%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.0977783203125|cri_loss: 0.155029296875|unsuper_loss: 0.0
average reward score: -1.7763671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.76%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
[2023-07-01 08:39:57,895] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.20068359375|unsuper_loss: 0.0
average reward score: -1.125
-------------------------------------------------------------------------------------
|E2E latency=2.31s |Gather latency=0.00s (0.00%) |Generate time=1.49s (64.43%) |Training time=0.60s (26.02%) |Others=0.22 (9.55%)|CurSamplesPerSec=13.86 |AvgSamplesPerSec=12.82
[2023-07-01 08:40:00,214] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=16, lr=[1.1313721839601206e-06, 1.1313721839601206e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:40:00,394] [INFO] [timer.py:215:stop] epoch=0/micro_step=780/global_step=780, RunningAvgSamplesPerSec=51.64180602872668, CurrSamplesPerSec=51.00333643923653, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:40:00,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=14, lr=[5.744205443756365e-07, 5.744205443756365e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 779|ppo_ep: 1|act_loss: 0.09027099609375|cri_loss: 0.1339111328125|unsuper_loss: 0.0
average reward score: -0.81591796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.69%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 780|ppo_ep: 1|act_loss: -0.0261688232421875|cri_loss: 0.0543212890625|unsuper_loss: 0.0
average reward score: -2.0
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.79s (31.65%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 781|ppo_ep: 1|act_loss: 0.0146026611328125|cri_loss: 0.1873779296875|unsuper_loss: 0.0
average reward score: -0.78955078125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.86%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 782|ppo_ep: 1|act_loss: 0.1781005859375|cri_loss: 0.1383056640625|unsuper_loss: 0.0
average reward score: -0.488037109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 783|ppo_ep: 1|act_loss: -0.0207061767578125|cri_loss: 0.1483154296875|unsuper_loss: 0.0
average reward score: -3.36328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.80s (31.78%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 784|ppo_ep: 1|act_loss: 0.0156402587890625|cri_loss: 0.1343994140625|unsuper_loss: 0.0
average reward score: -2.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.95%) |Training time=0.78s (31.21%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82
epoch: 0|step: 785|ppo_ep: 1|act_loss: -0.10455322265625|cri_loss: 0.095458984375|unsuper_loss: 0.0
average reward score: -1.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 786|ppo_ep: 1|act_loss: 0.0179901123046875|cri_loss: 0.1077880859375|unsuper_loss: 0.0
average reward score: -3.609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.87%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 787|ppo_ep: 1|act_loss: 0.054901123046875|cri_loss: 0.1431884765625|unsuper_loss: 0.0
average reward score: -1.2724609375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.78s (31.53%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82
epoch: 0|step: 788|ppo_ep: 1|act_loss: -0.1170654296875|cri_loss: 0.05938720703125|unsuper_loss: 0.0
average reward score: -2.380859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
[2023-07-01 08:40:25,465] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=16, lr=[1.0196933519708125e-06, 1.0196933519708125e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:40:25,641] [INFO] [timer.py:215:stop] epoch=0/micro_step=790/global_step=790, RunningAvgSamplesPerSec=51.63518407852763, CurrSamplesPerSec=50.67183335321682, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:40:25,800] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=14, lr=[5.170832921371164e-07, 5.170832921371164e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 789|ppo_ep: 1|act_loss: 0.0171356201171875|cri_loss: 0.066650390625|unsuper_loss: 0.0
average reward score: -0.09027099609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.37%) |Training time=0.80s (31.89%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 790|ppo_ep: 1|act_loss: -0.006694793701171875|cri_loss: 0.07513427734375|unsuper_loss: 0.0
average reward score: -0.7490234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 791|ppo_ep: 1|act_loss: -0.0699462890625|cri_loss: 0.048736572265625|unsuper_loss: 0.0
average reward score: -0.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.76%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 792|ppo_ep: 1|act_loss: -0.0310211181640625|cri_loss: 0.037322998046875|unsuper_loss: 0.0
average reward score: -0.28076171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 793|ppo_ep: 1|act_loss: 0.049957275390625|cri_loss: 0.06402587890625|unsuper_loss: 0.0
average reward score: -1.1796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.39%) |Training time=0.79s (31.78%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 794|ppo_ep: 1|act_loss: 0.09295654296875|cri_loss: 0.049041748046875|unsuper_loss: 0.0
average reward score: 1.130859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.48%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 795|ppo_ep: 1|act_loss: -0.065185546875|cri_loss: 0.060638427734375|unsuper_loss: 0.0
average reward score: -0.9345703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.63%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 796|ppo_ep: 1|act_loss: 0.08929443359375|cri_loss: 0.040374755859375|unsuper_loss: 0.0
average reward score: -0.03204345703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.77%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 797|ppo_ep: 1|act_loss: 0.0360107421875|cri_loss: 0.0285797119140625|unsuper_loss: 0.0
average reward score: 1.12109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 798|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.194580078125|unsuper_loss: 0.0
average reward score: 0.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.84%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
[2023-07-01 08:40:50,465] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=16, lr=[9.131635412636474e-07, 9.131635412636474e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:40:50,642] [INFO] [timer.py:215:stop] epoch=0/micro_step=800/global_step=800, RunningAvgSamplesPerSec=51.62718218371995, CurrSamplesPerSec=50.86661858599624, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:40:50,802] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=14, lr=[4.624291562079719e-07, 4.624291562079719e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 799|ppo_ep: 1|act_loss: 0.1156005859375|cri_loss: 0.060760498046875|unsuper_loss: 0.0
average reward score: 0.00958251953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.76%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.0296478271484375|cri_loss: 0.00811004638671875|unsuper_loss: 0.0
average reward score: 1.1142578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.52%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 801|ppo_ep: 1|act_loss: 0.0025157928466796875|cri_loss: 0.027435302734375|unsuper_loss: 0.0
average reward score: 1.2216796875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.49%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.06158447265625|cri_loss: 0.0352783203125|unsuper_loss: 0.0
average reward score: -0.1055908203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.64%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 803|ppo_ep: 1|act_loss: 0.1011962890625|cri_loss: 0.0164642333984375|unsuper_loss: 0.0
average reward score: -0.1495361328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.39%) |Training time=0.80s (31.79%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 804|ppo_ep: 1|act_loss: -0.0045623779296875|cri_loss: 0.01430511474609375|unsuper_loss: 0.0
average reward score: 0.54833984375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.33%) |Training time=0.80s (31.90%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 805|ppo_ep: 1|act_loss: -0.00469970703125|cri_loss: 0.05517578125|unsuper_loss: 0.0
average reward score: 0.65185546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.77%) |Training time=0.78s (31.47%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 806|ppo_ep: 1|act_loss: -0.0382080078125|cri_loss: 0.053131103515625|unsuper_loss: 0.0
average reward score: 1.369140625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.35%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82
epoch: 0|step: 807|ppo_ep: 1|act_loss: -0.08514404296875|cri_loss: 0.051727294921875|unsuper_loss: 0.0
average reward score: 0.53857421875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.85%) |Training time=0.78s (31.27%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.0787353515625|cri_loss: 0.0428466796875|unsuper_loss: 0.0
average reward score: 0.389892578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.59%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
[2023-07-01 08:41:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=16, lr=[8.119268990291768e-07, 8.119268990291768e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:41:15,610] [INFO] [timer.py:215:stop] epoch=0/micro_step=810/global_step=810, RunningAvgSamplesPerSec=51.6220586884916, CurrSamplesPerSec=50.11381577129612, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:41:15,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=14, lr=[4.1053208997358816e-07, 4.1053208997358816e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 809|ppo_ep: 1|act_loss: -0.05810546875|cri_loss: 0.048583984375|unsuper_loss: 0.0
average reward score: 0.169921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.19%) |Training time=0.80s (32.03%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 810|ppo_ep: 1|act_loss: -0.1319580078125|cri_loss: 0.0921630859375|unsuper_loss: 0.0
average reward score: -1.751953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.31%) |Training time=0.80s (31.90%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 811|ppo_ep: 1|act_loss: -0.115234375|cri_loss: 0.049957275390625|unsuper_loss: 0.0
average reward score: -0.065185546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 812|ppo_ep: 1|act_loss: -0.05712890625|cri_loss: 0.05364990234375|unsuper_loss: 0.0
average reward score: -0.892578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.72%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 813|ppo_ep: 1|act_loss: 0.05908203125|cri_loss: 0.05279541015625|unsuper_loss: 0.0
average reward score: 0.740234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.71%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 814|ppo_ep: 1|act_loss: -0.01099395751953125|cri_loss: 0.045440673828125|unsuper_loss: 0.0
average reward score: 1.83984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.59%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 815|ppo_ep: 1|act_loss: -0.034515380859375|cri_loss: 0.0224456787109375|unsuper_loss: 0.0
average reward score: 0.0625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 816|ppo_ep: 1|act_loss: 0.049652099609375|cri_loss: 0.0233001708984375|unsuper_loss: 0.0
average reward score: -0.461181640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 817|ppo_ep: 1|act_loss: 0.055389404296875|cri_loss: 0.03802490234375|unsuper_loss: 0.0
average reward score: 0.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 818|ppo_ep: 1|act_loss: 0.09637451171875|cri_loss: 0.0672607421875|unsuper_loss: 0.0
average reward score: 0.5283203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
[2023-07-01 08:41:40,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=16, lr=[7.161204101870459e-07, 7.161204101870459e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:41:40,588] [INFO] [timer.py:215:stop] epoch=0/micro_step=820/global_step=820, RunningAvgSamplesPerSec=51.616404119052, CurrSamplesPerSec=51.77697640793942, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:41:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=14, lr=[3.614623161842565e-07, 3.614623161842565e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 819|ppo_ep: 1|act_loss: 0.05255126953125|cri_loss: 0.01995849609375|unsuper_loss: 0.0
average reward score: 0.64306640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.83%) |Training time=0.78s (31.41%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.04034423828125|cri_loss: 0.130126953125|unsuper_loss: 0.0
average reward score: 0.168212890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.93%) |Training time=0.78s (31.30%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 821|ppo_ep: 1|act_loss: 0.01351165771484375|cri_loss: 0.07733154296875|unsuper_loss: 0.0
average reward score: 1.2470703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.55%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 822|ppo_ep: 1|act_loss: -0.1065673828125|cri_loss: 0.0628662109375|unsuper_loss: 0.0
average reward score: -0.47412109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.0014324188232421875|cri_loss: 0.035858154296875|unsuper_loss: 0.0
average reward score: 0.188720703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.80s (31.78%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 824|ppo_ep: 1|act_loss: -0.1934814453125|cri_loss: 0.1246337890625|unsuper_loss: 0.0
average reward score: -0.21630859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.69%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 825|ppo_ep: 1|act_loss: 0.01076507568359375|cri_loss: 0.0192718505859375|unsuper_loss: 0.0
average reward score: -0.4912109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 826|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: 0.01800537109375|unsuper_loss: 0.0
average reward score: 0.06787109375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.73%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 827|ppo_ep: 1|act_loss: -0.0654296875|cri_loss: 0.0181121826171875|unsuper_loss: 0.0
average reward score: 0.607421875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.13%) |Training time=0.80s (32.01%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.73 |AvgSamplesPerSec=12.82
epoch: 0|step: 828|ppo_ep: 1|act_loss: -0.040008544921875|cri_loss: 0.04144287109375|unsuper_loss: 0.0
average reward score: 0.72509765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.30%) |Training time=0.80s (31.91%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
[2023-07-01 08:42:05,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=16, lr=[6.258737120295009e-07, 6.258737120295009e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:42:05,592] [INFO] [timer.py:215:stop] epoch=0/micro_step=830/global_step=830, RunningAvgSamplesPerSec=51.60849889648195, CurrSamplesPerSec=51.36816773050813, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:42:05,751] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=14, lr=[3.1528623193564286e-07, 3.1528623193564286e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.06610107421875|cri_loss: 0.022186279296875|unsuper_loss: 0.0
average reward score: 0.09521484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.51%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 830|ppo_ep: 1|act_loss: -0.0310821533203125|cri_loss: 0.0281524658203125|unsuper_loss: 0.0
average reward score: 0.420166015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.48%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 831|ppo_ep: 1|act_loss: 0.047149658203125|cri_loss: 0.01361083984375|unsuper_loss: 0.0
average reward score: 0.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.72%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 832|ppo_ep: 1|act_loss: 0.0447998046875|cri_loss: 0.021392822265625|unsuper_loss: 0.0
average reward score: -1.0263671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.49%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 833|ppo_ep: 1|act_loss: 0.03436279296875|cri_loss: 0.00853729248046875|unsuper_loss: 0.0
average reward score: 0.73583984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.78s (31.50%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 834|ppo_ep: 1|act_loss: 0.07049560546875|cri_loss: 0.0280914306640625|unsuper_loss: 0.0
average reward score: 0.76171875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.57%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 835|ppo_ep: 1|act_loss: 0.030242919921875|cri_loss: 0.024688720703125|unsuper_loss: 0.0
average reward score: 0.5947265625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 836|ppo_ep: 1|act_loss: 0.0147857666015625|cri_loss: 0.0347900390625|unsuper_loss: 0.0
average reward score: 1.072265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.42%) |Training time=0.80s (31.82%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 837|ppo_ep: 1|act_loss: -0.00695037841796875|cri_loss: 0.01007843017578125|unsuper_loss: 0.0
average reward score: 1.263671875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.79s (31.79%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.02899169921875|cri_loss: 0.0163726806640625|unsuper_loss: 0.0
average reward score: -0.4306640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.55%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
[2023-07-01 08:42:30,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=16, lr=[5.413089188070959e-07, 5.413089188070959e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:42:30,551] [INFO] [timer.py:215:stop] epoch=0/micro_step=840/global_step=840, RunningAvgSamplesPerSec=51.60365103384502, CurrSamplesPerSec=51.48618054838966, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:42:30,710] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=14, lr=[2.720663188258199e-07, 2.720663188258199e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 839|ppo_ep: 1|act_loss: -0.01837158203125|cri_loss: 0.0267486572265625|unsuper_loss: 0.0
average reward score: 0.833984375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.54%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 840|ppo_ep: 1|act_loss: 0.0253448486328125|cri_loss: 0.050506591796875|unsuper_loss: 0.0
average reward score: 1.953125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.48%) |Training time=0.79s (31.67%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 841|ppo_ep: 1|act_loss: -0.10357666015625|cri_loss: 0.0260772705078125|unsuper_loss: 0.0
average reward score: 1.794921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.48%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.1046142578125|cri_loss: 0.1341552734375|unsuper_loss: 0.0
average reward score: 0.62646484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.54%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 843|ppo_ep: 1|act_loss: 0.09100341796875|cri_loss: 0.03955078125|unsuper_loss: 0.0
average reward score: -0.39794921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 844|ppo_ep: 1|act_loss: -0.08953857421875|cri_loss: 0.035308837890625|unsuper_loss: 0.0
average reward score: 1.5791015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.57%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 845|ppo_ep: 1|act_loss: -0.157958984375|cri_loss: 0.06439208984375|unsuper_loss: 0.0
average reward score: 0.45654296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.46%) |Training time=0.79s (31.71%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 846|ppo_ep: 1|act_loss: -0.076171875|cri_loss: 0.04150390625|unsuper_loss: 0.0
average reward score: 1.568359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.47%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 847|ppo_ep: 1|act_loss: -0.0772705078125|cri_loss: 0.03521728515625|unsuper_loss: 0.0
average reward score: 1.546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 848|ppo_ep: 1|act_loss: -0.0516357421875|cri_loss: 0.033599853515625|unsuper_loss: 0.0
average reward score: -0.73828125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.59%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
[2023-07-01 08:42:55,362] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=16, lr=[4.6254045649395126e-07, 4.6254045649395126e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:42:55,539] [INFO] [timer.py:215:stop] epoch=0/micro_step=850/global_step=850, RunningAvgSamplesPerSec=51.598060108699976, CurrSamplesPerSec=51.04028493601603, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:42:55,698] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=14, lr=[2.3186105841041418e-07, 2.3186105841041418e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 849|ppo_ep: 1|act_loss: -0.11651611328125|cri_loss: 0.045501708984375|unsuper_loss: 0.0
average reward score: 1.197265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 850|ppo_ep: 1|act_loss: -0.0482177734375|cri_loss: 0.055572509765625|unsuper_loss: 0.0
average reward score: 1.2587890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.30%) |Training time=0.80s (31.90%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 851|ppo_ep: 1|act_loss: -0.025909423828125|cri_loss: 0.0386962890625|unsuper_loss: 0.0
average reward score: 0.958984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.72%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 852|ppo_ep: 1|act_loss: -0.15966796875|cri_loss: 0.06219482421875|unsuper_loss: 0.0
average reward score: 1.181640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.53%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 853|ppo_ep: 1|act_loss: -0.2315673828125|cri_loss: 0.11053466796875|unsuper_loss: 0.0
average reward score: 0.57666015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.76%) |Training time=0.78s (31.45%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82
epoch: 0|step: 854|ppo_ep: 1|act_loss: -0.4189453125|cri_loss: 0.18017578125|unsuper_loss: 0.0
average reward score: -2.33984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.42%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 855|ppo_ep: 1|act_loss: -0.0657958984375|cri_loss: 0.08935546875|unsuper_loss: 0.0
average reward score: -0.64306640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.62%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 856|ppo_ep: 1|act_loss: -0.0465087890625|cri_loss: 0.051605224609375|unsuper_loss: 0.0
average reward score: -0.497314453125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.44%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 857|ppo_ep: 1|act_loss: -0.1798095703125|cri_loss: 0.091796875|unsuper_loss: 0.0
average reward score: 1.6103515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.81%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 858|ppo_ep: 1|act_loss: -0.178466796875|cri_loss: 0.058807373046875|unsuper_loss: 0.0
average reward score: -0.4541015625
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.80%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
[2023-07-01 08:43:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=16, lr=[3.8967490795613135e-07, 3.8967490795613135e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:43:20,521] [INFO] [timer.py:215:stop] epoch=0/micro_step=860/global_step=860, RunningAvgSamplesPerSec=51.59237982285124, CurrSamplesPerSec=50.91605612772703, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:43:20,681] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=14, lr=[1.9472485307027945e-07, 1.9472485307027945e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 859|ppo_ep: 1|act_loss: -0.278564453125|cri_loss: 0.06634521484375|unsuper_loss: 0.0
average reward score: -0.243408203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.79s (31.75%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 860|ppo_ep: 1|act_loss: -0.2164306640625|cri_loss: 0.0438232421875|unsuper_loss: 0.0
average reward score: 0.44384765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 861|ppo_ep: 1|act_loss: -0.11895751953125|cri_loss: 0.0361328125|unsuper_loss: 0.0
average reward score: -0.544921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.48%) |Training time=0.79s (31.76%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.025390625|cri_loss: 0.01168060302734375|unsuper_loss: 0.0
average reward score: 1.1591796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.56%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 863|ppo_ep: 1|act_loss: -0.043853759765625|cri_loss: 0.018585205078125|unsuper_loss: 0.0
average reward score: 1.791015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.41%) |Training time=0.80s (31.84%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.0258026123046875|cri_loss: 0.01141357421875|unsuper_loss: 0.0
average reward score: 0.355224609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 865|ppo_ep: 1|act_loss: 0.043121337890625|cri_loss: 0.0229034423828125|unsuper_loss: 0.0
average reward score: -0.026123046875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.79s (31.50%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 866|ppo_ep: 1|act_loss: -0.0021762847900390625|cri_loss: 0.01274871826171875|unsuper_loss: 0.0
average reward score: 0.68505859375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.0406494140625|cri_loss: 0.0157928466796875|unsuper_loss: 0.0
average reward score: 0.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.56%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 868|ppo_ep: 1|act_loss: 0.0023136138916015625|cri_loss: 0.0426025390625|unsuper_loss: 0.0
average reward score: 0.7802734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
[2023-07-01 08:43:45,314] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=16, lr=[3.2281086873267354e-07, 3.2281086873267354e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:43:45,490] [INFO] [timer.py:215:stop] epoch=0/micro_step=870/global_step=870, RunningAvgSamplesPerSec=51.5877933467114, CurrSamplesPerSec=51.501807697237, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:43:45,650] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=14, lr=[1.607079523987662e-07, 1.607079523987662e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 869|ppo_ep: 1|act_loss: 0.033447265625|cri_loss: 0.0211181640625|unsuper_loss: 0.0
average reward score: 0.86181640625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.56%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 870|ppo_ep: 1|act_loss: 0.01003265380859375|cri_loss: 0.034332275390625|unsuper_loss: 0.0
average reward score: -1.119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.70%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 871|ppo_ep: 1|act_loss: -0.0030384063720703125|cri_loss: 0.0274658203125|unsuper_loss: 0.0
average reward score: 1.5908203125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.67%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 872|ppo_ep: 1|act_loss: 0.005985260009765625|cri_loss: 0.0111846923828125|unsuper_loss: 0.0
average reward score: 1.9921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.78%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 873|ppo_ep: 1|act_loss: -0.0399169921875|cri_loss: 0.01165771484375|unsuper_loss: 0.0
average reward score: 1.345703125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.24%) |Training time=0.80s (31.95%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 874|ppo_ep: 1|act_loss: -0.0528564453125|cri_loss: 0.0241241455078125|unsuper_loss: 0.0
average reward score: 2.185546875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.86%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.0750732421875|cri_loss: 0.0255889892578125|unsuper_loss: 0.0
average reward score: 0.236328125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.43%) |Training time=0.79s (31.76%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 876|ppo_ep: 1|act_loss: -0.0244140625|cri_loss: 0.01404571533203125|unsuper_loss: 0.0
average reward score: 1.009765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.50%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 877|ppo_ep: 1|act_loss: -0.1212158203125|cri_loss: 0.0931396484375|unsuper_loss: 0.0
average reward score: 1.9169921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.80s (31.73%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 878|ppo_ep: 1|act_loss: -0.0438232421875|cri_loss: 0.015869140625|unsuper_loss: 0.0
average reward score: -1.5380859375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.54%) |Training time=0.79s (31.65%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
[2023-07-01 08:44:10,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=16, lr=[2.6203881362437934e-07, 2.6203881362437934e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:44:10,513] [INFO] [timer.py:215:stop] epoch=0/micro_step=880/global_step=880, RunningAvgSamplesPerSec=51.57930822120318, CurrSamplesPerSec=50.97633293859113, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:44:10,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=14, lr=[1.298563852081905e-07, 1.298563852081905e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 879|ppo_ep: 1|act_loss: -0.07330322265625|cri_loss: 0.0267181396484375|unsuper_loss: 0.0
average reward score: 0.132568359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.71%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 880|ppo_ep: 1|act_loss: -0.05224609375|cri_loss: 0.0175323486328125|unsuper_loss: 0.0
average reward score: 0.99609375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.95%) |Training time=0.78s (31.29%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.82
epoch: 0|step: 881|ppo_ep: 1|act_loss: 0.037750244140625|cri_loss: 0.0272369384765625|unsuper_loss: 0.0
average reward score: 2.283203125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.80%) |Training time=0.78s (31.33%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 882|ppo_ep: 1|act_loss: -0.028076171875|cri_loss: 0.018157958984375|unsuper_loss: 0.0
average reward score: 1.2802734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.69%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 883|ppo_ep: 1|act_loss: 0.0125579833984375|cri_loss: 0.03961181640625|unsuper_loss: 0.0
average reward score: 0.75537109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.65%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 884|ppo_ep: 1|act_loss: -0.0096435546875|cri_loss: 0.01611328125|unsuper_loss: 0.0
average reward score: -1.69921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.67%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 885|ppo_ep: 1|act_loss: -0.038299560546875|cri_loss: 0.0307769775390625|unsuper_loss: 0.0
average reward score: 0.420166015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.51%) |Training time=0.79s (31.73%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 886|ppo_ep: 1|act_loss: 0.074951171875|cri_loss: 0.037750244140625|unsuper_loss: 0.0
average reward score: -0.012451171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.52%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 887|ppo_ep: 1|act_loss: -0.0635986328125|cri_loss: 0.09283447265625|unsuper_loss: 0.0
average reward score: 1.556640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.56%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 888|ppo_ep: 1|act_loss: 0.02508544921875|cri_loss: 0.0252838134765625|unsuper_loss: 0.0
average reward score: -0.7734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.57%) |Training time=0.79s (31.66%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
[2023-07-01 08:44:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=16, lr=[2.0744097427091748e-07, 2.0744097427091748e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:44:35,468] [INFO] [timer.py:215:stop] epoch=0/micro_step=890/global_step=890, RunningAvgSamplesPerSec=51.57652001427069, CurrSamplesPerSec=50.996631323297976, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:44:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=14, lr=[1.0221189724751502e-07, 1.0221189724751502e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 889|ppo_ep: 1|act_loss: -0.0243377685546875|cri_loss: 0.0635986328125|unsuper_loss: 0.0
average reward score: -0.226318359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.71%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 890|ppo_ep: 1|act_loss: -0.000568389892578125|cri_loss: 0.021392822265625|unsuper_loss: 0.0
average reward score: 0.23974609375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.57%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 891|ppo_ep: 1|act_loss: -0.0159454345703125|cri_loss: 0.022857666015625|unsuper_loss: 0.0
average reward score: 0.48779296875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 892|ppo_ep: 1|act_loss: -0.0914306640625|cri_loss: 0.0703125|unsuper_loss: 0.0
average reward score: -0.37548828125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.34%) |Training time=0.80s (31.86%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 893|ppo_ep: 1|act_loss: -0.045745849609375|cri_loss: 0.02825927734375|unsuper_loss: 0.0
average reward score: 0.36669921875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.71%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 894|ppo_ep: 1|act_loss: -0.035736083984375|cri_loss: 0.018524169921875|unsuper_loss: 0.0
average reward score: 1.7275390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.61%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 895|ppo_ep: 1|act_loss: -0.005207061767578125|cri_loss: 0.0181121826171875|unsuper_loss: 0.0
average reward score: 0.04931640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.66%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 896|ppo_ep: 1|act_loss: 0.031280517578125|cri_loss: 0.0194244384765625|unsuper_loss: 0.0
average reward score: 1.267578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.61%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 897|ppo_ep: 1|act_loss: 0.02276611328125|cri_loss: 0.0172882080078125|unsuper_loss: 0.0
average reward score: -1.1484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.53%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 898|ppo_ep: 1|act_loss: -0.0015802383422851562|cri_loss: 0.00897216796875|unsuper_loss: 0.0
average reward score: 0.467041015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.74%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
[2023-07-01 08:45:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=16, lr=[1.590912278818792e-07, 1.590912278818792e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:45:00,465] [INFO] [timer.py:215:stop] epoch=0/micro_step=900/global_step=900, RunningAvgSamplesPerSec=51.570055453634914, CurrSamplesPerSec=50.599642682512595, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:45:00,626] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=14, lr=[7.781189471550543e-08, 7.781189471550543e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 899|ppo_ep: 1|act_loss: -0.060302734375|cri_loss: 0.026123046875|unsuper_loss: 0.0
average reward score: 0.7802734375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.33%) |Training time=0.80s (31.87%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 900|ppo_ep: 1|act_loss: -0.0238494873046875|cri_loss: 0.0219268798828125|unsuper_loss: 0.0
average reward score: -0.389892578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.47%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.0010786056518554688|cri_loss: 0.00994873046875|unsuper_loss: 0.0
average reward score: 0.892578125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 902|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.01763916015625|unsuper_loss: 0.0
average reward score: 0.5302734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.71%) |Training time=0.79s (31.52%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 903|ppo_ep: 1|act_loss: 0.0302581787109375|cri_loss: 0.0230712890625|unsuper_loss: 0.0
average reward score: 1.37890625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.55%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 904|ppo_ep: 1|act_loss: -0.052398681640625|cri_loss: 0.052215576171875|unsuper_loss: 0.0
average reward score: 0.705078125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.59%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 905|ppo_ep: 1|act_loss: 0.002933502197265625|cri_loss: 0.016021728515625|unsuper_loss: 0.0
average reward score: -0.1961669921875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.26%) |Training time=0.80s (31.94%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.76 |AvgSamplesPerSec=12.82
epoch: 0|step: 906|ppo_ep: 1|act_loss: -0.050933837890625|cri_loss: 0.0238189697265625|unsuper_loss: 0.0
average reward score: 1.1953125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.11%) |Training time=0.81s (32.11%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.75 |AvgSamplesPerSec=12.82
epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.0265960693359375|cri_loss: 0.0195159912109375|unsuper_loss: 0.0
average reward score: -0.09381103515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.51%) |Training time=0.79s (31.70%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 908|ppo_ep: 1|act_loss: 0.0308837890625|cri_loss: 0.0213623046875|unsuper_loss: 0.0
average reward score: 1.70703125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.55%) |Training time=0.79s (31.63%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
[2023-07-01 08:45:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=16, lr=[1.1705499727233991e-07, 1.1705499727233991e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:45:25,449] [INFO] [timer.py:215:stop] epoch=0/micro_step=910/global_step=910, RunningAvgSamplesPerSec=51.564973884615625, CurrSamplesPerSec=51.461879529159155, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:45:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=14, lr=[5.6689393645807666e-08, 5.6689393645807666e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 909|ppo_ep: 1|act_loss: -0.0219268798828125|cri_loss: 0.01120758056640625|unsuper_loss: 0.0
average reward score: 0.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.64%) |Training time=0.79s (31.50%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 910|ppo_ep: 1|act_loss: -0.00641632080078125|cri_loss: 0.00908660888671875|unsuper_loss: 0.0
average reward score: 1.1748046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.50%) |Training time=0.79s (31.60%) |Others=0.22 (8.90%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 911|ppo_ep: 1|act_loss: 0.07330322265625|cri_loss: 0.026702880859375|unsuper_loss: 0.0
average reward score: 0.471923828125
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.49%) |Training time=0.80s (31.76%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 912|ppo_ep: 1|act_loss: -0.0416259765625|cri_loss: 0.034393310546875|unsuper_loss: 0.0
average reward score: 0.239501953125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.58%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 913|ppo_ep: 1|act_loss: -0.057342529296875|cri_loss: 0.0198822021484375|unsuper_loss: 0.0
average reward score: -0.6708984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.68%) |Training time=0.79s (31.56%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 914|ppo_ep: 1|act_loss: 0.0325927734375|cri_loss: 0.0117340087890625|unsuper_loss: 0.0
average reward score: -0.008056640625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.72%) |Training time=0.79s (31.48%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.057861328125|cri_loss: 0.02679443359375|unsuper_loss: 0.0
average reward score: 1.568359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.53%) |Training time=0.79s (31.66%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 916|ppo_ep: 1|act_loss: -0.0123748779296875|cri_loss: 0.0109405517578125|unsuper_loss: 0.0
average reward score: 1.1484375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.67%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 917|ppo_ep: 1|act_loss: 0.01230621337890625|cri_loss: 0.01358795166015625|unsuper_loss: 0.0
average reward score: 0.7021484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.87%) |Training time=0.78s (31.33%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 918|ppo_ep: 1|act_loss: -0.0010576248168945312|cri_loss: 0.01271820068359375|unsuper_loss: 0.0
average reward score: 0.3583984375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.59%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
[2023-07-01 08:45:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=16, lr=[8.13891623382061e-08, 8.13891623382061e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:45:50,434] [INFO] [timer.py:215:stop] epoch=0/micro_step=920/global_step=920, RunningAvgSamplesPerSec=51.560951397224706, CurrSamplesPerSec=50.731532967172114, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:45:50,594] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=14, lr=[3.887297523242184e-08, 3.887297523242184e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 919|ppo_ep: 1|act_loss: 0.009857177734375|cri_loss: 0.019012451171875|unsuper_loss: 0.0
average reward score: 0.151123046875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.40%) |Training time=0.80s (31.77%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 920|ppo_ep: 1|act_loss: -0.019256591796875|cri_loss: 0.01953125|unsuper_loss: 0.0
average reward score: 1.4775390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.70%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 921|ppo_ep: 1|act_loss: 0.057342529296875|cri_loss: 0.0191802978515625|unsuper_loss: 0.0
average reward score: 0.003662109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.61%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 922|ppo_ep: 1|act_loss: 0.06622314453125|cri_loss: 0.0276641845703125|unsuper_loss: 0.0
average reward score: 1.396484375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.60%) |Training time=0.79s (31.61%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 923|ppo_ep: 1|act_loss: -0.037200927734375|cri_loss: 0.0157318115234375|unsuper_loss: 0.0
average reward score: 1.4443359375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.58%) |Training time=0.79s (31.57%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 924|ppo_ep: 1|act_loss: -0.0157012939453125|cri_loss: 0.0157928466796875|unsuper_loss: 0.0
average reward score: 0.2183837890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.51%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 925|ppo_ep: 1|act_loss: -0.0152130126953125|cri_loss: 0.0086669921875|unsuper_loss: 0.0
average reward score: 0.62841796875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.43%) |Training time=0.80s (31.79%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 926|ppo_ep: 1|act_loss: 0.04119873046875|cri_loss: 0.0268096923828125|unsuper_loss: 0.0
average reward score: 1.16015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 927|ppo_ep: 1|act_loss: 0.02349853515625|cri_loss: 0.0118255615234375|unsuper_loss: 0.0
average reward score: 0.32275390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.63%) |Others=0.22 (8.77%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 928|ppo_ep: 1|act_loss: -0.035491943359375|cri_loss: 0.026763916015625|unsuper_loss: 0.0
average reward score: -0.451416015625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.49%) |Training time=0.79s (31.70%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
[2023-07-01 08:46:15,231] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=16, lr=[5.2141983091115555e-08, 5.2141983091115555e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:46:15,407] [INFO] [timer.py:215:stop] epoch=0/micro_step=930/global_step=930, RunningAvgSamplesPerSec=51.55690491493285, CurrSamplesPerSec=51.13309535329369, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:46:15,568] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=14, lr=[2.4386747156034395e-08, 2.4386747156034395e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.0701904296875|cri_loss: 0.0291595458984375|unsuper_loss: 0.0
average reward score: -0.09228515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.52%) |Training time=0.79s (31.64%) |Others=0.22 (8.84%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 930|ppo_ep: 1|act_loss: -0.0335693359375|cri_loss: 0.01067352294921875|unsuper_loss: 0.0
average reward score: 1.45703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.55%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.82
epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.0938720703125|cri_loss: 0.0655517578125|unsuper_loss: 0.0
average reward score: 0.4677734375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.75%) |Training time=0.78s (31.47%) |Others=0.22 (8.78%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 932|ppo_ep: 1|act_loss: 0.0243988037109375|cri_loss: 0.0057373046875|unsuper_loss: 0.0
average reward score: 0.1580810546875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.74%) |Training time=0.78s (31.43%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 933|ppo_ep: 1|act_loss: -0.06317138671875|cri_loss: 0.03302001953125|unsuper_loss: 0.0
average reward score: 1.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.38%) |Training time=0.80s (31.80%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.78 |AvgSamplesPerSec=12.82
epoch: 0|step: 934|ppo_ep: 1|act_loss: 0.0249786376953125|cri_loss: 0.00763702392578125|unsuper_loss: 0.0
average reward score: 0.52197265625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.74%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.0228118896484375|cri_loss: 0.0472412109375|unsuper_loss: 0.0
average reward score: -1.62890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.61%) |Training time=0.79s (31.52%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 936|ppo_ep: 1|act_loss: -0.049560546875|cri_loss: 0.0103912353515625|unsuper_loss: 0.0
average reward score: 1.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.67%) |Training time=0.79s (31.54%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 937|ppo_ep: 1|act_loss: -0.04327392578125|cri_loss: 0.0217437744140625|unsuper_loss: 0.0
average reward score: -0.12451171875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.70%) |Training time=0.79s (31.45%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.79 |AvgSamplesPerSec=12.82
epoch: 0|step: 938|ppo_ep: 1|act_loss: 0.04107666015625|cri_loss: 0.016998291015625|unsuper_loss: 0.0
average reward score: 0.767578125
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.44%) |Training time=0.80s (31.75%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
[2023-07-01 08:46:40,218] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=16, lr=[2.935303435704569e-08, 2.935303435704569e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:46:40,398] [INFO] [timer.py:215:stop] epoch=0/micro_step=940/global_step=940, RunningAvgSamplesPerSec=51.55280382459751, CurrSamplesPerSec=50.4776799280318, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:46:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=14, lr=[1.3250310963527358e-08, 1.3250310963527358e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 939|ppo_ep: 1|act_loss: 0.01678466796875|cri_loss: 0.01320648193359375|unsuper_loss: 0.0
average reward score: -0.998046875
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.35%) |Training time=0.80s (31.89%) |Others=0.22 (8.76%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 940|ppo_ep: 1|act_loss: -0.016937255859375|cri_loss: 0.037261962890625|unsuper_loss: 0.0
average reward score: 0.9150390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.63%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 941|ppo_ep: 1|act_loss: -0.07684326171875|cri_loss: 0.05615234375|unsuper_loss: 0.0
average reward score: -1.2978515625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.63%) |Training time=0.79s (31.54%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.82
epoch: 0|step: 942|ppo_ep: 1|act_loss: 0.015228271484375|cri_loss: 0.00942230224609375|unsuper_loss: 0.0
average reward score: 1.7900390625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.59%) |Training time=0.79s (31.56%) |Others=0.22 (8.85%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 943|ppo_ep: 1|act_loss: -0.0157012939453125|cri_loss: 0.00725555419921875|unsuper_loss: 0.0
average reward score: 0.22119140625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.56%) |Training time=0.79s (31.62%) |Others=0.22 (8.82%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 944|ppo_ep: 1|act_loss: -0.030517578125|cri_loss: 0.0161285400390625|unsuper_loss: 0.0
average reward score: 0.79736328125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.62%) |Training time=0.79s (31.64%) |Others=0.22 (8.74%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 945|ppo_ep: 1|act_loss: -0.0014429092407226562|cri_loss: 0.016357421875|unsuper_loss: 0.0
average reward score: 0.77783203125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.58%) |Training time=0.79s (31.63%) |Others=0.22 (8.79%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 946|ppo_ep: 1|act_loss: -0.004383087158203125|cri_loss: 0.0068206787109375|unsuper_loss: 0.0
average reward score: -0.0230712890625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.48s (59.40%) |Training time=0.79s (31.79%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
epoch: 0|step: 947|ppo_ep: 1|act_loss: -0.0028743743896484375|cri_loss: 0.0187835693359375|unsuper_loss: 0.0
average reward score: 0.63818359375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.66%) |Training time=0.79s (31.54%) |Others=0.22 (8.80%)|CurSamplesPerSec=12.83 |AvgSamplesPerSec=12.82
epoch: 0|step: 948|ppo_ep: 1|act_loss: -0.017242431640625|cri_loss: 0.0217742919921875|unsuper_loss: 0.0
average reward score: 0.158935546875
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.65%) |Training time=0.79s (31.48%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.82
[2023-07-01 08:47:05,179] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=16, lr=[1.3053152226982066e-08, 1.3053152226982066e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:47:05,355] [INFO] [timer.py:215:stop] epoch=0/micro_step=950/global_step=950, RunningAvgSamplesPerSec=51.55028020879753, CurrSamplesPerSec=51.49623536272809, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:47:05,513] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=14, lr=[5.478735544813263e-09, 5.478735544813263e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 949|ppo_ep: 1|act_loss: 0.01763916015625|cri_loss: 0.0254058837890625|unsuper_loss: 0.0
average reward score: 0.6220703125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.69%) |Training time=0.79s (31.56%) |Others=0.22 (8.75%)|CurSamplesPerSec=12.84 |AvgSamplesPerSec=12.82
epoch: 0|step: 950|ppo_ep: 1|act_loss: 0.014923095703125|cri_loss: 0.01549530029296875|unsuper_loss: 0.0
average reward score: 0.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.99%) |Training time=0.78s (31.20%) |Others=0.22 (8.81%)|CurSamplesPerSec=12.87 |AvgSamplesPerSec=12.82
epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.09979248046875|cri_loss: 0.053955078125|unsuper_loss: 0.0
average reward score: 1.1416015625
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.45%) |Training time=0.79s (31.72%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=12.82
epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.037689208984375|cri_loss: 0.0236358642578125|unsuper_loss: 0.0
average reward score: 1.1552734375
-------------------------------------------------------------------------------------
|E2E latency=2.51s |Gather latency=0.00s (0.00%) |Generate time=1.49s (59.36%) |Training time=0.80s (31.81%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.77 |AvgSamplesPerSec=12.82
epoch: 0|step: 953|ppo_ep: 1|act_loss: -0.0743408203125|cri_loss: 0.01526641845703125|unsuper_loss: 0.0
average reward score: 1.4033203125
-------------------------------------------------------------------------------------
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8928 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8930 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8929 exits successfully.
[2023-07-01 08:47:19,072] [INFO] [launch.py:346:main] Process 8931 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8925 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8926 exits successfully.
[2023-07-01 08:47:20,074] [INFO] [launch.py:346:main] Process 8927 exits successfully.
[2023-07-01 08:47:27,082] [INFO] [launch.py:346:main] Process 8924 exits successfully.