Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:29,654] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7: setting --include=localhost:0,1,2,3,4,5,6,7
[2023-07-01 08:03:29,722] [INFO] [runner.py:555:main] cmd = /home/zhaiyuanzhao/anaconda3/envs/RLHF/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path /home/zhaiyuanzhao/llm/dataset/rm-static/data --data_split 2,4,4 --actor_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples-4datasets/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/output-1.3b --critic_model_name_or_path /home/zhaiyuanzhao/code/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/output-RM --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 1 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output-1.3b-RM_350m-nokl --kl_ctl 0
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:32,113] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-01 08:03:32,113] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-01 08:03:32,114] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-01 08:03:32,114] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-01 08:03:32,114] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
[2023-07-01 08:03:57,552] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,552] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,583] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,584] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,584] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-01 08:03:57,629] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,629] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,661] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,661] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,682] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,682] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,697] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,697] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,708] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,708] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-01 08:03:57,710] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-01 08:03:57,710] [INFO] [comm.py:594:init_distributed] cdb=None
Found cached dataset parquet (/home/zhaiyuanzhao/.cache/huggingface/datasets/parquet/default-d09980a08a1dbd7c/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:00<00:00,  1.72it/s]
100%|██████████| 2/2 [00:00<00:00,  3.11it/s]
************************[start] Initializing Actor Model [start] *************************
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 0.8045144081115723 seconds
Time to load fused_adam op: 0.8174877166748047 seconds
Time to load fused_adam op: 0.8151278495788574 seconds
Time to load fused_adam op: 0.8147773742675781 seconds
Time to load fused_adam op: 0.8157918453216553 seconds
Time to load fused_adam op: 0.8176009654998779 seconds
Time to load fused_adam op: 0.8175563812255859 seconds
Time to load fused_adam op: 0.8175740242004395 seconds
[2023-07-01 08:05:17,068] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
[2023-07-01 08:05:28,556] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:05:28,558] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-07-01 08:05:28,558] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-07-01 08:05:28,577] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-07-01 08:05:28,577] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-07-01 08:05:28,577] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-07-01 08:05:28,577] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.581378698348999 seconds
Time to load utils op: 0.5792138576507568 seconds
Time to load utils op: 0.5809986591339111 seconds
Time to load utils op: 0.5794942378997803 seconds
Time to load utils op: 0.5797784328460693 seconds
Time to load utils op: 0.5806789398193359 seconds
Time to load utils op: 0.5814478397369385 seconds
Time to load utils op: 0.5814688205718994 seconds
Rank: 1 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 6 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 7 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 2 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 5 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 3 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 4 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Rank: 0 partition count [8, 8] and sizes[(164401920, False), (67840, False)] 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Loading extension module utils...
Time to load utils op: 0.0009922981262207031 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0007748603820800781 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008537769317626953 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.0007731914520263672 seconds
Time to load utils op: 0.0008482933044433594 seconds
Time to load utils op: 0.0007557868957519531 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002641916275024414 seconds
[2023-07-01 08:05:38,577] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-01 08:05:38,578] [INFO] [utils.py:786:see_memory_usage] MA 3.06 GB         Max_MA 3.06 GB         CA 3.07 GB         Max_CA 3 GB 
[2023-07-01 08:05:38,578] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,729] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-01 08:05:38,730] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.91 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-07-01 08:05:38,730] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,730] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-07-01 08:05:38,872] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-01 08:05:38,873] [INFO] [utils.py:786:see_memory_usage] MA 4.29 GB         Max_MA 4.29 GB         CA 4.91 GB         Max_CA 5 GB 
[2023-07-01 08:05:38,873] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 37.52 GB, percent = 3.7%
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x2b494ff536a0>
[2023-07-01 08:05:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:05:38,876] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:05:38,876] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b4957b11e80>
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:05:38,877] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:05:38,878] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:05:38,879] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-07-01 08:05:38,879] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": true, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009009838104248047 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Loading extension module transformer_inference...
Time to load transformer_inference op: 1.1420021057128906 seconds
Time to load transformer_inference op: 1.1416473388671875 seconds
Time to load transformer_inference op: 1.1401557922363281 seconds
Time to load transformer_inference op: 1.128908395767212 seconds
Time to load transformer_inference op: 1.1238932609558105 seconds
Time to load transformer_inference op: 1.1305203437805176 seconds
Time to load transformer_inference op: 1.1278560161590576 seconds
Time to load transformer_inference op: 1.1280443668365479 seconds
[2023-07-01 08:05:40,500] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.ReLU: 2>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True}
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.04882502555847168 seconds
Time to load transformer_inference op: 0.04806041717529297 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05038714408874512 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05112457275390625 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05184316635131836 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05165362358093262 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05492138862609863 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05556344985961914 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load transformer_inference op: 0.04999732971191406 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.04800009727478027 seconds
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.0471189022064209 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.04591679573059082 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05129432678222656 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05254411697387695 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.05318403244018555 seconds
******************[end] Initialized Actor Model [end] (duration: 49.50s)******************
*************************[start] Initializing Ref Model [start] **************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.055281877517700195 seconds
model loaded
[2023-07-01 08:05:57,456] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:06:08,170] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:06:08,172] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b49607ef250>
[2023-07-01 08:06:08,173] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:06:08,174] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:06:08,175] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:06:08,176] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:06:08,176] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002164125442504883 seconds
*******************[end] Initialized Ref Model [end] (duration: 27.32s)*******************
*************************[start] Initializing EMA Model [start] **************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001653432846069336 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002496004104614258 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0014553070068359375 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0012822151184082031 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011990070343017578 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0014603137969970703 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002300739288330078 seconds
model loaded
[2023-07-01 08:06:24,453] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:06:34,732] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:06:34,733] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b4956fabcd0>
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:06:34,734] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:06:34,735] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:06:34,736] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:06:34,737] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.006189823150634766 seconds
*******************[end] Initialized EMA Model [end] (duration: 26.56s)*******************
************************[start] Initializing Critic Model [start] ************************
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0012662410736083984 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0014331340789794922 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011889934539794922 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0016870498657226562 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011091232299804688 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0018308162689208984 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0019044876098632812 seconds
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0026984214782714844 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0019664764404296875 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.009085655212402344 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.008561134338378906 seconds
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.009094715118408203 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.0236053466796875 seconds
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.023058176040649414 seconds
[2023-07-01 08:06:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
No modifications detected for re-loaded extension module fused_adam, skipping build step...
Loading extension module fused_adam...
Time to load fused_adam op: 0.03352189064025879 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.008983373641967773 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0023987293243408203 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0010488033294677734 seconds
[2023-07-01 08:07:01,952] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:07:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-07-01 08:07:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0007925033569335938 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.002661466598510742 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0010764598846435547 seconds
[2023-07-01 08:07:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-07-01 08:07:01,970] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2023-07-01 08:07:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-07-01 08:07:01,970] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009701251983642578 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0022344589233398438 seconds
Rank: 4 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 7 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 5 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 2 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 1 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 0 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 6 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Rank: 3 partition count [8, 8] and sizes[(41365824, False), (33792, False)] 
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load utils op: 0.001149892807006836 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Time to load utils op: 0.0009431838989257812 seconds
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load utils op: 0.0007071495056152344 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load utils op: 0.0010597705841064453 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00122833251953125 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0017001628875732422 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0030956268310546875 seconds
[2023-07-01 08:07:10,098] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-01 08:07:10,099] [INFO] [utils.py:786:see_memory_usage] MA 10.58 GB         Max_MA 10.58 GB         CA 10.97 GB         Max_CA 11 GB 
[2023-07-01 08:07:10,099] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 59.17 GB, percent = 5.9%
[2023-07-01 08:07:10,364] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-01 08:07:10,364] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 11.05 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-07-01 08:07:10,365] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 60.12 GB, percent = 6.0%
[2023-07-01 08:07:10,365] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-07-01 08:07:10,627] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-01 08:07:10,628] [INFO] [utils.py:786:see_memory_usage] MA 10.89 GB         Max_MA 10.89 GB         CA 11.43 GB         Max_CA 11 GB 
[2023-07-01 08:07:10,628] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 61.06 GB, percent = 6.1%
[2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x2b49620c2c70>
[2023-07-01 08:07:10,630] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:07:10,631] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b49620c98e0>
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:07:10,631] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:07:10,632] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   zero_enabled ................. True
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:07:10,633] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
[2023-07-01 08:07:10,633] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 2, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 3.000000e+07, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true, 
        "loss_scale_window": 100
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": false, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0009529590606689453 seconds
*****************[end] Initialized Critic Model [end] (duration: 35.89s)******************
************************[start] Initializing Reward Model [start] ************************
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
model loaded
[2023-07-01 08:07:23,866] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.3, git-hash=unknown, git-branch=unknown
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0057489871978759766 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0014493465423583984 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013382434844970703 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013589859008789062 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0016677379608154297 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011448860168457031 seconds
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0011296272277832031 seconds
[2023-07-01 08:07:31,957] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-01 08:07:31,958] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-01 08:07:31,958] [INFO] [config.py:964:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2023-07-01 08:07:31,958] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-01 08:07:31,958] [INFO] [config.py:964:print]   amp_enabled .................. False
[2023-07-01 08:07:31,958] [INFO] [config.py:964:print]   amp_params ................... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x2b4962167a90>
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   communication_data_type ...... None
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   disable_allgather ............ False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   dump_state ................... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... None
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
[2023-07-01 08:07:31,959] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   elasticity_enabled ........... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   fp16_enabled ................. True
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   global_rank .................. 0
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 1
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   loss_scale ................... 0
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   memory_breakdown ............. False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   mics_shard_size .............. -1
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   optimizer_name ............... None
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   optimizer_params ............. None
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   pld_enabled .................. False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   pld_params ................... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   prescale_gradients ........... False
[2023-07-01 08:07:31,960] [INFO] [config.py:964:print]   scheduler_name ............... None
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   scheduler_params ............. None
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   sparse_attention ............. None
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   steps_per_print .............. 10
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   train_batch_size ............. 32
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  4
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   use_node_local_storage ....... False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   world_size ................... 8
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   zero_enabled ................. False
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
[2023-07-01 08:07:31,961] [INFO] [config.py:964:print]   zero_optimization_stage ...... 0
[2023-07-01 08:07:31,961] [INFO] [config.py:950:print_user_config]   json = {
    "train_batch_size": 32, 
    "train_micro_batch_size_per_gpu": 4, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 0, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "offload_param": {
            "device": "none"
        }, 
        "memory_efficient_linear": false
    }, 
    "fp16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false
}
Using /home/zhaiyuanzhao/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0013725757598876953 seconds
*****************[end] Initialized Reward Model [end] (duration: 21.33s)******************
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 954
------------------------------------------------------
Free memory : 26.453308 (GigaBytes)  
Total memory: 39.586121 (GigaBytes)  
Requested memory: 1.031250 (GigaBytes) 
Setting maximum total tokens (input + output) to 512 
WorkSpace: 0x2b4ed0000000 
------------------------------------------------------
[2023-07-01 08:07:36,757] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-07-01 08:07:36,921] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 0|ppo_ep: 1|act_loss: 0.00479888916015625|cri_loss: 0.201416015625|unsuper_loss: 0.0
average reward score: -1.482421875
-------------------------------------------------------------------------------------
|E2E latency=4.93s |Gather latency=0.00s (0.00%) |Generate time=3.91s (79.35%) |Training time=0.83s (16.80%) |Others=0.19 (3.85%)|CurSamplesPerSec=6.49 |AvgSamplesPerSec=6.49
[2023-07-01 08:07:39,092] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
[2023-07-01 08:07:39,251] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 1|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: 1.76953125|unsuper_loss: 0.0
average reward score: -3.720703125
-------------------------------------------------------------------------------------
|E2E latency=2.33s |Gather latency=0.00s (0.00%) |Generate time=1.52s (65.40%) |Training time=0.62s (26.43%) |Others=0.19 (8.18%)|CurSamplesPerSec=13.73 |AvgSamplesPerSec=8.82
[2023-07-01 08:07:41,420] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-07-01 08:07:41,581] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 2|ppo_ep: 1|act_loss: -0.11614990234375|cri_loss: 0.6455078125|unsuper_loss: 0.0
average reward score: -1.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.33s |Gather latency=0.00s (0.00%) |Generate time=1.53s (65.56%) |Training time=0.62s (26.51%) |Others=0.18 (7.93%)|CurSamplesPerSec=13.73 |AvgSamplesPerSec=10.01
[2023-07-01 08:07:44,100] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 3|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.2236328125|unsuper_loss: 0.0
average reward score: 0.70947265625
-------------------------------------------------------------------------------------
|E2E latency=2.52s |Gather latency=0.00s (0.00%) |Generate time=1.53s (60.77%) |Training time=0.80s (31.96%) |Others=0.18 (7.27%)|CurSamplesPerSec=12.72 |AvgSamplesPerSec=10.57
[2023-07-01 08:07:46,255] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 4|ppo_ep: 1|act_loss: -0.032318115234375|cri_loss: 0.200439453125|unsuper_loss: 0.0
average reward score: -0.22509765625
-------------------------------------------------------------------------------------
|E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.52s (64.56%) |Training time=0.61s (25.79%) |Others=0.23 (9.66%)|CurSamplesPerSec=13.56 |AvgSamplesPerSec=11.06
epoch: 0|step: 5|ppo_ep: 1|act_loss: -0.345458984375|cri_loss: 1.0078125|unsuper_loss: 0.0
average reward score: -0.45458984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.63%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=11.28
[2023-07-01 08:07:51,512] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.09063720703125|cri_loss: 0.2022705078125|unsuper_loss: 0.0
average reward score: 0.46240234375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.52s (60.65%) |Training time=0.80s (31.99%) |Others=0.18 (7.36%)|CurSamplesPerSec=12.80 |AvgSamplesPerSec=11.48
epoch: 0|step: 7|ppo_ep: 1|act_loss: 0.14013671875|cri_loss: 0.08990478515625|unsuper_loss: 0.0
average reward score: -1.755859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=11.61
[2023-07-01 08:07:56,194] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.0853271484375|cri_loss: 0.61572265625|unsuper_loss: 0.0
average reward score: -0.556640625
-------------------------------------------------------------------------------------
|E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.14%) |Training time=0.62s (26.23%) |Others=0.23 (9.64%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=11.80
[2023-07-01 08:07:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[4.825e-07, 4.825e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:07:58,731] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=57.05384728645054, CurrSamplesPerSec=50.21911950796253, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:07:58,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[2.5000000000000004e-07, 2.5000000000000004e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 9|ppo_ep: 1|act_loss: 0.199462890625|cri_loss: 0.1419677734375|unsuper_loss: 0.0
average reward score: -0.7958984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.87%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=11.88
epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.123046875|cri_loss: 0.1575927734375|unsuper_loss: 0.0
average reward score: -0.1348876953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.46%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=11.94
epoch: 0|step: 11|ppo_ep: 1|act_loss: 0.0229949951171875|cri_loss: 0.1405029296875|unsuper_loss: 0.0
average reward score: -1.232421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.14%) |Training time=0.81s (31.83%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=11.99
[2023-07-01 08:08:06,190] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.11627197265625|cri_loss: 1.2509765625|unsuper_loss: 0.0
average reward score: -2.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.07%) |Training time=0.62s (26.34%) |Others=0.23 (9.59%)|CurSamplesPerSec=13.56 |AvgSamplesPerSec=12.09
[2023-07-01 08:08:08,886] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.10198974609375|cri_loss: 0.263427734375|unsuper_loss: 0.0
average reward score: -0.035888671875
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.51s (60.59%) |Training time=0.80s (32.22%) |Others=0.18 (7.19%)|CurSamplesPerSec=12.85 |AvgSamplesPerSec=12.14
epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.367919921875|cri_loss: 0.2203369140625|unsuper_loss: 0.0
average reward score: -0.9765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.59%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.17
epoch: 0|step: 15|ppo_ep: 1|act_loss: 0.12371826171875|cri_loss: 0.16064453125|unsuper_loss: 0.0
average reward score: 0.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.20
epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.037841796875|cri_loss: 0.1976318359375|unsuper_loss: 0.0
average reward score: -0.326904296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.78%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.22
epoch: 0|step: 17|ppo_ep: 1|act_loss: 0.03997802734375|cri_loss: 0.0546875|unsuper_loss: 0.0
average reward score: 0.403076171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.24
epoch: 0|step: 18|ppo_ep: 1|act_loss: -0.11065673828125|cri_loss: 0.1983642578125|unsuper_loss: 0.0
average reward score: -0.39208984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.81%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.25
[2023-07-01 08:08:23,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[1.3510000000000003e-06, 1.3510000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:23,961] [INFO] [timer.py:215:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=54.095903560254364, CurrSamplesPerSec=50.21105986335567, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:24,128] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=6, lr=[7.000000000000001e-07, 7.000000000000001e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 19|ppo_ep: 1|act_loss: -0.1505126953125|cri_loss: 0.1583251953125|unsuper_loss: 0.0
average reward score: 0.42919921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.72%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.27
epoch: 0|step: 20|ppo_ep: 1|act_loss: 0.05194091796875|cri_loss: 0.10504150390625|unsuper_loss: 0.0
average reward score: -0.5322265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.28
epoch: 0|step: 21|ppo_ep: 1|act_loss: -0.157958984375|cri_loss: 1.16015625|unsuper_loss: 0.0
average reward score: -0.0382080078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.29
epoch: 0|step: 22|ppo_ep: 1|act_loss: 0.057037353515625|cri_loss: 0.112060546875|unsuper_loss: 0.0
average reward score: -0.424072265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.49%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.31
epoch: 0|step: 23|ppo_ep: 1|act_loss: -0.10540771484375|cri_loss: 0.46435546875|unsuper_loss: 0.0
average reward score: -0.78173828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.32
epoch: 0|step: 24|ppo_ep: 1|act_loss: 0.038909912109375|cri_loss: 0.07330322265625|unsuper_loss: 0.0
average reward score: -0.0738525390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.46%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.33
epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.037322998046875|cri_loss: 0.1531982421875|unsuper_loss: 0.0
average reward score: 0.059814453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.47%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.34
epoch: 0|step: 26|ppo_ep: 1|act_loss: 0.1767578125|cri_loss: 0.11053466796875|unsuper_loss: 0.0
average reward score: -1.0439453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.50%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.35
epoch: 0|step: 27|ppo_ep: 1|act_loss: 0.07501220703125|cri_loss: 0.1719970703125|unsuper_loss: 0.0
average reward score: 1.2197265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.84%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.36
epoch: 0|step: 28|ppo_ep: 1|act_loss: 0.2171630859375|cri_loss: 0.1456298828125|unsuper_loss: 0.0
average reward score: -0.366943359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.60%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.36
[2023-07-01 08:08:49,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[2.316e-06, 2.316e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:08:49,387] [INFO] [timer.py:215:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=52.846617835000465, CurrSamplesPerSec=50.29139602390737, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:08:49,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=6, lr=[1.2000000000000002e-06, 1.2000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 29|ppo_ep: 1|act_loss: 0.052337646484375|cri_loss: 0.166015625|unsuper_loss: 0.0
average reward score: -1.0771484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.37
epoch: 0|step: 30|ppo_ep: 1|act_loss: 0.053802490234375|cri_loss: 0.1561279296875|unsuper_loss: 0.0
average reward score: 0.409423828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.77%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.38
epoch: 0|step: 31|ppo_ep: 1|act_loss: 0.2149658203125|cri_loss: 0.2462158203125|unsuper_loss: 0.0
average reward score: 0.31982421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.38
epoch: 0|step: 32|ppo_ep: 1|act_loss: -0.01303863525390625|cri_loss: 0.0921630859375|unsuper_loss: 0.0
average reward score: 1.552734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.55%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.39
epoch: 0|step: 33|ppo_ep: 1|act_loss: -0.08642578125|cri_loss: 0.1295166015625|unsuper_loss: 0.0
average reward score: 1.685546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.87%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.39
epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.1895751953125|cri_loss: 0.1927490234375|unsuper_loss: 0.0
average reward score: -0.654296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.78%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.40
epoch: 0|step: 35|ppo_ep: 1|act_loss: 0.12469482421875|cri_loss: 0.1187744140625|unsuper_loss: 0.0
average reward score: -0.56787109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.80%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.40
epoch: 0|step: 36|ppo_ep: 1|act_loss: 0.0865478515625|cri_loss: 0.06805419921875|unsuper_loss: 0.0
average reward score: 0.978515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.50%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.41
epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.1055908203125|cri_loss: 0.4501953125|unsuper_loss: 0.0
average reward score: -0.71044921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.41
epoch: 0|step: 38|ppo_ep: 1|act_loss: 0.07806396484375|cri_loss: 0.384521484375|unsuper_loss: 0.0
average reward score: 0.85986328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.41
[2023-07-01 08:09:14,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[3.2810000000000004e-06, 3.2810000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:14,872] [INFO] [timer.py:215:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=52.19390649599056, CurrSamplesPerSec=50.73494643104871, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:15,039] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=6, lr=[1.7000000000000002e-06, 1.7000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 39|ppo_ep: 1|act_loss: -0.156982421875|cri_loss: 0.2034912109375|unsuper_loss: 0.0
average reward score: -0.13427734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.42
epoch: 0|step: 40|ppo_ep: 1|act_loss: -0.037841796875|cri_loss: 0.12188720703125|unsuper_loss: 0.0
average reward score: -0.04388427734375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.42
epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.2890625|unsuper_loss: 0.0
average reward score: -0.447265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.43
epoch: 0|step: 42|ppo_ep: 1|act_loss: -0.0902099609375|cri_loss: 0.1630859375|unsuper_loss: 0.0
average reward score: 0.9951171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.43
epoch: 0|step: 43|ppo_ep: 1|act_loss: -0.083251953125|cri_loss: 0.0677490234375|unsuper_loss: 0.0
average reward score: 0.8994140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.43
epoch: 0|step: 44|ppo_ep: 1|act_loss: 0.0496826171875|cri_loss: 0.2196044921875|unsuper_loss: 0.0
average reward score: -0.120361328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.86%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.43
epoch: 0|step: 45|ppo_ep: 1|act_loss: -0.08050537109375|cri_loss: 0.154541015625|unsuper_loss: 0.0
average reward score: 1.1318359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.77%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.44
epoch: 0|step: 46|ppo_ep: 1|act_loss: 0.0667724609375|cri_loss: 0.10546875|unsuper_loss: 0.0
average reward score: -1.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.44
epoch: 0|step: 47|ppo_ep: 1|act_loss: -0.1798095703125|cri_loss: 0.11456298828125|unsuper_loss: 0.0
average reward score: 0.80078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.44
epoch: 0|step: 48|ppo_ep: 1|act_loss: -0.0260009765625|cri_loss: 0.1363525390625|unsuper_loss: 0.0
average reward score: 1.5859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.45
[2023-07-01 08:09:40,151] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[4.2460000000000005e-06, 4.2460000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:09:40,333] [INFO] [timer.py:215:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=51.811458544558285, CurrSamplesPerSec=49.8596639560758, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:09:40,498] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=6, lr=[2.2e-06, 2.2e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 49|ppo_ep: 1|act_loss: -0.0413818359375|cri_loss: 0.21044921875|unsuper_loss: 0.0
average reward score: 1.5859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.82s (31.93%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.45
epoch: 0|step: 50|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: 0.12091064453125|unsuper_loss: 0.0
average reward score: -0.29931640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.45
epoch: 0|step: 51|ppo_ep: 1|act_loss: -0.1781005859375|cri_loss: 0.267333984375|unsuper_loss: 0.0
average reward score: 0.7529296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.87%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.45
epoch: 0|step: 52|ppo_ep: 1|act_loss: -0.1932373046875|cri_loss: 0.1837158203125|unsuper_loss: 0.0
average reward score: 1.6982421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.62%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.45
epoch: 0|step: 53|ppo_ep: 1|act_loss: -0.09027099609375|cri_loss: 0.09649658203125|unsuper_loss: 0.0
average reward score: 1.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.82%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.46
epoch: 0|step: 54|ppo_ep: 1|act_loss: -0.307861328125|cri_loss: 0.280029296875|unsuper_loss: 0.0
average reward score: 0.8291015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.45%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.46
epoch: 0|step: 55|ppo_ep: 1|act_loss: -0.11029052734375|cri_loss: 0.0491943359375|unsuper_loss: 0.0
average reward score: 1.361328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.48%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.46
epoch: 0|step: 56|ppo_ep: 1|act_loss: -0.061004638671875|cri_loss: 0.06158447265625|unsuper_loss: 0.0
average reward score: 0.578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.46
epoch: 0|step: 57|ppo_ep: 1|act_loss: -0.04693603515625|cri_loss: 0.1094970703125|unsuper_loss: 0.0
average reward score: 0.85546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.46
epoch: 0|step: 58|ppo_ep: 1|act_loss: -0.057037353515625|cri_loss: 0.2431640625|unsuper_loss: 0.0
average reward score: 1.8388671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.47
[2023-07-01 08:10:05,603] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[5.211000000000001e-06, 5.211000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:05,781] [INFO] [timer.py:215:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=51.608427254494195, CurrSamplesPerSec=51.24631138292929, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:05,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=6, lr=[2.7000000000000004e-06, 2.7000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 59|ppo_ep: 1|act_loss: -0.01324462890625|cri_loss: 0.06976318359375|unsuper_loss: 0.0
average reward score: 1.5302734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.68%) |Training time=0.80s (31.46%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.47
epoch: 0|step: 60|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.05938720703125|unsuper_loss: 0.0
average reward score: 1.580078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.47
epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.0312042236328125|cri_loss: 0.0911865234375|unsuper_loss: 0.0
average reward score: 0.83642578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.12%) |Training time=0.81s (31.95%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.47
epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.1256103515625|cri_loss: 0.08343505859375|unsuper_loss: 0.0
average reward score: 0.404296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.01%) |Training time=0.82s (32.03%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.47
epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.142333984375|cri_loss: 0.1456298828125|unsuper_loss: 0.0
average reward score: 1.8056640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.47
epoch: 0|step: 64|ppo_ep: 1|act_loss: 0.151611328125|cri_loss: 0.078125|unsuper_loss: 0.0
average reward score: 0.9873046875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.48
epoch: 0|step: 65|ppo_ep: 1|act_loss: 0.259765625|cri_loss: 0.125244140625|unsuper_loss: 0.0
average reward score: -0.45849609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.48
epoch: 0|step: 66|ppo_ep: 1|act_loss: 0.1741943359375|cri_loss: 0.22265625|unsuper_loss: 0.0
average reward score: -1.7998046875
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.12%) |Training time=0.82s (31.93%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.48
epoch: 0|step: 67|ppo_ep: 1|act_loss: -0.0309600830078125|cri_loss: 0.15966796875|unsuper_loss: 0.0
average reward score: -1.15625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.48
epoch: 0|step: 68|ppo_ep: 1|act_loss: 0.06488037109375|cri_loss: 0.232177734375|unsuper_loss: 0.0
average reward score: -2.349609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.78%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.48
[2023-07-01 08:10:31,089] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[6.176000000000001e-06, 6.176000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:31,269] [INFO] [timer.py:215:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=51.409136051593045, CurrSamplesPerSec=50.844574422344415, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=6, lr=[3.2000000000000003e-06, 3.2000000000000003e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 69|ppo_ep: 1|act_loss: 0.16064453125|cri_loss: 0.2646484375|unsuper_loss: 0.0
average reward score: -3.220703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.60%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48
epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.211669921875|cri_loss: 0.4462890625|unsuper_loss: 0.0
average reward score: -4.57421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48
epoch: 0|step: 71|ppo_ep: 1|act_loss: -0.0789794921875|cri_loss: 0.3310546875|unsuper_loss: 0.0
average reward score: -3.51953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.59%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.48
epoch: 0|step: 72|ppo_ep: 1|act_loss: -0.232177734375|cri_loss: 0.1922607421875|unsuper_loss: 0.0
average reward score: -4.6328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.65%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.48
epoch: 0|step: 73|ppo_ep: 1|act_loss: 0.09918212890625|cri_loss: 0.218017578125|unsuper_loss: 0.0
average reward score: -4.625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.50%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49
epoch: 0|step: 74|ppo_ep: 1|act_loss: -0.314208984375|cri_loss: 0.272705078125|unsuper_loss: 0.0
average reward score: -4.375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.46%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49
epoch: 0|step: 75|ppo_ep: 1|act_loss: 0.0401611328125|cri_loss: 0.1531982421875|unsuper_loss: 0.0
average reward score: -3.9375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.70%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.49
epoch: 0|step: 76|ppo_ep: 1|act_loss: -0.2626953125|cri_loss: 0.11065673828125|unsuper_loss: 0.0
average reward score: -4.96875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.49
epoch: 0|step: 77|ppo_ep: 1|act_loss: -0.06817626953125|cri_loss: 0.1427001953125|unsuper_loss: 0.0
average reward score: -4.08203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.49
epoch: 0|step: 78|ppo_ep: 1|act_loss: -0.11895751953125|cri_loss: 0.13671875|unsuper_loss: 0.0
average reward score: -3.625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.56%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.49
[2023-07-01 08:10:56,546] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[7.141000000000001e-06, 7.141000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:10:56,728] [INFO] [timer.py:215:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=51.32700471811606, CurrSamplesPerSec=50.60044388296178, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:10:56,894] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=6, lr=[3.7e-06, 3.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 79|ppo_ep: 1|act_loss: -0.11212158203125|cri_loss: 0.263427734375|unsuper_loss: 0.0
average reward score: -3.41015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.59%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.49
epoch: 0|step: 80|ppo_ep: 1|act_loss: -0.10980224609375|cri_loss: 0.2310791015625|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.37%) |Training time=0.81s (31.64%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.49
epoch: 0|step: 81|ppo_ep: 1|act_loss: 0.03558349609375|cri_loss: 0.38037109375|unsuper_loss: 0.0
average reward score: -4.1328125
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.34%) |Training time=0.81s (31.74%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.49
epoch: 0|step: 82|ppo_ep: 1|act_loss: 0.05450439453125|cri_loss: 0.372314453125|unsuper_loss: 0.0
average reward score: -6.484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.49
epoch: 0|step: 83|ppo_ep: 1|act_loss: 0.10955810546875|cri_loss: 0.173583984375|unsuper_loss: 0.0
average reward score: -3.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.49
epoch: 0|step: 84|ppo_ep: 1|act_loss: 0.1385498046875|cri_loss: 0.340576171875|unsuper_loss: 0.0
average reward score: -4.765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.50
epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.052093505859375|cri_loss: 0.161376953125|unsuper_loss: 0.0
average reward score: -4.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.62%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.50
epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.141357421875|cri_loss: 0.194091796875|unsuper_loss: 0.0
average reward score: -5.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.85%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.50
epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.1273193359375|cri_loss: 0.04815673828125|unsuper_loss: 0.0
average reward score: -5.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.50
epoch: 0|step: 88|ppo_ep: 1|act_loss: 0.0860595703125|cri_loss: 0.058013916015625|unsuper_loss: 0.0
average reward score: -5.4921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.50
[2023-07-01 08:11:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[8.106e-06, 8.106e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:11:22,195] [INFO] [timer.py:215:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=51.229823380821855, CurrSamplesPerSec=50.06663279595282, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:11:22,361] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=6, lr=[4.2000000000000004e-06, 4.2000000000000004e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 89|ppo_ep: 1|act_loss: 0.07598876953125|cri_loss: 0.05743408203125|unsuper_loss: 0.0
average reward score: -6.0703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.92%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.50
epoch: 0|step: 90|ppo_ep: 1|act_loss: 0.0255126953125|cri_loss: 0.08709716796875|unsuper_loss: 0.0
average reward score: -3.384765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.50
epoch: 0|step: 91|ppo_ep: 1|act_loss: 0.039703369140625|cri_loss: 0.071044921875|unsuper_loss: 0.0
average reward score: -3.388671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.51%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.50
epoch: 0|step: 92|ppo_ep: 1|act_loss: 0.0927734375|cri_loss: 0.038482666015625|unsuper_loss: 0.0
average reward score: -4.67578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.77%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.50
epoch: 0|step: 93|ppo_ep: 1|act_loss: -0.034576416015625|cri_loss: 0.1512451171875|unsuper_loss: 0.0
average reward score: -6.234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.42%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.50
epoch: 0|step: 94|ppo_ep: 1|act_loss: -0.045196533203125|cri_loss: 0.120361328125|unsuper_loss: 0.0
average reward score: -5.26953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.60%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.50
epoch: 0|step: 95|ppo_ep: 1|act_loss: -0.1468505859375|cri_loss: 0.1895751953125|unsuper_loss: 0.0
average reward score: -4.4765625
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.50
[2023-07-01 08:11:39,836] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.2025146484375|cri_loss: 0.178466796875|unsuper_loss: 0.0
average reward score: -2.916015625
-------------------------------------------------------------------------------------
|E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.20%) |Training time=0.62s (26.18%) |Others=0.23 (9.62%)|CurSamplesPerSec=13.60 |AvgSamplesPerSec=12.51
epoch: 0|step: 97|ppo_ep: 1|act_loss: -0.09326171875|cri_loss: 0.11358642578125|unsuper_loss: 0.0
average reward score: -4.484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.81s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.51
epoch: 0|step: 98|ppo_ep: 1|act_loss: -0.183837890625|cri_loss: 0.23583984375|unsuper_loss: 0.0
average reward score: -4.40625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.44%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.51
[2023-07-01 08:11:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=7, lr=[8.974500000000002e-06, 8.974500000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:11:47,488] [INFO] [timer.py:215:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=51.32384275389724, CurrSamplesPerSec=50.546857564437786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:11:47,653] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=6, lr=[4.7e-06, 4.7e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 99|ppo_ep: 1|act_loss: -0.1881103515625|cri_loss: 0.11492919921875|unsuper_loss: 0.0
average reward score: -3.041015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.51
epoch: 0|step: 100|ppo_ep: 1|act_loss: -0.1273193359375|cri_loss: 0.10101318359375|unsuper_loss: 0.0
average reward score: -4.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.81s (31.61%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.51
epoch: 0|step: 101|ppo_ep: 1|act_loss: -0.06329345703125|cri_loss: 0.048126220703125|unsuper_loss: 0.0
average reward score: -5.5
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.47%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 102|ppo_ep: 1|act_loss: 0.0202789306640625|cri_loss: 0.046173095703125|unsuper_loss: 0.0
average reward score: -4.0
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.44%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.52
epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.0049285888671875|cri_loss: 0.01212310791015625|unsuper_loss: 0.0
average reward score: -4.2109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.50%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 104|ppo_ep: 1|act_loss: 0.042816162109375|cri_loss: 0.0548095703125|unsuper_loss: 0.0
average reward score: -3.89453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.80s (31.56%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52
epoch: 0|step: 105|ppo_ep: 1|act_loss: -0.0083160400390625|cri_loss: 0.0223388671875|unsuper_loss: 0.0
average reward score: -4.953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.52
epoch: 0|step: 106|ppo_ep: 1|act_loss: 0.10870361328125|cri_loss: 0.0516357421875|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.47%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.52
epoch: 0|step: 107|ppo_ep: 1|act_loss: 0.1151123046875|cri_loss: 0.05731201171875|unsuper_loss: 0.0
average reward score: -4.47265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.48%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52
epoch: 0|step: 108|ppo_ep: 1|act_loss: 0.1324462890625|cri_loss: 0.06695556640625|unsuper_loss: 0.0
average reward score: -4.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
[2023-07-01 08:12:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=7, lr=[9.649706174538074e-06, 9.649706174538074e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:12:12,961] [INFO] [timer.py:215:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=51.28166560587494, CurrSamplesPerSec=50.821221532326284, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:12:13,127] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=6, lr=[4.999729351164122e-06, 4.999729351164122e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 109|ppo_ep: 1|act_loss: 0.06866455078125|cri_loss: 0.0279083251953125|unsuper_loss: 0.0
average reward score: -4.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.04827880859375|cri_loss: 0.019989013671875|unsuper_loss: 0.0
average reward score: -4.7890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.52
epoch: 0|step: 111|ppo_ep: 1|act_loss: 0.005802154541015625|cri_loss: 0.01776123046875|unsuper_loss: 0.0
average reward score: -5.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 112|ppo_ep: 1|act_loss: 0.027587890625|cri_loss: 0.01424407958984375|unsuper_loss: 0.0
average reward score: -3.73046875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.62%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 113|ppo_ep: 1|act_loss: -0.01116180419921875|cri_loss: 0.01424407958984375|unsuper_loss: 0.0
average reward score: -4.125
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.76%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.52
epoch: 0|step: 114|ppo_ep: 1|act_loss: -0.0208282470703125|cri_loss: 0.0262603759765625|unsuper_loss: 0.0
average reward score: -4.13671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.52
epoch: 0|step: 115|ppo_ep: 1|act_loss: -0.02191162109375|cri_loss: 0.0148773193359375|unsuper_loss: 0.0
average reward score: -4.265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52
epoch: 0|step: 116|ppo_ep: 1|act_loss: -0.08856201171875|cri_loss: 0.0667724609375|unsuper_loss: 0.0
average reward score: -6.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.76%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.52
epoch: 0|step: 117|ppo_ep: 1|act_loss: -0.0440673828125|cri_loss: 0.05584716796875|unsuper_loss: 0.0
average reward score: -4.23046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.64%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.52
epoch: 0|step: 118|ppo_ep: 1|act_loss: 0.055267333984375|cri_loss: 0.0250244140625|unsuper_loss: 0.0
average reward score: -5.40625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.67%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
[2023-07-01 08:12:38,266] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=7, lr=[9.644483606235295e-06, 9.644483606235295e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:12:38,443] [INFO] [timer.py:215:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=51.22001477278903, CurrSamplesPerSec=51.161201298453776, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:12:38,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=6, lr=[4.996685224712077e-06, 4.996685224712077e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 119|ppo_ep: 1|act_loss: -0.058380126953125|cri_loss: 0.051177978515625|unsuper_loss: 0.0
average reward score: -4.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.43%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52
epoch: 0|step: 120|ppo_ep: 1|act_loss: 0.051849365234375|cri_loss: 0.0214385986328125|unsuper_loss: 0.0
average reward score: -4.51171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.71%) |Training time=0.80s (31.37%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.52
epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.07427978515625|cri_loss: 0.0521240234375|unsuper_loss: 0.0
average reward score: -3.826171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.43%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 122|ppo_ep: 1|act_loss: 0.035064697265625|cri_loss: 0.04278564453125|unsuper_loss: 0.0
average reward score: -4.81640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.49%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.52
epoch: 0|step: 123|ppo_ep: 1|act_loss: 0.066162109375|cri_loss: 0.0307159423828125|unsuper_loss: 0.0
average reward score: -3.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.81s (31.65%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.52
epoch: 0|step: 124|ppo_ep: 1|act_loss: 0.1104736328125|cri_loss: 0.08404541015625|unsuper_loss: 0.0
average reward score: -5.171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.63%) |Training time=0.80s (31.44%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.52
epoch: 0|step: 125|ppo_ep: 1|act_loss: 0.05743408203125|cri_loss: 0.034271240234375|unsuper_loss: 0.0
average reward score: -6.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.66%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53
epoch: 0|step: 126|ppo_ep: 1|act_loss: -0.002399444580078125|cri_loss: 0.039337158203125|unsuper_loss: 0.0
average reward score: -5.828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 127|ppo_ep: 1|act_loss: -0.0227508544921875|cri_loss: 0.00868988037109375|unsuper_loss: 0.0
average reward score: -4.015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.53
epoch: 0|step: 128|ppo_ep: 1|act_loss: -0.04278564453125|cri_loss: 0.042572021484375|unsuper_loss: 0.0
average reward score: -4.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.81s (31.57%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53
[2023-07-01 08:13:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=7, lr=[9.632739717588912e-06, 9.632739717588912e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:13:03,914] [INFO] [timer.py:215:stop] epoch=0/micro_step=130/global_step=130, RunningAvgSamplesPerSec=51.19020786678154, CurrSamplesPerSec=50.84000996968942, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:13:04,079] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=6, lr=[4.99026279355402e-06, 4.99026279355402e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 129|ppo_ep: 1|act_loss: 0.01258087158203125|cri_loss: 0.01308441162109375|unsuper_loss: 0.0
average reward score: -5.953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53
epoch: 0|step: 130|ppo_ep: 1|act_loss: -0.0027561187744140625|cri_loss: 0.01007080078125|unsuper_loss: 0.0
average reward score: -6.265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.47%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 131|ppo_ep: 1|act_loss: -0.0166015625|cri_loss: 0.00750732421875|unsuper_loss: 0.0
average reward score: -4.7734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.70%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 132|ppo_ep: 1|act_loss: 0.039825439453125|cri_loss: 0.0167694091796875|unsuper_loss: 0.0
average reward score: -5.4921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.41%) |Training time=0.81s (31.65%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.53
epoch: 0|step: 133|ppo_ep: 1|act_loss: 0.0194854736328125|cri_loss: 0.00791168212890625|unsuper_loss: 0.0
average reward score: -5.390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53
epoch: 0|step: 134|ppo_ep: 1|act_loss: 0.0176239013671875|cri_loss: 0.018646240234375|unsuper_loss: 0.0
average reward score: -3.875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.60%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53
epoch: 0|step: 135|ppo_ep: 1|act_loss: 0.01038360595703125|cri_loss: 0.0234375|unsuper_loss: 0.0
average reward score: -4.828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.80%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53
epoch: 0|step: 136|ppo_ep: 1|act_loss: 0.035369873046875|cri_loss: 0.005126953125|unsuper_loss: 0.0
average reward score: -3.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53
epoch: 0|step: 137|ppo_ep: 1|act_loss: 0.025634765625|cri_loss: 0.0036468505859375|unsuper_loss: 0.0
average reward score: -3.943359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.78%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53
epoch: 0|step: 138|ppo_ep: 1|act_loss: 0.0025787353515625|cri_loss: 0.0016832351684570312|unsuper_loss: 0.0
average reward score: -4.96875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.53
[2023-07-01 08:13:29,179] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=7, lr=[9.61449039944247e-06, 9.61449039944247e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:13:29,356] [INFO] [timer.py:215:stop] epoch=0/micro_step=140/global_step=140, RunningAvgSamplesPerSec=51.15801467130955, CurrSamplesPerSec=51.184418805615664, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:13:29,522] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=6, lr=[4.980470747984265e-06, 4.980470747984265e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 139|ppo_ep: 1|act_loss: 0.023345947265625|cri_loss: 0.00701141357421875|unsuper_loss: 0.0
average reward score: -4.34375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.52%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53
epoch: 0|step: 140|ppo_ep: 1|act_loss: -0.0290374755859375|cri_loss: 0.0136871337890625|unsuper_loss: 0.0
average reward score: -4.765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53
epoch: 0|step: 141|ppo_ep: 1|act_loss: -0.04620361328125|cri_loss: 0.0184783935546875|unsuper_loss: 0.0
average reward score: -4.296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.55%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.53
epoch: 0|step: 142|ppo_ep: 1|act_loss: -0.06597900390625|cri_loss: 0.0247344970703125|unsuper_loss: 0.0
average reward score: -3.38671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53
epoch: 0|step: 143|ppo_ep: 1|act_loss: -0.005218505859375|cri_loss: 0.0203857421875|unsuper_loss: 0.0
average reward score: -4.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.60%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53
epoch: 0|step: 144|ppo_ep: 1|act_loss: -0.04986572265625|cri_loss: 0.0144195556640625|unsuper_loss: 0.0
average reward score: -4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53
epoch: 0|step: 145|ppo_ep: 1|act_loss: -0.057769775390625|cri_loss: 0.03375244140625|unsuper_loss: 0.0
average reward score: -4.15625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 146|ppo_ep: 1|act_loss: 0.0311126708984375|cri_loss: 0.0077056884765625|unsuper_loss: 0.0
average reward score: -5.078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.70%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.042816162109375|cri_loss: 0.009368896484375|unsuper_loss: 0.0
average reward score: -4.25
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53
epoch: 0|step: 148|ppo_ep: 1|act_loss: 0.0704345703125|cri_loss: 0.0168609619140625|unsuper_loss: 0.0
average reward score: -4.3125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.58%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53
[2023-07-01 08:13:54,640] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=7, lr=[9.589760345240206e-06, 9.589760345240206e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:13:54,821] [INFO] [timer.py:215:stop] epoch=0/micro_step=150/global_step=150, RunningAvgSamplesPerSec=51.12761146442748, CurrSamplesPerSec=50.58425311398798, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:13:54,987] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=6, lr=[4.967322337776272e-06, 4.967322337776272e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 149|ppo_ep: 1|act_loss: 0.0423583984375|cri_loss: 0.01122283935546875|unsuper_loss: 0.0
average reward score: -5.63671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.59%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53
epoch: 0|step: 150|ppo_ep: 1|act_loss: 0.03692626953125|cri_loss: 0.010467529296875|unsuper_loss: 0.0
average reward score: -4.05078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.81s (31.52%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.53
epoch: 0|step: 151|ppo_ep: 1|act_loss: 0.0020465850830078125|cri_loss: 0.006305694580078125|unsuper_loss: 0.0
average reward score: -4.80078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.53
epoch: 0|step: 152|ppo_ep: 1|act_loss: -0.006404876708984375|cri_loss: 0.0030059814453125|unsuper_loss: 0.0
average reward score: -4.875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.73%) |Training time=0.80s (31.33%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.53
epoch: 0|step: 153|ppo_ep: 1|act_loss: -0.00732421875|cri_loss: 0.001430511474609375|unsuper_loss: 0.0
average reward score: -4.1171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.43%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53
epoch: 0|step: 154|ppo_ep: 1|act_loss: 0.016937255859375|cri_loss: 0.0187530517578125|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.62%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.53
epoch: 0|step: 155|ppo_ep: 1|act_loss: 0.0008111000061035156|cri_loss: 0.0024356842041015625|unsuper_loss: 0.0
average reward score: -6.44140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53
epoch: 0|step: 156|ppo_ep: 1|act_loss: -0.01509857177734375|cri_loss: 0.002330780029296875|unsuper_loss: 0.0
average reward score: -5.44140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.70%) |Training time=0.80s (31.44%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.53
epoch: 0|step: 157|ppo_ep: 1|act_loss: 0.00215911865234375|cri_loss: 0.007144927978515625|unsuper_loss: 0.0
average reward score: -4.046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.54%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.53
epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.016845703125|cri_loss: 0.003650665283203125|unsuper_loss: 0.0
average reward score: -4.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.67%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.53
[2023-07-01 08:14:20,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=7, lr=[9.558583017613959e-06, 9.558583017613959e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:14:20,256] [INFO] [timer.py:215:stop] epoch=0/micro_step=160/global_step=160, RunningAvgSamplesPerSec=51.11356093341441, CurrSamplesPerSec=50.12746005738873, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:14:20,422] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=6, lr=[4.950835354254168e-06, 4.950835354254168e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 159|ppo_ep: 1|act_loss: 0.041961669921875|cri_loss: 0.01123809814453125|unsuper_loss: 0.0
average reward score: -4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.86%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.53
epoch: 0|step: 160|ppo_ep: 1|act_loss: 0.045501708984375|cri_loss: 0.007793426513671875|unsuper_loss: 0.0
average reward score: -4.58984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.68%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.53
epoch: 0|step: 161|ppo_ep: 1|act_loss: 0.056732177734375|cri_loss: 0.034698486328125|unsuper_loss: 0.0
average reward score: -5.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 162|ppo_ep: 1|act_loss: 0.0287017822265625|cri_loss: 0.0207672119140625|unsuper_loss: 0.0
average reward score: -3.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 163|ppo_ep: 1|act_loss: -0.0164031982421875|cri_loss: 0.0018892288208007812|unsuper_loss: 0.0
average reward score: -4.53125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 164|ppo_ep: 1|act_loss: -0.004817962646484375|cri_loss: 0.0056304931640625|unsuper_loss: 0.0
average reward score: -4.33984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.65%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 165|ppo_ep: 1|act_loss: -0.1483154296875|cri_loss: 0.184814453125|unsuper_loss: 0.0
average reward score: -5.6796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.70%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 166|ppo_ep: 1|act_loss: -0.0347900390625|cri_loss: 0.00630950927734375|unsuper_loss: 0.0
average reward score: -4.4375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 167|ppo_ep: 1|act_loss: -0.017822265625|cri_loss: 0.03143310546875|unsuper_loss: 0.0
average reward score: -5.03125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.38%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 168|ppo_ep: 1|act_loss: -0.0017986297607421875|cri_loss: 0.0015411376953125|unsuper_loss: 0.0
average reward score: -4.30078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.57%) |Training time=0.83s (32.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
[2023-07-01 08:14:45,543] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=7, lr=[9.521000603104346e-06, 9.521000603104346e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:14:45,721] [INFO] [timer.py:215:stop] epoch=0/micro_step=170/global_step=170, RunningAvgSamplesPerSec=51.08043032722592, CurrSamplesPerSec=50.96938329495944, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:14:45,886] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=6, lr=[4.931032106219029e-06, 4.931032106219029e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 169|ppo_ep: 1|act_loss: 0.029571533203125|cri_loss: 0.01090240478515625|unsuper_loss: 0.0
average reward score: -4.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.47%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 170|ppo_ep: 1|act_loss: 0.008148193359375|cri_loss: 0.006717681884765625|unsuper_loss: 0.0
average reward score: -3.87890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.77%) |Training time=0.80s (31.41%) |Others=0.22 (8.83%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 171|ppo_ep: 1|act_loss: 0.01207733154296875|cri_loss: 0.004581451416015625|unsuper_loss: 0.0
average reward score: -4.359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.56%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54
epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.040924072265625|cri_loss: 0.0110321044921875|unsuper_loss: 0.0
average reward score: -4.7421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 173|ppo_ep: 1|act_loss: 0.017120361328125|cri_loss: 0.006683349609375|unsuper_loss: 0.0
average reward score: -6.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 174|ppo_ep: 1|act_loss: 0.0014066696166992188|cri_loss: 0.00324249267578125|unsuper_loss: 0.0
average reward score: -5.234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.58%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 175|ppo_ep: 1|act_loss: -0.00894927978515625|cri_loss: 0.006313323974609375|unsuper_loss: 0.0
average reward score: -3.623046875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.54%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 176|ppo_ep: 1|act_loss: 0.014862060546875|cri_loss: 0.004730224609375|unsuper_loss: 0.0
average reward score: -3.515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.50%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 177|ppo_ep: 1|act_loss: 0.00820159912109375|cri_loss: 0.00408172607421875|unsuper_loss: 0.0
average reward score: -4.8515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.69%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.01119232177734375|cri_loss: 0.0016412734985351562|unsuper_loss: 0.0
average reward score: -3.703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54
[2023-07-01 08:15:10,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=7, lr=[9.47706395507748e-06, 9.47706395507748e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:15:11,178] [INFO] [timer.py:215:stop] epoch=0/micro_step=180/global_step=180, RunningAvgSamplesPerSec=51.0658335380177, CurrSamplesPerSec=50.65431599287461, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:15:11,342] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=6, lr=[4.907939389762475e-06, 4.907939389762475e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 179|ppo_ep: 1|act_loss: -0.0067901611328125|cri_loss: 0.001873016357421875|unsuper_loss: 0.0
average reward score: -5.546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.67%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.056732177734375|cri_loss: 0.020721435546875|unsuper_loss: 0.0
average reward score: -4.26953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 181|ppo_ep: 1|act_loss: -0.01043701171875|cri_loss: 0.0020351409912109375|unsuper_loss: 0.0
average reward score: -3.75390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.49%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 182|ppo_ep: 1|act_loss: -0.0273895263671875|cri_loss: 0.004871368408203125|unsuper_loss: 0.0
average reward score: -4.28125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.77%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 183|ppo_ep: 1|act_loss: -0.0251617431640625|cri_loss: 0.0274505615234375|unsuper_loss: 0.0
average reward score: -3.509765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.028076171875|cri_loss: 0.004650115966796875|unsuper_loss: 0.0
average reward score: -7.34375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54
epoch: 0|step: 185|ppo_ep: 1|act_loss: 0.06512451171875|cri_loss: 0.04296875|unsuper_loss: 0.0
average reward score: -4.69140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.54
epoch: 0|step: 186|ppo_ep: 1|act_loss: 0.03607177734375|cri_loss: 0.01641845703125|unsuper_loss: 0.0
average reward score: -3.08984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 187|ppo_ep: 1|act_loss: 0.027374267578125|cri_loss: 0.01361083984375|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 188|ppo_ep: 1|act_loss: -0.0076751708984375|cri_loss: 0.01186370849609375|unsuper_loss: 0.0
average reward score: -4.859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.59%) |Training time=0.82s (32.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54
[2023-07-01 08:15:36,423] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=7, lr=[9.426832524914468e-06, 9.426832524914468e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:15:36,600] [INFO] [timer.py:215:stop] epoch=0/micro_step=190/global_step=190, RunningAvgSamplesPerSec=51.04633948371713, CurrSamplesPerSec=51.249168265605164, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:15:36,765] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=6, lr=[4.881588452008457e-06, 4.881588452008457e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 189|ppo_ep: 1|act_loss: 0.0013113021850585938|cri_loss: 0.0013742446899414062|unsuper_loss: 0.0
average reward score: -5.7890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.69%) |Training time=0.80s (31.42%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54
epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.01316070556640625|cri_loss: 0.013458251953125|unsuper_loss: 0.0
average reward score: -3.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.43%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 191|ppo_ep: 1|act_loss: -0.0120086669921875|cri_loss: 0.0018033981323242188|unsuper_loss: 0.0
average reward score: -5.875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.33%) |Training time=0.81s (31.69%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54
epoch: 0|step: 192|ppo_ep: 1|act_loss: 0.00186920166015625|cri_loss: 0.00511932373046875|unsuper_loss: 0.0
average reward score: -4.94921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54
epoch: 0|step: 193|ppo_ep: 1|act_loss: -0.0286712646484375|cri_loss: 0.004795074462890625|unsuper_loss: 0.0
average reward score: -5.140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 194|ppo_ep: 1|act_loss: -0.01007843017578125|cri_loss: 0.005573272705078125|unsuper_loss: 0.0
average reward score: -3.66796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 195|ppo_ep: 1|act_loss: 0.0251922607421875|cri_loss: 0.00690460205078125|unsuper_loss: 0.0
average reward score: -4.0625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.003810882568359375|cri_loss: 0.0014600753784179688|unsuper_loss: 0.0
average reward score: -4.02734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 197|ppo_ep: 1|act_loss: -0.031768798828125|cri_loss: 0.008697509765625|unsuper_loss: 0.0
average reward score: -5.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54
epoch: 0|step: 198|ppo_ep: 1|act_loss: -0.00102996826171875|cri_loss: 0.007480621337890625|unsuper_loss: 0.0
average reward score: -6.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
[2023-07-01 08:16:01,904] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=7, lr=[9.370374281566792e-06, 9.370374281566792e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:16:02,081] [INFO] [timer.py:215:stop] epoch=0/micro_step=200/global_step=200, RunningAvgSamplesPerSec=51.03230639238851, CurrSamplesPerSec=51.198730500227924, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:16:02,246] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=6, lr=[4.852014948832268e-06, 4.852014948832268e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.0186004638671875|cri_loss: 0.003108978271484375|unsuper_loss: 0.0
average reward score: -3.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 200|ppo_ep: 1|act_loss: 0.003887176513671875|cri_loss: 0.004302978515625|unsuper_loss: 0.0
average reward score: -3.634765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.51%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 201|ppo_ep: 1|act_loss: -0.005275726318359375|cri_loss: 0.0028781890869140625|unsuper_loss: 0.0
average reward score: -3.716796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 202|ppo_ep: 1|act_loss: 0.0159454345703125|cri_loss: 0.003612518310546875|unsuper_loss: 0.0
average reward score: -4.38671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.63%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 203|ppo_ep: 1|act_loss: 0.00446319580078125|cri_loss: 0.001026153564453125|unsuper_loss: 0.0
average reward score: -4.640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.47%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.54
epoch: 0|step: 204|ppo_ep: 1|act_loss: -0.0184326171875|cri_loss: 0.00092315673828125|unsuper_loss: 0.0
average reward score: -4.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.40%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54
epoch: 0|step: 205|ppo_ep: 1|act_loss: -0.0272369384765625|cri_loss: 0.0034389495849609375|unsuper_loss: 0.0
average reward score: -4.15625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.85%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54
epoch: 0|step: 206|ppo_ep: 1|act_loss: -0.0035800933837890625|cri_loss: 0.00862884521484375|unsuper_loss: 0.0
average reward score: -4.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.57%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 207|ppo_ep: 1|act_loss: -0.0309906005859375|cri_loss: 0.00408935546875|unsuper_loss: 0.0
average reward score: -3.22265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 208|ppo_ep: 1|act_loss: -0.0180206298828125|cri_loss: 0.007373809814453125|unsuper_loss: 0.0
average reward score: -4.63671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.54
[2023-07-01 08:16:27,348] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=7, lr=[9.30776561958644e-06, 9.30776561958644e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:16:27,531] [INFO] [timer.py:215:stop] epoch=0/micro_step=210/global_step=210, RunningAvgSamplesPerSec=51.01971803804604, CurrSamplesPerSec=50.701119734787326, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:16:27,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=6, lr=[4.819258896614014e-06, 4.819258896614014e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 209|ppo_ep: 1|act_loss: -0.00366973876953125|cri_loss: 0.0026416778564453125|unsuper_loss: 0.0
average reward score: -3.982421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.58%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54
epoch: 0|step: 210|ppo_ep: 1|act_loss: -0.038726806640625|cri_loss: 0.022308349609375|unsuper_loss: 0.0
average reward score: -4.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.36%) |Training time=0.81s (31.69%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.52 |AvgSamplesPerSec=12.54
epoch: 0|step: 211|ppo_ep: 1|act_loss: 0.0160980224609375|cri_loss: 0.001773834228515625|unsuper_loss: 0.0
average reward score: -3.685546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.72%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.54
epoch: 0|step: 212|ppo_ep: 1|act_loss: 0.0457763671875|cri_loss: 0.01435089111328125|unsuper_loss: 0.0
average reward score: -5.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 213|ppo_ep: 1|act_loss: 0.02874755859375|cri_loss: 0.0050201416015625|unsuper_loss: 0.0
average reward score: -4.97265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.70%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 214|ppo_ep: 1|act_loss: 0.01007843017578125|cri_loss: 0.001079559326171875|unsuper_loss: 0.0
average reward score: -6.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 215|ppo_ep: 1|act_loss: 0.01255035400390625|cri_loss: 0.01186370849609375|unsuper_loss: 0.0
average reward score: -4.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 216|ppo_ep: 1|act_loss: -0.0134124755859375|cri_loss: 0.001556396484375|unsuper_loss: 0.0
average reward score: -4.82421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.58%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.54
epoch: 0|step: 217|ppo_ep: 1|act_loss: -0.036376953125|cri_loss: 0.0057830810546875|unsuper_loss: 0.0
average reward score: -4.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.73%) |Training time=0.80s (31.36%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
epoch: 0|step: 218|ppo_ep: 1|act_loss: -0.036834716796875|cri_loss: 0.00553131103515625|unsuper_loss: 0.0
average reward score: -5.6796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.36%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.54
[2023-07-01 08:16:52,825] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=7, lr=[9.239091255755212e-06, 9.239091255755212e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:16:53,003] [INFO] [timer.py:215:stop] epoch=0/micro_step=220/global_step=220, RunningAvgSamplesPerSec=51.0095211309395, CurrSamplesPerSec=50.965996272607, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:16:53,169] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=6, lr=[4.783364618091804e-06, 4.783364618091804e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 219|ppo_ep: 1|act_loss: -0.02386474609375|cri_loss: 0.0040283203125|unsuper_loss: 0.0
average reward score: -4.2421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.51%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.54
epoch: 0|step: 220|ppo_ep: 1|act_loss: -0.0226593017578125|cri_loss: 0.0017938613891601562|unsuper_loss: 0.0
average reward score: -3.9765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.51%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.54
epoch: 0|step: 221|ppo_ep: 1|act_loss: 0.0114593505859375|cri_loss: 0.010986328125|unsuper_loss: 0.0
average reward score: -5.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.55%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.54
epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.00458526611328125|cri_loss: 0.0015325546264648438|unsuper_loss: 0.0
average reward score: -4.70703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.81s (31.64%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.0121612548828125|cri_loss: 0.007373809814453125|unsuper_loss: 0.0
average reward score: -5.44921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
epoch: 0|step: 224|ppo_ep: 1|act_loss: 0.02984619140625|cri_loss: 0.005184173583984375|unsuper_loss: 0.0
average reward score: -6.2109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.41%) |Training time=0.81s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.54
epoch: 0|step: 225|ppo_ep: 1|act_loss: 0.0408935546875|cri_loss: 0.00785064697265625|unsuper_loss: 0.0
average reward score: -6.0625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.68%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.54
epoch: 0|step: 226|ppo_ep: 1|act_loss: -0.003871917724609375|cri_loss: 0.00443267822265625|unsuper_loss: 0.0
average reward score: -5.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.71%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.005840301513671875|cri_loss: 0.0009756088256835938|unsuper_loss: 0.0
average reward score: -3.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.50%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.02532958984375|cri_loss: 0.0066375732421875|unsuper_loss: 0.0
average reward score: -3.87890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55
[2023-07-01 08:17:18,305] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=7, lr=[9.16444411445309e-06, 9.16444411445309e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:17:18,485] [INFO] [timer.py:215:stop] epoch=0/micro_step=230/global_step=230, RunningAvgSamplesPerSec=50.99276075623739, CurrSamplesPerSec=50.51208347618738, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:17:18,651] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=6, lr=[4.74438068238795e-06, 4.74438068238795e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 229|ppo_ep: 1|act_loss: -0.01898193359375|cri_loss: 0.0040130615234375|unsuper_loss: 0.0
average reward score: -4.7421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.81s (31.67%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
epoch: 0|step: 230|ppo_ep: 1|act_loss: -0.0140533447265625|cri_loss: 0.0019969940185546875|unsuper_loss: 0.0
average reward score: -3.306640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.60%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 231|ppo_ep: 1|act_loss: 0.0028781890869140625|cri_loss: 0.0003573894500732422|unsuper_loss: 0.0
average reward score: -4.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.45%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.55
epoch: 0|step: 232|ppo_ep: 1|act_loss: 0.025909423828125|cri_loss: 0.007793426513671875|unsuper_loss: 0.0
average reward score: -4.59375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.53%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 233|ppo_ep: 1|act_loss: -0.019439697265625|cri_loss: 0.0020503997802734375|unsuper_loss: 0.0
average reward score: -4.5703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.52%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 234|ppo_ep: 1|act_loss: -0.0200347900390625|cri_loss: 0.0025844573974609375|unsuper_loss: 0.0
average reward score: -3.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 235|ppo_ep: 1|act_loss: -0.037872314453125|cri_loss: 0.0152587890625|unsuper_loss: 0.0
average reward score: -5.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.43%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.0111846923828125|cri_loss: 0.00437164306640625|unsuper_loss: 0.0
average reward score: -4.6875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.70%) |Training time=0.80s (31.42%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.007404327392578125|unsuper_loss: 0.0
average reward score: -3.662109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.00798797607421875|cri_loss: 0.0012331008911132812|unsuper_loss: 0.0
average reward score: -3.802734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.44%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
[2023-07-01 08:17:43,740] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=7, lr=[9.083925201920767e-06, 9.083925201920767e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:17:43,922] [INFO] [timer.py:215:stop] epoch=0/micro_step=240/global_step=240, RunningAvgSamplesPerSec=50.9921219056515, CurrSamplesPerSec=50.55043660844593, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:17:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=6, lr=[4.702359839289306e-06, 4.702359839289306e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 239|ppo_ep: 1|act_loss: 0.053131103515625|cri_loss: 0.01371002197265625|unsuper_loss: 0.0
average reward score: -4.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55
epoch: 0|step: 240|ppo_ep: 1|act_loss: 0.0006341934204101562|cri_loss: 0.0028324127197265625|unsuper_loss: 0.0
average reward score: -4.39453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.56%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 241|ppo_ep: 1|act_loss: -0.0220184326171875|cri_loss: 0.003536224365234375|unsuper_loss: 0.0
average reward score: -4.1484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.55%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 242|ppo_ep: 1|act_loss: 0.0154876708984375|cri_loss: 0.003978729248046875|unsuper_loss: 0.0
average reward score: -3.69921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.55
epoch: 0|step: 243|ppo_ep: 1|act_loss: -0.0217742919921875|cri_loss: 0.0021533966064453125|unsuper_loss: 0.0
average reward score: -3.67578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
epoch: 0|step: 244|ppo_ep: 1|act_loss: 0.0015211105346679688|cri_loss: 0.006099700927734375|unsuper_loss: 0.0
average reward score: -4.1328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.69%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 245|ppo_ep: 1|act_loss: 0.0012369155883789062|cri_loss: 0.0012674331665039062|unsuper_loss: 0.0
average reward score: -3.587890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.0059661865234375|cri_loss: 0.0011720657348632812|unsuper_loss: 0.0
average reward score: -4.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.59%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 247|ppo_ep: 1|act_loss: -0.0251312255859375|cri_loss: 0.00565338134765625|unsuper_loss: 0.0
average reward score: -4.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.53%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.01308441162109375|cri_loss: 0.01374053955078125|unsuper_loss: 0.0
average reward score: -5.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.54%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
[2023-07-01 08:18:09,191] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=7, lr=[8.9976434695865e-06, 8.9976434695865e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:18:09,368] [INFO] [timer.py:215:stop] epoch=0/micro_step=250/global_step=250, RunningAvgSamplesPerSec=50.9825013275233, CurrSamplesPerSec=51.27168224923838, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:18:09,533] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=6, lr=[4.657358947870691e-06, 4.657358947870691e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 249|ppo_ep: 1|act_loss: 0.0078582763671875|cri_loss: 0.0013494491577148438|unsuper_loss: 0.0
average reward score: -4.93359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.55
epoch: 0|step: 250|ppo_ep: 1|act_loss: -0.00982666015625|cri_loss: 0.002044677734375|unsuper_loss: 0.0
average reward score: -5.76171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.53%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 251|ppo_ep: 1|act_loss: -0.002544403076171875|cri_loss: 0.00104522705078125|unsuper_loss: 0.0
average reward score: -5.546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 252|ppo_ep: 1|act_loss: 0.0118865966796875|cri_loss: 0.0028934478759765625|unsuper_loss: 0.0
average reward score: -3.361328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
[2023-07-01 08:18:19,349] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 253|ppo_ep: 1|act_loss: 0.0092315673828125|cri_loss: 0.0016069412231445312|unsuper_loss: 0.0
average reward score: -5.7734375
-------------------------------------------------------------------------------------
|E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.20%) |Training time=0.62s (26.19%) |Others=0.23 (9.61%)|CurSamplesPerSec=13.61 |AvgSamplesPerSec=12.55
epoch: 0|step: 254|ppo_ep: 1|act_loss: 0.03448486328125|cri_loss: 0.005481719970703125|unsuper_loss: 0.0
average reward score: -3.416015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.43%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 255|ppo_ep: 1|act_loss: 0.01197052001953125|cri_loss: 0.0013523101806640625|unsuper_loss: 0.0
average reward score: -4.12890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.45%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 256|ppo_ep: 1|act_loss: -0.0228424072265625|cri_loss: 0.00531768798828125|unsuper_loss: 0.0
average reward score: -4.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.57%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.55
epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.0013647079467773438|cri_loss: 0.0010805130004882812|unsuper_loss: 0.0
average reward score: -5.484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.66%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 258|ppo_ep: 1|act_loss: 0.003753662109375|cri_loss: 0.0027256011962890625|unsuper_loss: 0.0
average reward score: -5.03515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
[2023-07-01 08:18:34,445] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=8, lr=[8.915159034156106e-06, 8.915159034156106e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:18:34,627] [INFO] [timer.py:215:stop] epoch=0/micro_step=260/global_step=260, RunningAvgSamplesPerSec=51.033755874636995, CurrSamplesPerSec=50.62441532185822, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:18:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=6, lr=[4.609438899557964e-06, 4.609438899557964e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 259|ppo_ep: 1|act_loss: -0.06292724609375|cri_loss: 0.046142578125|unsuper_loss: 0.0
average reward score: -3.861328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 260|ppo_ep: 1|act_loss: -0.01197052001953125|cri_loss: 0.003261566162109375|unsuper_loss: 0.0
average reward score: -4.21484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.56%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.55
epoch: 0|step: 261|ppo_ep: 1|act_loss: 0.002353668212890625|cri_loss: 0.0032176971435546875|unsuper_loss: 0.0
average reward score: -3.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.00551605224609375|cri_loss: 0.0008602142333984375|unsuper_loss: 0.0
average reward score: -5.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.01611328125|cri_loss: 0.002994537353515625|unsuper_loss: 0.0
average reward score: -4.23046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.43%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.01163482666015625|cri_loss: 0.0012903213500976562|unsuper_loss: 0.0
average reward score: -5.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.50%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.01558685302734375|cri_loss: 0.0017652511596679688|unsuper_loss: 0.0
average reward score: -4.65234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.54%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 266|ppo_ep: 1|act_loss: 0.05804443359375|cri_loss: 0.04339599609375|unsuper_loss: 0.0
average reward score: -4.12109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.55%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.0238189697265625|cri_loss: 0.0033740997314453125|unsuper_loss: 0.0
average reward score: -3.916015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.42%) |Training time=0.83s (32.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 268|ppo_ep: 1|act_loss: -0.029754638671875|cri_loss: 0.010528564453125|unsuper_loss: 0.0
average reward score: -3.60546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.41%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
[2023-07-01 08:18:59,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=8, lr=[8.818255905938371e-06, 8.818255905938371e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:19:00,067] [INFO] [timer.py:215:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=51.022117875042674, CurrSamplesPerSec=50.931628955549996, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:19:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=6, lr=[4.558664535734864e-06, 4.558664535734864e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 269|ppo_ep: 1|act_loss: 0.025909423828125|cri_loss: 0.00881195068359375|unsuper_loss: 0.0
average reward score: -3.75390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.61%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 270|ppo_ep: 1|act_loss: -0.041473388671875|cri_loss: 0.005794525146484375|unsuper_loss: 0.0
average reward score: -4.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.56%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 271|ppo_ep: 1|act_loss: -0.0340576171875|cri_loss: 0.003322601318359375|unsuper_loss: 0.0
average reward score: -3.287109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 272|ppo_ep: 1|act_loss: -0.01102447509765625|cri_loss: 0.0026416778564453125|unsuper_loss: 0.0
average reward score: -5.94921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 273|ppo_ep: 1|act_loss: -0.007747650146484375|cri_loss: 0.004848480224609375|unsuper_loss: 0.0
average reward score: -5.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.76%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 274|ppo_ep: 1|act_loss: 0.04083251953125|cri_loss: 0.007534027099609375|unsuper_loss: 0.0
average reward score: -6.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.0180511474609375|cri_loss: 0.00904083251953125|unsuper_loss: 0.0
average reward score: -3.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.67%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.0172882080078125|cri_loss: 0.0026454925537109375|unsuper_loss: 0.0
average reward score: -4.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.56s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.59%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.51 |AvgSamplesPerSec=12.55
epoch: 0|step: 277|ppo_ep: 1|act_loss: -0.0078582763671875|cri_loss: 0.002429962158203125|unsuper_loss: 0.0
average reward score: -3.79296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 278|ppo_ep: 1|act_loss: 0.00759124755859375|cri_loss: 0.0021152496337890625|unsuper_loss: 0.0
average reward score: -5.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
[2023-07-01 08:19:25,346] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=8, lr=[8.715949439291823e-06, 8.715949439291823e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:19:25,525] [INFO] [timer.py:215:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=51.010485936504146, CurrSamplesPerSec=50.63415543236121, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:19:25,691] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=6, lr=[4.5051045600050906e-06, 4.5051045600050906e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 279|ppo_ep: 1|act_loss: -0.0251617431640625|cri_loss: 0.005886077880859375|unsuper_loss: 0.0
average reward score: -4.421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.71%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.55
epoch: 0|step: 280|ppo_ep: 1|act_loss: -0.01441192626953125|cri_loss: 0.006072998046875|unsuper_loss: 0.0
average reward score: -4.984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.55
epoch: 0|step: 281|ppo_ep: 1|act_loss: -0.037933349609375|cri_loss: 0.0198516845703125|unsuper_loss: 0.0
average reward score: -4.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.54%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 282|ppo_ep: 1|act_loss: -0.0017490386962890625|cri_loss: 0.001552581787109375|unsuper_loss: 0.0
average reward score: -3.99609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.45%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.55
epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.0034580230712890625|cri_loss: 0.0011396408081054688|unsuper_loss: 0.0
average reward score: -3.802734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.43%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.00234222412109375|cri_loss: 0.004055023193359375|unsuper_loss: 0.0
average reward score: -2.943359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.52%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
epoch: 0|step: 285|ppo_ep: 1|act_loss: 0.005573272705078125|cri_loss: 0.002197265625|unsuper_loss: 0.0
average reward score: -6.40625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.51%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
epoch: 0|step: 286|ppo_ep: 1|act_loss: 0.02398681640625|cri_loss: 0.006076812744140625|unsuper_loss: 0.0
average reward score: -3.583984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.44%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 287|ppo_ep: 1|act_loss: 0.01427459716796875|cri_loss: 0.001983642578125|unsuper_loss: 0.0
average reward score: -4.96875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.47%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 288|ppo_ep: 1|act_loss: -0.00508880615234375|cri_loss: 0.004119873046875|unsuper_loss: 0.0
average reward score: -3.2890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.81%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.55
[2023-07-01 08:19:50,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=8, lr=[8.608378066732629e-06, 8.608378066732629e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:19:50,970] [INFO] [timer.py:215:stop] epoch=0/micro_step=290/global_step=290, RunningAvgSamplesPerSec=51.00704543891052, CurrSamplesPerSec=50.64934602292882, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:19:51,136] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=6, lr=[4.448831445228368e-06, 4.448831445228368e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 289|ppo_ep: 1|act_loss: -0.01418304443359375|cri_loss: 0.0033626556396484375|unsuper_loss: 0.0
average reward score: -4.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.70%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 290|ppo_ep: 1|act_loss: -0.0270233154296875|cri_loss: 0.00750732421875|unsuper_loss: 0.0
average reward score: -4.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.71%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.55
epoch: 0|step: 291|ppo_ep: 1|act_loss: -0.026336669921875|cri_loss: 0.00798797607421875|unsuper_loss: 0.0
average reward score: -4.0546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.50%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 292|ppo_ep: 1|act_loss: -0.0012111663818359375|cri_loss: 0.00342559814453125|unsuper_loss: 0.0
average reward score: -5.90234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.57%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.55
epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.0221099853515625|cri_loss: 0.005115509033203125|unsuper_loss: 0.0
average reward score: -5.4375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 294|ppo_ep: 1|act_loss: 0.0111846923828125|cri_loss: 0.0013399124145507812|unsuper_loss: 0.0
average reward score: -3.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 295|ppo_ep: 1|act_loss: -0.004726409912109375|cri_loss: 0.001857757568359375|unsuper_loss: 0.0
average reward score: -4.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.45%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 296|ppo_ep: 1|act_loss: 0.0183868408203125|cri_loss: 0.00313568115234375|unsuper_loss: 0.0
average reward score: -4.5
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.53%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.016632080078125|cri_loss: 0.003253936767578125|unsuper_loss: 0.0
average reward score: -4.171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.51%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 298|ppo_ep: 1|act_loss: 0.00775909423828125|cri_loss: 0.0005536079406738281|unsuper_loss: 0.0
average reward score: -3.337890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
[2023-07-01 08:20:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=8, lr=[8.495687344805339e-06, 8.495687344805339e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:20:16,404] [INFO] [timer.py:215:stop] epoch=0/micro_step=300/global_step=300, RunningAvgSamplesPerSec=51.00193515656933, CurrSamplesPerSec=50.82687970238014, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:20:16,570] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=6, lr=[4.389921335456253e-06, 4.389921335456253e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 299|ppo_ep: 1|act_loss: 0.01059722900390625|cri_loss: 0.004077911376953125|unsuper_loss: 0.0
average reward score: -4.43359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.59%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
[2023-07-01 08:20:19,107] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.0200347900390625|cri_loss: 0.0028228759765625|unsuper_loss: 0.0
average reward score: -3.787109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.52s (60.84%) |Training time=0.80s (31.95%) |Others=0.18 (7.21%)|CurSamplesPerSec=12.82 |AvgSamplesPerSec=12.56
epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.0256195068359375|cri_loss: 0.004558563232421875|unsuper_loss: 0.0
average reward score: -2.978515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.45%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.0184783935546875|cri_loss: 0.0016565322875976562|unsuper_loss: 0.0
average reward score: -3.796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 303|ppo_ep: 1|act_loss: -0.0037708282470703125|cri_loss: 0.003780364990234375|unsuper_loss: 0.0
average reward score: -6.62890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.46%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
epoch: 0|step: 304|ppo_ep: 1|act_loss: -0.03582763671875|cri_loss: 0.00493621826171875|unsuper_loss: 0.0
average reward score: -3.51953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.62%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 305|ppo_ep: 1|act_loss: 0.00661468505859375|cri_loss: 0.0184326171875|unsuper_loss: 0.0
average reward score: -3.390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.60%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 306|ppo_ep: 1|act_loss: 0.0426025390625|cri_loss: 0.006927490234375|unsuper_loss: 0.0
average reward score: -5.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 307|ppo_ep: 1|act_loss: 0.003116607666015625|cri_loss: 0.0016851425170898438|unsuper_loss: 0.0
average reward score: -4.0859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.70%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
epoch: 0|step: 308|ppo_ep: 1|act_loss: 0.0036067962646484375|cri_loss: 0.01727294921875|unsuper_loss: 0.0
average reward score: -5.296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
[2023-07-01 08:20:41,657] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=8, lr=[8.37802975712801e-06, 8.37802975712801e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:20:41,840] [INFO] [timer.py:215:stop] epoch=0/micro_step=310/global_step=310, RunningAvgSamplesPerSec=50.99398029145669, CurrSamplesPerSec=50.74482505177784, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:20:42,004] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=7, lr=[4.334713416080498e-06, 4.334713416080498e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.002155303955078125|cri_loss: 0.0010766983032226562|unsuper_loss: 0.0
average reward score: -4.15625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.63%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 310|ppo_ep: 1|act_loss: -0.01367950439453125|cri_loss: 0.0010881423950195312|unsuper_loss: 0.0
average reward score: -3.21875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 311|ppo_ep: 1|act_loss: -0.005771636962890625|cri_loss: 0.0018682479858398438|unsuper_loss: 0.0
average reward score: -4.875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 312|ppo_ep: 1|act_loss: -0.01544952392578125|cri_loss: 0.0027294158935546875|unsuper_loss: 0.0
average reward score: -4.171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.46%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 313|ppo_ep: 1|act_loss: 0.0285797119140625|cri_loss: 0.01299285888671875|unsuper_loss: 0.0
average reward score: -5.2265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.71%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 314|ppo_ep: 1|act_loss: 0.00305938720703125|cri_loss: 0.002655029296875|unsuper_loss: 0.0
average reward score: -6.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.0283660888671875|cri_loss: 0.003993988037109375|unsuper_loss: 0.0
average reward score: -3.17578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.64%) |Training time=0.80s (31.47%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 316|ppo_ep: 1|act_loss: -0.013427734375|cri_loss: 0.0017595291137695312|unsuper_loss: 0.0
average reward score: -4.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.58%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 317|ppo_ep: 1|act_loss: -0.0115814208984375|cri_loss: 0.0017938613891601562|unsuper_loss: 0.0
average reward score: -3.041015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.61%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 318|ppo_ep: 1|act_loss: 0.007427215576171875|cri_loss: 0.0013904571533203125|unsuper_loss: 0.0
average reward score: -4.140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.62%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
[2023-07-01 08:21:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=8, lr=[8.25556450806418e-06, 8.25556450806418e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:21:07,252] [INFO] [timer.py:215:stop] epoch=0/micro_step=320/global_step=320, RunningAvgSamplesPerSec=50.99215107673227, CurrSamplesPerSec=51.20947440913713, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:21:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=7, lr=[4.271015485202956e-06, 4.271015485202956e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 319|ppo_ep: 1|act_loss: 0.0148773193359375|cri_loss: 0.001949310302734375|unsuper_loss: 0.0
average reward score: -4.453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 320|ppo_ep: 1|act_loss: -0.01023101806640625|cri_loss: 0.002716064453125|unsuper_loss: 0.0
average reward score: -4.55859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.50%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 321|ppo_ep: 1|act_loss: 0.02069091796875|cri_loss: 0.004039764404296875|unsuper_loss: 0.0
average reward score: -4.16015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
epoch: 0|step: 322|ppo_ep: 1|act_loss: -0.01363372802734375|cri_loss: 0.0022792816162109375|unsuper_loss: 0.0
average reward score: -4.55078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 323|ppo_ep: 1|act_loss: 0.001903533935546875|cri_loss: 0.0011644363403320312|unsuper_loss: 0.0
average reward score: -4.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.48%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 324|ppo_ep: 1|act_loss: -0.005157470703125|cri_loss: 0.01177215576171875|unsuper_loss: 0.0
average reward score: -4.0859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.51%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 325|ppo_ep: 1|act_loss: 0.01084136962890625|cri_loss: 0.0009832382202148438|unsuper_loss: 0.0
average reward score: -2.939453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.60%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 326|ppo_ep: 1|act_loss: 0.0060577392578125|cri_loss: 0.0016355514526367188|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 327|ppo_ep: 1|act_loss: 0.01934814453125|cri_loss: 0.0032520294189453125|unsuper_loss: 0.0
average reward score: -5.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.71%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.015380859375|cri_loss: 0.0012655258178710938|unsuper_loss: 0.0
average reward score: -4.75
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.65%) |Training time=0.80s (31.41%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
[2023-07-01 08:21:32,548] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=8, lr=[8.12845730730089e-06, 8.12845730730089e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:21:32,725] [INFO] [timer.py:215:stop] epoch=0/micro_step=330/global_step=330, RunningAvgSamplesPerSec=50.98691248485327, CurrSamplesPerSec=51.31480764267895, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:21:32,891] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=7, lr=[4.204921164949269e-06, 4.204921164949269e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.0101776123046875|cri_loss: 0.0007042884826660156|unsuper_loss: 0.0
average reward score: -3.66796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.42%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 330|ppo_ep: 1|act_loss: -0.0103912353515625|cri_loss: 0.0004925727844238281|unsuper_loss: 0.0
average reward score: -4.38671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 331|ppo_ep: 1|act_loss: 0.011505126953125|cri_loss: 0.005535125732421875|unsuper_loss: 0.0
average reward score: -4.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 332|ppo_ep: 1|act_loss: -0.0077362060546875|cri_loss: 0.0012226104736328125|unsuper_loss: 0.0
average reward score: -5.90625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.46%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.0028171539306640625|cri_loss: 0.0014162063598632812|unsuper_loss: 0.0
average reward score: -4.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.56%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 334|ppo_ep: 1|act_loss: -0.0133514404296875|cri_loss: 0.0021381378173828125|unsuper_loss: 0.0
average reward score: -3.041015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.43%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 335|ppo_ep: 1|act_loss: -0.01678466796875|cri_loss: 0.00934600830078125|unsuper_loss: 0.0
average reward score: -4.55078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.62%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 336|ppo_ep: 1|act_loss: 0.0301971435546875|cri_loss: 0.005153656005859375|unsuper_loss: 0.0
average reward score: -4.515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.72%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 337|ppo_ep: 1|act_loss: 0.0266571044921875|cri_loss: 0.0028934478759765625|unsuper_loss: 0.0
average reward score: -3.13671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.85%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 338|ppo_ep: 1|act_loss: 0.006450653076171875|cri_loss: 0.0003998279571533203|unsuper_loss: 0.0
average reward score: -3.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
[2023-07-01 08:21:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=8, lr=[7.996880145624267e-06, 7.996880145624267e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:21:58,175] [INFO] [timer.py:215:stop] epoch=0/micro_step=340/global_step=340, RunningAvgSamplesPerSec=50.97940516109481, CurrSamplesPerSec=50.43255724297354, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:21:58,341] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=7, lr=[4.136519888601191e-06, 4.136519888601191e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 339|ppo_ep: 1|act_loss: 0.0111236572265625|cri_loss: 0.00136566162109375|unsuper_loss: 0.0
average reward score: -4.484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.73%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 340|ppo_ep: 1|act_loss: -0.007747650146484375|cri_loss: 0.0022678375244140625|unsuper_loss: 0.0
average reward score: -4.28125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.47%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
epoch: 0|step: 341|ppo_ep: 1|act_loss: -0.0238037109375|cri_loss: 0.006744384765625|unsuper_loss: 0.0
average reward score: -3.775390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.47%) |Training time=0.81s (31.63%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 342|ppo_ep: 1|act_loss: -0.08868408203125|cri_loss: 0.07049560546875|unsuper_loss: 0.0
average reward score: -3.43359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.68%) |Training time=0.80s (31.42%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 343|ppo_ep: 1|act_loss: -0.0002334117889404297|cri_loss: 0.00783538818359375|unsuper_loss: 0.0
average reward score: -3.173828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 344|ppo_ep: 1|act_loss: -0.0144195556640625|cri_loss: 0.0038890838623046875|unsuper_loss: 0.0
average reward score: -5.69140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 345|ppo_ep: 1|act_loss: -0.027252197265625|cri_loss: 0.002620697021484375|unsuper_loss: 0.0
average reward score: -3.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.12%) |Training time=0.83s (32.85%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.0034809112548828125|cri_loss: 0.0004315376281738281|unsuper_loss: 0.0
average reward score: -5.625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.85%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 347|ppo_ep: 1|act_loss: 0.03582763671875|cri_loss: 0.005672454833984375|unsuper_loss: 0.0
average reward score: -2.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.39%) |Training time=0.80s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.032073974609375|cri_loss: 0.007068634033203125|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.80%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
[2023-07-01 08:22:23,415] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=8, lr=[7.861011062196035e-06, 7.861011062196035e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:22:23,593] [INFO] [timer.py:215:stop] epoch=0/micro_step=350/global_step=350, RunningAvgSamplesPerSec=50.96653744295602, CurrSamplesPerSec=50.626515813038566, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:22:23,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=7, lr=[4.0659042110196635e-06, 4.0659042110196635e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.0369873046875|cri_loss: 0.00560760498046875|unsuper_loss: 0.0
average reward score: -4.5859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.75%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 350|ppo_ep: 1|act_loss: 0.031951904296875|cri_loss: 0.0038471221923828125|unsuper_loss: 0.0
average reward score: -5.29296875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.37%) |Training time=0.80s (31.66%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
epoch: 0|step: 351|ppo_ep: 1|act_loss: 0.03955078125|cri_loss: 0.006744384765625|unsuper_loss: 0.0
average reward score: -4.9375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.01468658447265625|cri_loss: 0.0015878677368164062|unsuper_loss: 0.0
average reward score: -3.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.01084136962890625|cri_loss: 0.001827239990234375|unsuper_loss: 0.0
average reward score: -5.45703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.93%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.0125274658203125|cri_loss: 0.0015544891357421875|unsuper_loss: 0.0
average reward score: -4.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.73%) |Training time=0.82s (32.30%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 355|ppo_ep: 1|act_loss: -0.02801513671875|cri_loss: 0.00262451171875|unsuper_loss: 0.0
average reward score: -5.140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.85%) |Training time=0.82s (32.22%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 356|ppo_ep: 1|act_loss: -0.0174407958984375|cri_loss: 0.0144805908203125|unsuper_loss: 0.0
average reward score: -4.74609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.05%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 357|ppo_ep: 1|act_loss: -0.053009033203125|cri_loss: 0.01143646240234375|unsuper_loss: 0.0
average reward score: -4.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.03%) |Training time=0.81s (31.99%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 358|ppo_ep: 1|act_loss: -0.052337646484375|cri_loss: 0.0169219970703125|unsuper_loss: 0.0
average reward score: -3.61328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.96%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
[2023-07-01 08:22:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=8, lr=[7.721033903645878e-06, 7.721033903645878e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:22:49,017] [INFO] [timer.py:215:stop] epoch=0/micro_step=360/global_step=360, RunningAvgSamplesPerSec=50.94331245280376, CurrSamplesPerSec=50.46872104636192, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:22:49,183] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=7, lr=[3.993169683407347e-06, 3.993169683407347e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 359|ppo_ep: 1|act_loss: -0.08758544921875|cri_loss: 0.039337158203125|unsuper_loss: 0.0
average reward score: -4.328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.85%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 360|ppo_ep: 1|act_loss: -0.01544952392578125|cri_loss: 0.002696990966796875|unsuper_loss: 0.0
average reward score: -6.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.68%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.56
epoch: 0|step: 361|ppo_ep: 1|act_loss: -0.0236053466796875|cri_loss: 0.01218414306640625|unsuper_loss: 0.0
average reward score: -4.5078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.66%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.01107025146484375|cri_loss: 0.0016889572143554688|unsuper_loss: 0.0
average reward score: -5.65234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.02008056640625|cri_loss: 0.0062103271484375|unsuper_loss: 0.0
average reward score: -5.75390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 364|ppo_ep: 1|act_loss: -0.0191650390625|cri_loss: 0.0195770263671875|unsuper_loss: 0.0
average reward score: -3.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.83%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 365|ppo_ep: 1|act_loss: 0.028167724609375|cri_loss: 0.00469970703125|unsuper_loss: 0.0
average reward score: -4.2265625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.49%) |Training time=0.80s (31.55%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.56
epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.0152435302734375|cri_loss: 0.0066375732421875|unsuper_loss: 0.0
average reward score: -3.33203125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.58%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
epoch: 0|step: 367|ppo_ep: 1|act_loss: 0.0253143310546875|cri_loss: 0.00949859619140625|unsuper_loss: 0.0
average reward score: -4.7890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.82%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 368|ppo_ep: 1|act_loss: -0.0552978515625|cri_loss: 0.061614990234375|unsuper_loss: 0.0
average reward score: -3.328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.72%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
[2023-07-01 08:23:14,206] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=8, lr=[7.5771380753056264e-06, 7.5771380753056264e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:23:14,388] [INFO] [timer.py:215:stop] epoch=0/micro_step=370/global_step=370, RunningAvgSamplesPerSec=50.93774701231196, CurrSamplesPerSec=50.07629021757177, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:23:14,554] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=7, lr=[3.918414724016767e-06, 3.918414724016767e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 369|ppo_ep: 1|act_loss: 0.0063323974609375|cri_loss: 0.0034465789794921875|unsuper_loss: 0.0
average reward score: -4.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.00%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 370|ppo_ep: 1|act_loss: -0.0552978515625|cri_loss: 0.0390625|unsuper_loss: 0.0
average reward score: -4.578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.81s (32.04%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 371|ppo_ep: 1|act_loss: 0.01149749755859375|cri_loss: 0.0013828277587890625|unsuper_loss: 0.0
average reward score: -4.96484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.80s (31.66%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 372|ppo_ep: 1|act_loss: -0.0081939697265625|cri_loss: 0.0030345916748046875|unsuper_loss: 0.0
average reward score: -4.76171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.77%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.0308380126953125|cri_loss: 0.01311492919921875|unsuper_loss: 0.0
average reward score: -3.8203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.02%) |Training time=0.82s (32.05%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 374|ppo_ep: 1|act_loss: -0.01399993896484375|cri_loss: 0.005680084228515625|unsuper_loss: 0.0
average reward score: -4.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.92%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 375|ppo_ep: 1|act_loss: -0.01334381103515625|cri_loss: 0.002254486083984375|unsuper_loss: 0.0
average reward score: -4.1875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.12%) |Training time=0.81s (32.01%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 376|ppo_ep: 1|act_loss: -0.0399169921875|cri_loss: 0.006500244140625|unsuper_loss: 0.0
average reward score: -4.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.85%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 377|ppo_ep: 1|act_loss: -0.066162109375|cri_loss: 0.05963134765625|unsuper_loss: 0.0
average reward score: -3.533203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 378|ppo_ep: 1|act_loss: 0.0028820037841796875|cri_loss: 0.0013561248779296875|unsuper_loss: 0.0
average reward score: -5.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.88%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
[2023-07-01 08:23:39,613] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=8, lr=[7.429518284921874e-06, 7.429518284921874e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:23:39,791] [INFO] [timer.py:215:stop] epoch=0/micro_step=380/global_step=380, RunningAvgSamplesPerSec=50.92096092837207, CurrSamplesPerSec=50.579792461817675, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:23:39,955] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=7, lr=[3.841740484979002e-06, 3.841740484979002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 379|ppo_ep: 1|act_loss: 0.0008597373962402344|cri_loss: 0.0011796951293945312|unsuper_loss: 0.0
average reward score: -4.34765625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.85%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
epoch: 0|step: 380|ppo_ep: 1|act_loss: 0.01435089111328125|cri_loss: 0.003143310546875|unsuper_loss: 0.0
average reward score: -4.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.48%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 381|ppo_ep: 1|act_loss: 0.0233306884765625|cri_loss: 0.0116119384765625|unsuper_loss: 0.0
average reward score: -6.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.93%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 382|ppo_ep: 1|act_loss: 0.047271728515625|cri_loss: 0.0091552734375|unsuper_loss: 0.0
average reward score: -3.08203125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.69%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
epoch: 0|step: 383|ppo_ep: 1|act_loss: 0.009613037109375|cri_loss: 0.0009813308715820312|unsuper_loss: 0.0
average reward score: -5.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.56
epoch: 0|step: 384|ppo_ep: 1|act_loss: -0.0247039794921875|cri_loss: 0.005100250244140625|unsuper_loss: 0.0
average reward score: -3.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 385|ppo_ep: 1|act_loss: -0.032440185546875|cri_loss: 0.0112762451171875|unsuper_loss: 0.0
average reward score: -6.33203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.62%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 386|ppo_ep: 1|act_loss: -0.008636474609375|cri_loss: 0.0224456787109375|unsuper_loss: 0.0
average reward score: -5.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 387|ppo_ep: 1|act_loss: -0.01513671875|cri_loss: 0.005428314208984375|unsuper_loss: 0.0
average reward score: -5.59375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.18%) |Training time=0.81s (31.86%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 388|ppo_ep: 1|act_loss: -0.0235748291015625|cri_loss: 0.0228729248046875|unsuper_loss: 0.0
average reward score: -4.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.87%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
[2023-07-01 08:24:05,006] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=8, lr=[7.278374279193815e-06, 7.278374279193815e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:24:05,189] [INFO] [timer.py:215:stop] epoch=0/micro_step=390/global_step=390, RunningAvgSamplesPerSec=50.91174566441504, CurrSamplesPerSec=50.332623194574396, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:24:05,354] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=7, lr=[3.763250715433111e-06, 3.763250715433111e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 389|ppo_ep: 1|act_loss: 0.004604339599609375|cri_loss: 0.01288604736328125|unsuper_loss: 0.0
average reward score: -3.111328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 390|ppo_ep: 1|act_loss: 0.01189422607421875|cri_loss: 0.0021514892578125|unsuper_loss: 0.0
average reward score: -4.796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.72%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 391|ppo_ep: 1|act_loss: 0.009765625|cri_loss: 0.00048041343688964844|unsuper_loss: 0.0
average reward score: -4.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 392|ppo_ep: 1|act_loss: 0.021026611328125|cri_loss: 0.0011301040649414062|unsuper_loss: 0.0
average reward score: -3.669921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.97%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 393|ppo_ep: 1|act_loss: 0.0030460357666015625|cri_loss: 0.0036773681640625|unsuper_loss: 0.0
average reward score: -3.9609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.94%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 394|ppo_ep: 1|act_loss: -0.05999755859375|cri_loss: 0.07647705078125|unsuper_loss: 0.0
average reward score: -3.859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.95%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.56
epoch: 0|step: 395|ppo_ep: 1|act_loss: -0.00983428955078125|cri_loss: 0.0021266937255859375|unsuper_loss: 0.0
average reward score: -5.46875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.91%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 396|ppo_ep: 1|act_loss: -0.056915283203125|cri_loss: 0.0280914306640625|unsuper_loss: 0.0
average reward score: -3.609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.56%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 397|ppo_ep: 1|act_loss: 0.005809783935546875|cri_loss: 0.003498077392578125|unsuper_loss: 0.0
average reward score: -3.99609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.56
epoch: 0|step: 398|ppo_ep: 1|act_loss: -0.027435302734375|cri_loss: 0.005130767822265625|unsuper_loss: 0.0
average reward score: -3.90625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.48%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.56
[2023-07-01 08:24:30,402] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=8, lr=[7.1239105734927765e-06, 7.1239105734927765e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:24:30,579] [INFO] [timer.py:215:stop] epoch=0/micro_step=400/global_step=400, RunningAvgSamplesPerSec=50.90512310167677, CurrSamplesPerSec=51.14447434464079, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:24:30,746] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=7, lr=[3.6830516211415224e-06, 3.6830516211415224e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 399|ppo_ep: 1|act_loss: -0.00937652587890625|cri_loss: 0.0011844635009765625|unsuper_loss: 0.0
average reward score: -5.375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.44%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 400|ppo_ep: 1|act_loss: -0.01446533203125|cri_loss: 0.0028743743896484375|unsuper_loss: 0.0
average reward score: -3.265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.61%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.56
epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.0172882080078125|cri_loss: 0.01392364501953125|unsuper_loss: 0.0
average reward score: -4.96875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.56
epoch: 0|step: 402|ppo_ep: 1|act_loss: 0.0172271728515625|cri_loss: 0.010528564453125|unsuper_loss: 0.0
average reward score: -4.82421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.62%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 403|ppo_ep: 1|act_loss: -0.031341552734375|cri_loss: 0.017852783203125|unsuper_loss: 0.0
average reward score: -2.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 404|ppo_ep: 1|act_loss: 0.0006022453308105469|cri_loss: 0.0010471343994140625|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 405|ppo_ep: 1|act_loss: 0.0167999267578125|cri_loss: 0.0020847320556640625|unsuper_loss: 0.0
average reward score: -4.83203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 406|ppo_ep: 1|act_loss: 0.0004553794860839844|cri_loss: 0.0025634765625|unsuper_loss: 0.0
average reward score: -5.453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.64%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 407|ppo_ep: 1|act_loss: 0.004215240478515625|cri_loss: 0.0004992485046386719|unsuper_loss: 0.0
average reward score: -3.6875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.47%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 408|ppo_ep: 1|act_loss: 0.0067291259765625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0
average reward score: -5.34765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.41%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
[2023-07-01 08:24:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=8, lr=[6.966336175129223e-06, 6.966336175129223e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:24:56,050] [INFO] [timer.py:215:stop] epoch=0/micro_step=410/global_step=410, RunningAvgSamplesPerSec=50.90245252307474, CurrSamplesPerSec=50.837121499065205, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:24:56,217] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=7, lr=[3.6012517207813124e-06, 3.6012517207813124e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 409|ppo_ep: 1|act_loss: 0.013397216796875|cri_loss: 0.0010662078857421875|unsuper_loss: 0.0
average reward score: -4.85546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.53%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 410|ppo_ep: 1|act_loss: 0.00982666015625|cri_loss: 0.0018157958984375|unsuper_loss: 0.0
average reward score: -4.09375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.51%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
epoch: 0|step: 411|ppo_ep: 1|act_loss: 0.01397705078125|cri_loss: 0.002384185791015625|unsuper_loss: 0.0
average reward score: -5.1171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.47%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 412|ppo_ep: 1|act_loss: -0.0032558441162109375|cri_loss: 0.0014801025390625|unsuper_loss: 0.0
average reward score: -5.265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.65%) |Training time=0.80s (31.49%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 413|ppo_ep: 1|act_loss: -0.0190277099609375|cri_loss: 0.001567840576171875|unsuper_loss: 0.0
average reward score: -4.859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.60%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.56
epoch: 0|step: 414|ppo_ep: 1|act_loss: -0.006626129150390625|cri_loss: 0.0016469955444335938|unsuper_loss: 0.0
average reward score: -4.3515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.48%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 415|ppo_ep: 1|act_loss: 0.008636474609375|cri_loss: 0.002010345458984375|unsuper_loss: 0.0
average reward score: -4.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.61%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 416|ppo_ep: 1|act_loss: 0.0136260986328125|cri_loss: 0.00044918060302734375|unsuper_loss: 0.0
average reward score: -4.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
epoch: 0|step: 417|ppo_ep: 1|act_loss: 0.0255889892578125|cri_loss: 0.0036296844482421875|unsuper_loss: 0.0
average reward score: -3.576171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 418|ppo_ep: 1|act_loss: 0.02801513671875|cri_loss: 0.0038089752197265625|unsuper_loss: 0.0
average reward score: -3.234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.71%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.56
[2023-07-01 08:25:21,325] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=8, lr=[6.805864300541598e-06, 6.805864300541598e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:25:21,507] [INFO] [timer.py:215:stop] epoch=0/micro_step=420/global_step=420, RunningAvgSamplesPerSec=50.90029937105875, CurrSamplesPerSec=50.523415940629654, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:25:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=7, lr=[3.5179616991058513e-06, 3.5179616991058513e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 419|ppo_ep: 1|act_loss: 0.029449462890625|cri_loss: 0.00270843505859375|unsuper_loss: 0.0
average reward score: -4.0625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.56
epoch: 0|step: 420|ppo_ep: 1|act_loss: 0.0204925537109375|cri_loss: 0.00555419921875|unsuper_loss: 0.0
average reward score: -4.6328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.64%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 421|ppo_ep: 1|act_loss: 0.0174560546875|cri_loss: 0.0080413818359375|unsuper_loss: 0.0
average reward score: -4.3359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 422|ppo_ep: 1|act_loss: 0.055419921875|cri_loss: 0.02764892578125|unsuper_loss: 0.0
average reward score: -4.83203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.56%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.56
epoch: 0|step: 423|ppo_ep: 1|act_loss: -0.01543426513671875|cri_loss: 0.0007390975952148438|unsuper_loss: 0.0
average reward score: -4.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.56
epoch: 0|step: 424|ppo_ep: 1|act_loss: 0.0019207000732421875|cri_loss: 0.023956298828125|unsuper_loss: 0.0
average reward score: -4.03125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.60%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.56
[2023-07-01 08:25:36,587] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 425|ppo_ep: 1|act_loss: -0.0114898681640625|cri_loss: 0.0016374588012695312|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.36s |Gather latency=0.00s (0.00%) |Generate time=1.52s (64.44%) |Training time=0.61s (26.02%) |Others=0.22 (9.54%)|CurSamplesPerSec=13.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 426|ppo_ep: 1|act_loss: -0.022979736328125|cri_loss: 0.00567626953125|unsuper_loss: 0.0
average reward score: -4.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.72%) |Training time=0.80s (31.38%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 427|ppo_ep: 1|act_loss: -0.0080718994140625|cri_loss: 0.0008549690246582031|unsuper_loss: 0.0
average reward score: -4.078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.49%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 428|ppo_ep: 1|act_loss: -0.01349639892578125|cri_loss: 0.0019235610961914062|unsuper_loss: 0.0
average reward score: -6.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.42%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
[2023-07-01 08:25:46,595] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=9, lr=[6.659141658731728e-06, 6.659141658731728e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:25:46,773] [INFO] [timer.py:215:stop] epoch=0/micro_step=430/global_step=430, RunningAvgSamplesPerSec=50.936617309465866, CurrSamplesPerSec=50.9904122573574, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:25:46,938] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=7, lr=[3.43329425717549e-06, 3.43329425717549e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 429|ppo_ep: 1|act_loss: -0.012054443359375|cri_loss: 0.0003750324249267578|unsuper_loss: 0.0
average reward score: -3.53125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.54%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 430|ppo_ep: 1|act_loss: -0.00502777099609375|cri_loss: 0.004634857177734375|unsuper_loss: 0.0
average reward score: -5.0703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.87%) |Training time=0.79s (31.26%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 431|ppo_ep: 1|act_loss: 0.0088653564453125|cri_loss: 0.0036773681640625|unsuper_loss: 0.0
average reward score: -4.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.69%) |Training time=0.80s (31.38%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 432|ppo_ep: 1|act_loss: 0.0074310302734375|cri_loss: 0.0007243156433105469|unsuper_loss: 0.0
average reward score: -4.0546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 433|ppo_ep: 1|act_loss: 0.0251007080078125|cri_loss: 0.0047454833984375|unsuper_loss: 0.0
average reward score: -4.16015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.66%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 434|ppo_ep: 1|act_loss: 0.0069580078125|cri_loss: 0.0017385482788085938|unsuper_loss: 0.0
average reward score: -5.0078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.51%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 435|ppo_ep: 1|act_loss: -0.0178985595703125|cri_loss: 0.0014028549194335938|unsuper_loss: 0.0
average reward score: -4.87890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.55%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 436|ppo_ep: 1|act_loss: -0.015350341796875|cri_loss: 0.00339508056640625|unsuper_loss: 0.0
average reward score: -4.8984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.56%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.01360321044921875|cri_loss: 0.0025768280029296875|unsuper_loss: 0.0
average reward score: -4.46875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.59%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.01806640625|cri_loss: 0.0021915435791015625|unsuper_loss: 0.0
average reward score: -3.462890625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
[2023-07-01 08:26:12,029] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=9, lr=[6.493765795627752e-06, 6.493765795627752e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:26:12,209] [INFO] [timer.py:215:stop] epoch=0/micro_step=440/global_step=440, RunningAvgSamplesPerSec=50.93680599438725, CurrSamplesPerSec=51.115958488051035, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:26:12,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=7, lr=[3.3473639598599567e-06, 3.3473639598599567e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.025421142578125|cri_loss: 0.00484466552734375|unsuper_loss: 0.0
average reward score: -7.24609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.57%) |Training time=0.80s (31.51%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 440|ppo_ep: 1|act_loss: 0.0246124267578125|cri_loss: 0.0032405853271484375|unsuper_loss: 0.0
average reward score: -3.83984375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.47%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57
epoch: 0|step: 441|ppo_ep: 1|act_loss: 0.0003368854522705078|cri_loss: 0.005702972412109375|unsuper_loss: 0.0
average reward score: -5.40625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.00249481201171875|cri_loss: 0.0018873214721679688|unsuper_loss: 0.0
average reward score: -4.359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.45%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 443|ppo_ep: 1|act_loss: 0.003620147705078125|cri_loss: 0.00045180320739746094|unsuper_loss: 0.0
average reward score: -4.2578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.61%) |Training time=0.80s (31.50%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 444|ppo_ep: 1|act_loss: 0.008819580078125|cri_loss: 0.00811004638671875|unsuper_loss: 0.0
average reward score: -6.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.64%) |Training time=0.80s (31.43%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 445|ppo_ep: 1|act_loss: -0.057220458984375|cri_loss: 0.039154052734375|unsuper_loss: 0.0
average reward score: -4.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.40%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 446|ppo_ep: 1|act_loss: 0.0115966796875|cri_loss: 0.002429962158203125|unsuper_loss: 0.0
average reward score: -6.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 447|ppo_ep: 1|act_loss: 0.051116943359375|cri_loss: 0.021514892578125|unsuper_loss: 0.0
average reward score: -4.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.81s (31.63%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 448|ppo_ep: 1|act_loss: 0.04486083984375|cri_loss: 0.0269927978515625|unsuper_loss: 0.0
average reward score: -4.484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
[2023-07-01 08:26:37,454] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=9, lr=[6.326131898837833e-06, 6.326131898837833e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:26:37,636] [INFO] [timer.py:215:stop] epoch=0/micro_step=450/global_step=450, RunningAvgSamplesPerSec=50.939151605231494, CurrSamplesPerSec=51.02418006191282, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:26:37,801] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=7, lr=[3.2602870808187955e-06, 3.2602870808187955e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 449|ppo_ep: 1|act_loss: -0.01153564453125|cri_loss: 0.0019741058349609375|unsuper_loss: 0.0
average reward score: -3.67578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.51%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 450|ppo_ep: 1|act_loss: -0.0297088623046875|cri_loss: 0.00568389892578125|unsuper_loss: 0.0
average reward score: -4.6328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.60%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 451|ppo_ep: 1|act_loss: 0.006378173828125|cri_loss: 0.0003757476806640625|unsuper_loss: 0.0
average reward score: -5.140625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.64%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.006011962890625|cri_loss: 0.0011262893676757812|unsuper_loss: 0.0
average reward score: -3.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 453|ppo_ep: 1|act_loss: 0.22802734375|cri_loss: 0.79931640625|unsuper_loss: 0.0
average reward score: -3.765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.49%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 454|ppo_ep: 1|act_loss: -0.008758544921875|cri_loss: 0.0018739700317382812|unsuper_loss: 0.0
average reward score: -3.94921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.48%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.015228271484375|cri_loss: 0.0011606216430664062|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.50%) |Training time=0.80s (31.56%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 456|ppo_ep: 1|act_loss: -0.0037593841552734375|cri_loss: 0.002017974853515625|unsuper_loss: 0.0
average reward score: -3.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.52%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 457|ppo_ep: 1|act_loss: -0.00904083251953125|cri_loss: 0.0014896392822265625|unsuper_loss: 0.0
average reward score: -5.22265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 458|ppo_ep: 1|act_loss: 0.00726318359375|cri_loss: 0.0018491744995117188|unsuper_loss: 0.0
average reward score: -3.9140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.75%) |Training time=0.80s (31.36%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:27:02,920] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, lr=[6.1564667964686156e-06, 6.1564667964686156e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:03,098] [INFO] [timer.py:215:stop] epoch=0/micro_step=460/global_step=460, RunningAvgSamplesPerSec=50.93740045981754, CurrSamplesPerSec=51.238622863303334, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:03,264] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=7, lr=[3.1721814451696215e-06, 3.1721814451696215e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 459|ppo_ep: 1|act_loss: 0.01074981689453125|cri_loss: 0.0009255409240722656|unsuper_loss: 0.0
average reward score: -3.005859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.62%) |Training time=0.80s (31.44%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.0157012939453125|cri_loss: 0.0010919570922851562|unsuper_loss: 0.0
average reward score: -4.234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.44%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 461|ppo_ep: 1|act_loss: 0.023651123046875|cri_loss: 0.0022792816162109375|unsuper_loss: 0.0
average reward score: -4.36328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.52%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 462|ppo_ep: 1|act_loss: 0.0156097412109375|cri_loss: 0.000743865966796875|unsuper_loss: 0.0
average reward score: -4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.46%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 463|ppo_ep: 1|act_loss: -0.00859832763671875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0
average reward score: -4.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.63%) |Training time=0.80s (31.46%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 464|ppo_ep: 1|act_loss: -0.00014662742614746094|cri_loss: 0.0007381439208984375|unsuper_loss: 0.0
average reward score: -3.0
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.56%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 465|ppo_ep: 1|act_loss: -0.020477294921875|cri_loss: 0.00197601318359375|unsuper_loss: 0.0
average reward score: -4.125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.43%) |Training time=0.81s (31.60%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 466|ppo_ep: 1|act_loss: -0.004871368408203125|cri_loss: 0.0019664764404296875|unsuper_loss: 0.0
average reward score: -4.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 467|ppo_ep: 1|act_loss: -0.0090179443359375|cri_loss: 0.0015735626220703125|unsuper_loss: 0.0
average reward score: -4.17578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.53%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.00807952880859375|cri_loss: 0.0014333724975585938|unsuper_loss: 0.0
average reward score: -2.8203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.80s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
[2023-07-01 08:27:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=9, lr=[5.9850000650835e-06, 5.9850000650835e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:28,562] [INFO] [timer.py:215:stop] epoch=0/micro_step=470/global_step=470, RunningAvgSamplesPerSec=50.93503363221945, CurrSamplesPerSec=50.447987364861795, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:28,728] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=7, lr=[3.0831662700570695e-06, 3.0831662700570695e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 469|ppo_ep: 1|act_loss: -0.00909423828125|cri_loss: 0.0036182403564453125|unsuper_loss: 0.0
average reward score: -3.822265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.34%) |Training time=0.81s (31.73%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57
epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.0087432861328125|cri_loss: 0.0015554428100585938|unsuper_loss: 0.0
average reward score: -4.84375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.57%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 471|ppo_ep: 1|act_loss: 0.0158233642578125|cri_loss: 0.0010433197021484375|unsuper_loss: 0.0
average reward score: -3.849609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 472|ppo_ep: 1|act_loss: 0.01094818115234375|cri_loss: 0.0017919540405273438|unsuper_loss: 0.0
average reward score: -4.44140625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.67%) |Training time=0.80s (31.43%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 473|ppo_ep: 1|act_loss: 0.004558563232421875|cri_loss: 0.002330780029296875|unsuper_loss: 0.0
average reward score: -5.546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.48%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 474|ppo_ep: 1|act_loss: -0.01329803466796875|cri_loss: 0.00305938720703125|unsuper_loss: 0.0
average reward score: -4.5625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 475|ppo_ep: 1|act_loss: -0.0081024169921875|cri_loss: 0.0024166107177734375|unsuper_loss: 0.0
average reward score: -5.08984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.81s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 476|ppo_ep: 1|act_loss: -0.006214141845703125|cri_loss: 0.0010242462158203125|unsuper_loss: 0.0
average reward score: -2.517578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.01202392578125|cri_loss: 0.000881195068359375|unsuper_loss: 0.0
average reward score: -5.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.75%) |Training time=0.80s (31.38%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 478|ppo_ep: 1|act_loss: -0.0006918907165527344|cri_loss: 0.00208282470703125|unsuper_loss: 0.0
average reward score: -4.828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.59%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
[2023-07-01 08:27:53,860] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=9, lr=[5.81196371905892e-06, 5.81196371905892e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:27:54,038] [INFO] [timer.py:215:stop] epoch=0/micro_step=480/global_step=480, RunningAvgSamplesPerSec=50.9348513563475, CurrSamplesPerSec=50.923126479449614, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:27:54,203] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=7, lr=[2.993362003338167e-06, 2.993362003338167e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 479|ppo_ep: 1|act_loss: -0.0224151611328125|cri_loss: 0.003749847412109375|unsuper_loss: 0.0
average reward score: -4.49609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 480|ppo_ep: 1|act_loss: 0.00965118408203125|cri_loss: 0.00128173828125|unsuper_loss: 0.0
average reward score: -5.75
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.68%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 481|ppo_ep: 1|act_loss: 0.0204010009765625|cri_loss: 0.00751495361328125|unsuper_loss: 0.0
average reward score: -3.404296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 482|ppo_ep: 1|act_loss: -0.0557861328125|cri_loss: 0.09521484375|unsuper_loss: 0.0
average reward score: -4.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.62%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 483|ppo_ep: 1|act_loss: 0.01477813720703125|cri_loss: 0.002376556396484375|unsuper_loss: 0.0
average reward score: -5.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.38%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 484|ppo_ep: 1|act_loss: -0.055084228515625|cri_loss: 0.04400634765625|unsuper_loss: 0.0
average reward score: -5.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.66%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 485|ppo_ep: 1|act_loss: -0.001514434814453125|cri_loss: 0.0013093948364257812|unsuper_loss: 0.0
average reward score: -4.3671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.004360198974609375|cri_loss: 0.0009250640869140625|unsuper_loss: 0.0
average reward score: -2.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 487|ppo_ep: 1|act_loss: -0.006107330322265625|cri_loss: 0.0053558349609375|unsuper_loss: 0.0
average reward score: -6.04296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.025787353515625|cri_loss: 0.003879547119140625|unsuper_loss: 0.0
average reward score: -4.171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.63%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
[2023-07-01 08:28:19,322] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=9, lr=[5.637591896641978e-06, 5.637591896641978e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:28:19,499] [INFO] [timer.py:215:stop] epoch=0/micro_step=490/global_step=490, RunningAvgSamplesPerSec=50.93059549496716, CurrSamplesPerSec=50.77863326665892, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:28:19,664] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=7, lr=[2.902890160602413e-06, 2.902890160602413e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 489|ppo_ep: 1|act_loss: -0.0225677490234375|cri_loss: 0.01079559326171875|unsuper_loss: 0.0
average reward score: -3.841796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.63%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 490|ppo_ep: 1|act_loss: -0.012939453125|cri_loss: 0.00162506103515625|unsuper_loss: 0.0
average reward score: -3.4921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 491|ppo_ep: 1|act_loss: 0.0004925727844238281|cri_loss: 0.0001983642578125|unsuper_loss: 0.0
average reward score: -4.359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 492|ppo_ep: 1|act_loss: -0.011993408203125|cri_loss: 0.003360748291015625|unsuper_loss: 0.0
average reward score: -2.62109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 493|ppo_ep: 1|act_loss: 0.014007568359375|cri_loss: 0.00295257568359375|unsuper_loss: 0.0
average reward score: -3.912109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.02337646484375|cri_loss: 0.00600433349609375|unsuper_loss: 0.0
average reward score: -4.39453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.56%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 495|ppo_ep: 1|act_loss: 0.005413055419921875|cri_loss: 0.0016698837280273438|unsuper_loss: 0.0
average reward score: -4.7421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.48%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 496|ppo_ep: 1|act_loss: 0.018341064453125|cri_loss: 0.0029125213623046875|unsuper_loss: 0.0
average reward score: -4.15234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.56%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 497|ppo_ep: 1|act_loss: -0.00753021240234375|cri_loss: 0.002887725830078125|unsuper_loss: 0.0
average reward score: -5.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.81s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 498|ppo_ep: 1|act_loss: -0.01403045654296875|cri_loss: 0.0005021095275878906|unsuper_loss: 0.0
average reward score: -5.21875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.45%) |Training time=0.81s (31.58%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57
[2023-07-01 08:28:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=9, lr=[5.462120543134245e-06, 5.462120543134245e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:28:44,942] [INFO] [timer.py:215:stop] epoch=0/micro_step=500/global_step=500, RunningAvgSamplesPerSec=50.92924885306585, CurrSamplesPerSec=50.780919489957526, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:28:45,106] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=7, lr=[2.811873160747093e-06, 2.811873160747093e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 499|ppo_ep: 1|act_loss: -0.031036376953125|cri_loss: 0.00760650634765625|unsuper_loss: 0.0
average reward score: -4.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.61%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 500|ppo_ep: 1|act_loss: -0.00673675537109375|cri_loss: 0.0014362335205078125|unsuper_loss: 0.0
average reward score: -4.22265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 501|ppo_ep: 1|act_loss: -0.0005049705505371094|cri_loss: 0.0007486343383789062|unsuper_loss: 0.0
average reward score: -4.8125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.51%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 502|ppo_ep: 1|act_loss: -0.00399017333984375|cri_loss: 0.0021114349365234375|unsuper_loss: 0.0
average reward score: -4.7109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 503|ppo_ep: 1|act_loss: 0.01094818115234375|cri_loss: 0.00034165382385253906|unsuper_loss: 0.0
average reward score: -3.673828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.81s (31.67%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 504|ppo_ep: 1|act_loss: 0.0125732421875|cri_loss: 0.002452850341796875|unsuper_loss: 0.0
average reward score: -5.03125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.63%) |Training time=0.80s (31.50%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:29:00,036] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.005146026611328125|cri_loss: 0.0011034011840820312|unsuper_loss: 0.0
average reward score: -4.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.21%) |Training time=0.62s (26.19%) |Others=0.23 (9.60%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.00792694091796875|cri_loss: 0.0005488395690917969|unsuper_loss: 0.0
average reward score: -5.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.50%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.0279541015625|cri_loss: 0.0025577545166015625|unsuper_loss: 0.0
average reward score: -4.765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.47%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 508|ppo_ep: 1|act_loss: 0.0224609375|cri_loss: 0.001071929931640625|unsuper_loss: 0.0
average reward score: -5.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.51%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:29:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=10, lr=[5.30345243877873e-06, 5.30345243877873e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:29:10,202] [INFO] [timer.py:215:stop] epoch=0/micro_step=510/global_step=510, RunningAvgSamplesPerSec=50.95858837805958, CurrSamplesPerSec=51.39470244980967, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:29:10,367] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=7, lr=[2.720434160330307e-06, 2.720434160330307e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 509|ppo_ep: 1|act_loss: -0.0016717910766601562|cri_loss: 0.00014770030975341797|unsuper_loss: 0.0
average reward score: -3.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.69%) |Training time=0.80s (31.41%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 510|ppo_ep: 1|act_loss: -0.0111541748046875|cri_loss: 0.0006170272827148438|unsuper_loss: 0.0
average reward score: -3.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.66%) |Training time=0.80s (31.40%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 511|ppo_ep: 1|act_loss: -0.011077880859375|cri_loss: 0.0015201568603515625|unsuper_loss: 0.0
average reward score: -5.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.54%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 512|ppo_ep: 1|act_loss: -0.0259552001953125|cri_loss: 0.005645751953125|unsuper_loss: 0.0
average reward score: -3.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.61%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 513|ppo_ep: 1|act_loss: 0.0014553070068359375|cri_loss: 0.0010042190551757812|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.63%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 514|ppo_ep: 1|act_loss: 0.0207061767578125|cri_loss: 0.002655029296875|unsuper_loss: 0.0
average reward score: -3.0
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.57%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.01091766357421875|cri_loss: 0.0009918212890625|unsuper_loss: 0.0
average reward score: -5.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.56%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.019775390625|cri_loss: 0.00229644775390625|unsuper_loss: 0.0
average reward score: -4.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.64%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57
epoch: 0|step: 517|ppo_ep: 1|act_loss: 0.0154876708984375|cri_loss: 0.0035495758056640625|unsuper_loss: 0.0
average reward score: -3.611328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.42%) |Training time=0.81s (31.66%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 518|ppo_ep: 1|act_loss: 0.0266876220703125|cri_loss: 0.00441741943359375|unsuper_loss: 0.0
average reward score: -4.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
[2023-07-01 08:29:35,481] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=10, lr=[5.126547075166989e-06, 5.126547075166989e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:29:35,661] [INFO] [timer.py:215:stop] epoch=0/micro_step=520/global_step=520, RunningAvgSamplesPerSec=50.956135782198245, CurrSamplesPerSec=50.88289419461918, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:29:35,828] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=7, lr=[2.6286968869258666e-06, 2.6286968869258666e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 519|ppo_ep: 1|act_loss: 0.018218994140625|cri_loss: 0.0027561187744140625|unsuper_loss: 0.0
average reward score: -3.3984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.51%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 520|ppo_ep: 1|act_loss: 0.01316070556640625|cri_loss: 0.0021457672119140625|unsuper_loss: 0.0
average reward score: -4.69921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.71%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 521|ppo_ep: 1|act_loss: 0.002925872802734375|cri_loss: 0.0039520263671875|unsuper_loss: 0.0
average reward score: -3.34765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.50%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 522|ppo_ep: 1|act_loss: 0.0268402099609375|cri_loss: 0.007488250732421875|unsuper_loss: 0.0
average reward score: -4.5703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.61%) |Training time=0.80s (31.49%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 523|ppo_ep: 1|act_loss: 0.0007557868957519531|cri_loss: 0.0024166107177734375|unsuper_loss: 0.0
average reward score: -4.74609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.52%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 524|ppo_ep: 1|act_loss: -0.0075225830078125|cri_loss: 0.0025806427001953125|unsuper_loss: 0.0
average reward score: -4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.47%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 525|ppo_ep: 1|act_loss: -0.0187225341796875|cri_loss: 0.005298614501953125|unsuper_loss: 0.0
average reward score: -4.40625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.58%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 526|ppo_ep: 1|act_loss: -0.0318603515625|cri_loss: 0.00554656982421875|unsuper_loss: 0.0
average reward score: -2.92578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.60%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 527|ppo_ep: 1|act_loss: -0.02935791015625|cri_loss: 0.004497528076171875|unsuper_loss: 0.0
average reward score: -4.2265625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.61%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57
epoch: 0|step: 528|ppo_ep: 1|act_loss: -0.0176849365234375|cri_loss: 0.002193450927734375|unsuper_loss: 0.0
average reward score: -4.234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.67%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:30:00,913] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=10, lr=[4.949233683385321e-06, 4.949233683385321e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:01,095] [INFO] [timer.py:215:stop] epoch=0/micro_step=530/global_step=530, RunningAvgSamplesPerSec=50.95421906134459, CurrSamplesPerSec=50.46234547477968, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:01,261] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=7, lr=[2.5367854717055305e-06, 2.5367854717055305e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 529|ppo_ep: 1|act_loss: -0.005146026611328125|cri_loss: 0.001399993896484375|unsuper_loss: 0.0
average reward score: -2.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.73%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 530|ppo_ep: 1|act_loss: -0.00983428955078125|cri_loss: 0.0006809234619140625|unsuper_loss: 0.0
average reward score: -5.578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.0021076202392578125|cri_loss: 0.00106048583984375|unsuper_loss: 0.0
average reward score: -3.740234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 532|ppo_ep: 1|act_loss: -0.01316070556640625|cri_loss: 0.006561279296875|unsuper_loss: 0.0
average reward score: -5.375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.61%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 533|ppo_ep: 1|act_loss: -0.01398468017578125|cri_loss: 0.0008006095886230469|unsuper_loss: 0.0
average reward score: -5.0859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.62%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 534|ppo_ep: 1|act_loss: 0.0012187957763671875|cri_loss: 0.0017404556274414062|unsuper_loss: 0.0
average reward score: -5.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 535|ppo_ep: 1|act_loss: -0.003604888916015625|cri_loss: 0.0013761520385742188|unsuper_loss: 0.0
average reward score: -4.6171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.65%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 536|ppo_ep: 1|act_loss: 0.01251983642578125|cri_loss: 0.001312255859375|unsuper_loss: 0.0
average reward score: -5.01171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 537|ppo_ep: 1|act_loss: 0.0182342529296875|cri_loss: 0.0034637451171875|unsuper_loss: 0.0
average reward score: -4.8203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.72%) |Training time=0.80s (31.40%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.0146484375|cri_loss: 0.0020542144775390625|unsuper_loss: 0.0
average reward score: -3.134765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.46%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
[2023-07-01 08:30:26,353] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=10, lr=[4.771752189019846e-06, 4.771752189019846e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:26,530] [INFO] [timer.py:215:stop] epoch=0/micro_step=540/global_step=540, RunningAvgSamplesPerSec=50.9513624248314, CurrSamplesPerSec=50.93793034873452, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:26,696] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=7, lr=[2.4448242814751353e-06, 2.4448242814751353e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 539|ppo_ep: 1|act_loss: 0.0062103271484375|cri_loss: 0.0006608963012695312|unsuper_loss: 0.0
average reward score: -3.763671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 540|ppo_ep: 1|act_loss: 0.02484130859375|cri_loss: 0.0029354095458984375|unsuper_loss: 0.0
average reward score: -6.2265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.81s (31.64%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 541|ppo_ep: 1|act_loss: -0.00518035888671875|cri_loss: 0.0001233816146850586|unsuper_loss: 0.0
average reward score: -5.3984375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.65%) |Training time=0.80s (31.46%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.57
epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.0022792816162109375|cri_loss: 0.0005393028259277344|unsuper_loss: 0.0
average reward score: -6.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.62%) |Training time=0.80s (31.48%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 543|ppo_ep: 1|act_loss: -0.008697509765625|cri_loss: 0.0011472702026367188|unsuper_loss: 0.0
average reward score: -3.859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.61%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 544|ppo_ep: 1|act_loss: -0.01306915283203125|cri_loss: 0.0025920867919921875|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.57%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 545|ppo_ep: 1|act_loss: -0.0258636474609375|cri_loss: 0.00591278076171875|unsuper_loss: 0.0
average reward score: -5.7421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.81s (31.57%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 546|ppo_ep: 1|act_loss: -0.007015228271484375|cri_loss: 0.0006990432739257812|unsuper_loss: 0.0
average reward score: -4.92578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.52%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 547|ppo_ep: 1|act_loss: -0.007061004638671875|cri_loss: 0.0008282661437988281|unsuper_loss: 0.0
average reward score: -4.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.52%) |Training time=0.80s (31.50%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 548|ppo_ep: 1|act_loss: -0.0014696121215820312|cri_loss: 0.0002682209014892578|unsuper_loss: 0.0
average reward score: -3.357421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.72%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
[2023-07-01 08:30:51,790] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=10, lr=[4.594342745118979e-06, 4.594342745118979e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:30:51,972] [INFO] [timer.py:215:stop] epoch=0/micro_step=550/global_step=550, RunningAvgSamplesPerSec=50.94977825227625, CurrSamplesPerSec=50.522579141899634, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:30:52,138] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=7, lr=[2.352937750391878e-06, 2.352937750391878e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 549|ppo_ep: 1|act_loss: 0.00908660888671875|cri_loss: 0.00087738037109375|unsuper_loss: 0.0
average reward score: -5.015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 550|ppo_ep: 1|act_loss: 0.0301055908203125|cri_loss: 0.005985260009765625|unsuper_loss: 0.0
average reward score: -4.109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.52%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 551|ppo_ep: 1|act_loss: 0.0014286041259765625|cri_loss: 0.001789093017578125|unsuper_loss: 0.0
average reward score: -4.1796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 552|ppo_ep: 1|act_loss: 0.01788330078125|cri_loss: 0.0032596588134765625|unsuper_loss: 0.0
average reward score: -4.9140625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.61%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 553|ppo_ep: 1|act_loss: 0.004871368408203125|cri_loss: 0.006023406982421875|unsuper_loss: 0.0
average reward score: -3.810546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.62%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 554|ppo_ep: 1|act_loss: -0.00543975830078125|cri_loss: 0.0013408660888671875|unsuper_loss: 0.0
average reward score: -4.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.53%) |Others=0.23 (8.86%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 555|ppo_ep: 1|act_loss: -0.00647735595703125|cri_loss: 0.001308441162109375|unsuper_loss: 0.0
average reward score: -3.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.67%) |Training time=0.80s (31.44%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57
epoch: 0|step: 556|ppo_ep: 1|act_loss: 0.0111541748046875|cri_loss: 0.004581451416015625|unsuper_loss: 0.0
average reward score: -3.681640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.57%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 557|ppo_ep: 1|act_loss: -0.01230621337890625|cri_loss: 0.0010156631469726562|unsuper_loss: 0.0
average reward score: -3.62890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.66%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 558|ppo_ep: 1|act_loss: -0.0097808837890625|cri_loss: 0.0012044906616210938|unsuper_loss: 0.0
average reward score: -4.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.58%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:31:17,211] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=10, lr=[4.417245407238497e-06, 4.417245407238497e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:31:17,388] [INFO] [timer.py:215:stop] epoch=0/micro_step=560/global_step=560, RunningAvgSamplesPerSec=50.94992960737238, CurrSamplesPerSec=51.052651838755786, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:31:17,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=560, skipped=7, lr=[2.261250211590471e-06, 2.261250211590471e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 559|ppo_ep: 1|act_loss: -0.0055694580078125|cri_loss: 0.0004534721374511719|unsuper_loss: 0.0
average reward score: -3.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.56%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 560|ppo_ep: 1|act_loss: 0.0032482147216796875|cri_loss: 0.0005564689636230469|unsuper_loss: 0.0
average reward score: -3.51953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.67%) |Training time=0.80s (31.39%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 561|ppo_ep: 1|act_loss: 0.004161834716796875|cri_loss: 0.0013608932495117188|unsuper_loss: 0.0
average reward score: -4.4921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.57%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.0016088485717773438|cri_loss: 0.0002428293228149414|unsuper_loss: 0.0
average reward score: -3.6328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.48%) |Training time=0.81s (31.59%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.0179595947265625|cri_loss: 0.0032367706298828125|unsuper_loss: 0.0
average reward score: -5.16015625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.40%) |Training time=0.81s (31.66%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.57
epoch: 0|step: 564|ppo_ep: 1|act_loss: 0.003131866455078125|cri_loss: 0.00016367435455322266|unsuper_loss: 0.0
average reward score: -5.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.62%) |Training time=0.80s (31.49%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 565|ppo_ep: 1|act_loss: 0.0007748603820800781|cri_loss: 0.0004220008850097656|unsuper_loss: 0.0
average reward score: -4.3515625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.59%) |Training time=0.80s (31.52%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 566|ppo_ep: 1|act_loss: -0.0001862049102783203|cri_loss: 0.0009765625|unsuper_loss: 0.0
average reward score: -3.986328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.57%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.57
epoch: 0|step: 567|ppo_ep: 1|act_loss: -0.0013790130615234375|cri_loss: 0.0017213821411132812|unsuper_loss: 0.0
average reward score: -3.708984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.46%) |Training time=0.81s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.57
epoch: 0|step: 568|ppo_ep: 1|act_loss: -0.00388336181640625|cri_loss: 0.0007061958312988281|unsuper_loss: 0.0
average reward score: -4.8671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.49%) |Training time=0.81s (31.62%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
[2023-07-01 08:31:42,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=10, lr=[4.2406998086185315e-06, 4.2406998086185315e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:31:42,865] [INFO] [timer.py:215:stop] epoch=0/micro_step=570/global_step=570, RunningAvgSamplesPerSec=50.94778923762274, CurrSamplesPerSec=51.40765520514193, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:31:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=7, lr=[2.1698857289459872e-06, 2.1698857289459872e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 569|ppo_ep: 1|act_loss: -0.01505279541015625|cri_loss: 0.0009059906005859375|unsuper_loss: 0.0
average reward score: -3.9609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.78%) |Training time=0.80s (31.36%) |Others=0.22 (8.86%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 570|ppo_ep: 1|act_loss: -0.0038356781005859375|cri_loss: 0.0009098052978515625|unsuper_loss: 0.0
average reward score: -4.7109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.74%) |Training time=0.80s (31.37%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 571|ppo_ep: 1|act_loss: -0.0022602081298828125|cri_loss: 0.0014858245849609375|unsuper_loss: 0.0
average reward score: -4.484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.60%) |Training time=0.80s (31.47%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 572|ppo_ep: 1|act_loss: -0.00765228271484375|cri_loss: 0.0003981590270996094|unsuper_loss: 0.0
average reward score: -3.5
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.83%) |Training time=0.82s (32.15%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 573|ppo_ep: 1|act_loss: -0.0052032470703125|cri_loss: 0.0011606216430664062|unsuper_loss: 0.0
average reward score: -5.69140625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.70%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.57
epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.014617919921875|cri_loss: 0.0018224716186523438|unsuper_loss: 0.0
average reward score: -6.078125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.59%) |Training time=0.80s (31.46%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.66 |AvgSamplesPerSec=12.57
epoch: 0|step: 575|ppo_ep: 1|act_loss: 0.010711669921875|cri_loss: 0.00122833251953125|unsuper_loss: 0.0
average reward score: -6.5078125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.80s (31.75%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57
epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.0308074951171875|cri_loss: 0.02569580078125|unsuper_loss: 0.0
average reward score: -3.326171875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.95%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.57
epoch: 0|step: 577|ppo_ep: 1|act_loss: 0.0074462890625|cri_loss: 0.0010700225830078125|unsuper_loss: 0.0
average reward score: -4.09375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.00%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:32:05,492] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.0223846435546875|cri_loss: 0.002262115478515625|unsuper_loss: 0.0
average reward score: -3.275390625
-------------------------------------------------------------------------------------
|E2E latency=2.34s |Gather latency=0.00s (0.00%) |Generate time=1.50s (64.10%) |Training time=0.61s (26.24%) |Others=0.23 (9.66%)|CurSamplesPerSec=13.66 |AvgSamplesPerSec=12.57
[2023-07-01 08:32:07,846] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=11, lr=[4.082477967402902e-06, 4.082477967402902e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:08,027] [INFO] [timer.py:215:stop] epoch=0/micro_step=580/global_step=580, RunningAvgSamplesPerSec=50.97001604666447, CurrSamplesPerSec=50.35826863975969, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:08,192] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=7, lr=[2.0789679292010483e-06, 2.0789679292010483e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 579|ppo_ep: 1|act_loss: 0.01136016845703125|cri_loss: 0.0022869110107421875|unsuper_loss: 0.0
average reward score: -5.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 580|ppo_ep: 1|act_loss: 0.018829345703125|cri_loss: 0.0036773681640625|unsuper_loss: 0.0
average reward score: -4.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.57
epoch: 0|step: 581|ppo_ep: 1|act_loss: -0.001506805419921875|cri_loss: 0.002010345458984375|unsuper_loss: 0.0
average reward score: -5.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.83%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 582|ppo_ep: 1|act_loss: -0.0027065277099609375|cri_loss: 0.0007162094116210938|unsuper_loss: 0.0
average reward score: -3.376953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.81s (31.73%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 583|ppo_ep: 1|act_loss: -0.00507354736328125|cri_loss: 0.0033092498779296875|unsuper_loss: 0.0
average reward score: -3.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.81%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.57
epoch: 0|step: 584|ppo_ep: 1|act_loss: -0.0029296875|cri_loss: 0.0007195472717285156|unsuper_loss: 0.0
average reward score: -5.140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.75%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.020263671875|cri_loss: 0.0017242431640625|unsuper_loss: 0.0
average reward score: -4.1875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (31.99%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.57
epoch: 0|step: 586|ppo_ep: 1|act_loss: 0.00838470458984375|cri_loss: 0.002552032470703125|unsuper_loss: 0.0
average reward score: -3.76171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.84%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 587|ppo_ep: 1|act_loss: 0.0025005340576171875|cri_loss: 0.001628875732421875|unsuper_loss: 0.0
average reward score: -7.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.81%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.57
epoch: 0|step: 588|ppo_ep: 1|act_loss: 0.00838470458984375|cri_loss: 0.0011529922485351562|unsuper_loss: 0.0
average reward score: -5.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.82%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.57
[2023-07-01 08:32:33,230] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=11, lr=[3.907637928621924e-06, 3.907637928621924e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:33,408] [INFO] [timer.py:215:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=50.96221074638222, CurrSamplesPerSec=50.53928232401794, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:33,573] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=7, lr=[1.988619834684499e-06, 1.988619834684499e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 589|ppo_ep: 1|act_loss: 0.00032448768615722656|cri_loss: 0.0007734298706054688|unsuper_loss: 0.0
average reward score: -3.94140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.009735107421875|cri_loss: 0.0010528564453125|unsuper_loss: 0.0
average reward score: -3.126953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 591|ppo_ep: 1|act_loss: 0.00969696044921875|cri_loss: 0.0013589859008789062|unsuper_loss: 0.0
average reward score: -6.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 592|ppo_ep: 1|act_loss: -0.004749298095703125|cri_loss: 0.0004584789276123047|unsuper_loss: 0.0
average reward score: -5.48828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 593|ppo_ep: 1|act_loss: -0.006870269775390625|cri_loss: 0.0007867813110351562|unsuper_loss: 0.0
average reward score: -4.80078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.78%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.57
epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.0190582275390625|cri_loss: 0.0018129348754882812|unsuper_loss: 0.0
average reward score: -3.66015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.89%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 595|ppo_ep: 1|act_loss: -0.006244659423828125|cri_loss: 0.001369476318359375|unsuper_loss: 0.0
average reward score: -4.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.79%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 596|ppo_ep: 1|act_loss: -0.0032024383544921875|cri_loss: 0.0005779266357421875|unsuper_loss: 0.0
average reward score: -5.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
epoch: 0|step: 597|ppo_ep: 1|act_loss: -0.0027294158935546875|cri_loss: 0.0002753734588623047|unsuper_loss: 0.0
average reward score: -4.9609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
epoch: 0|step: 598|ppo_ep: 1|act_loss: 0.0278472900390625|cri_loss: 0.0089569091796875|unsuper_loss: 0.0
average reward score: -4.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.45s (57.30%) |Training time=0.85s (33.63%) |Others=0.23 (9.07%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.57
[2023-07-01 08:32:58,614] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=11, lr=[3.734039187130717e-06, 3.734039187130717e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:32:58,796] [INFO] [timer.py:215:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=50.948322092265094, CurrSamplesPerSec=49.88978045077008, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:32:58,961] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=7, lr=[1.8989636968479282e-06, 1.8989636968479282e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 599|ppo_ep: 1|act_loss: 0.0027713775634765625|cri_loss: 0.0027790069580078125|unsuper_loss: 0.0
average reward score: -3.732421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.02%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.57
epoch: 0|step: 600|ppo_ep: 1|act_loss: -0.00141143798828125|cri_loss: 0.000244140625|unsuper_loss: 0.0
average reward score: -3.482421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.84%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.57
[2023-07-01 08:33:04,034] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 601|ppo_ep: 1|act_loss: -0.0098876953125|cri_loss: 0.0004558563232421875|unsuper_loss: 0.0
average reward score: -4.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.35%) |Training time=0.81s (32.53%) |Others=0.18 (7.12%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.57
[2023-07-01 08:33:06,517] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 602|ppo_ep: 1|act_loss: -0.005390167236328125|cri_loss: 0.0046844482421875|unsuper_loss: 0.0
average reward score: -4.44140625
-------------------------------------------------------------------------------------
|E2E latency=2.48s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.43%) |Training time=0.80s (32.42%) |Others=0.18 (7.15%)|CurSamplesPerSec=12.89 |AvgSamplesPerSec=12.57
epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.0072479248046875|cri_loss: 0.0007405281066894531|unsuper_loss: 0.0
average reward score: -6.33203125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 604|ppo_ep: 1|act_loss: -0.039642333984375|cri_loss: 0.0145111083984375|unsuper_loss: 0.0
average reward score: -3.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 605|ppo_ep: 1|act_loss: 0.00989532470703125|cri_loss: 0.0006589889526367188|unsuper_loss: 0.0
average reward score: -4.94921875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.33%) |Training time=0.80s (31.78%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
[2023-07-01 08:33:16,592] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 606|ppo_ep: 1|act_loss: -0.01064300537109375|cri_loss: 0.002391815185546875|unsuper_loss: 0.0
average reward score: -2.67578125
-------------------------------------------------------------------------------------
|E2E latency=2.48s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.57%) |Training time=0.80s (32.23%) |Others=0.18 (7.21%)|CurSamplesPerSec=12.91 |AvgSamplesPerSec=12.58
epoch: 0|step: 607|ppo_ep: 1|act_loss: -0.01898193359375|cri_loss: 0.00228118896484375|unsuper_loss: 0.0
average reward score: -3.490234375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.55%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 608|ppo_ep: 1|act_loss: 0.01812744140625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0
average reward score: -4.953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.69%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
[2023-07-01 08:33:23,816] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=11, lr=[3.5619166421626894e-06, 3.5619166421626894e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:33:23,994] [INFO] [timer.py:215:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=50.94524731646184, CurrSamplesPerSec=50.586941322752935, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:33:24,160] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=10, lr=[1.8366811213437092e-06, 1.8366811213437092e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 609|ppo_ep: 1|act_loss: 0.010986328125|cri_loss: 0.00179290771484375|unsuper_loss: 0.0
average reward score: -5.8125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.79%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 610|ppo_ep: 1|act_loss: 0.0160675048828125|cri_loss: 0.0012903213500976562|unsuper_loss: 0.0
average reward score: -3.703125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.53%) |Training time=0.80s (31.54%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 611|ppo_ep: 1|act_loss: -0.011260986328125|cri_loss: 0.00144195556640625|unsuper_loss: 0.0
average reward score: -5.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.83%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 612|ppo_ep: 1|act_loss: 0.018280029296875|cri_loss: 0.00244903564453125|unsuper_loss: 0.0
average reward score: -4.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.93%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.02001953125|cri_loss: 0.002246856689453125|unsuper_loss: 0.0
average reward score: -4.7109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.00%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.023193359375|cri_loss: 0.0021038055419921875|unsuper_loss: 0.0
average reward score: -3.83984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.95%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 615|ppo_ep: 1|act_loss: 0.0146331787109375|cri_loss: 0.0013818740844726562|unsuper_loss: 0.0
average reward score: -4.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.81s (31.71%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 616|ppo_ep: 1|act_loss: -0.01386260986328125|cri_loss: 0.0007419586181640625|unsuper_loss: 0.0
average reward score: -4.14453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 617|ppo_ep: 1|act_loss: -0.0098724365234375|cri_loss: 0.0007734298706054688|unsuper_loss: 0.0
average reward score: -5.48828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.83%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 618|ppo_ep: 1|act_loss: -0.01043701171875|cri_loss: 0.0005331039428710938|unsuper_loss: 0.0
average reward score: -3.630859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.98%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
[2023-07-01 08:33:49,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=11, lr=[3.3915031954861193e-06, 3.3915031954861193e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:33:49,405] [INFO] [timer.py:215:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=50.93655557448013, CurrSamplesPerSec=50.39513957647391, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:33:49,569] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=10, lr=[1.7484791453998007e-06, 1.7484791453998007e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 619|ppo_ep: 1|act_loss: 0.0256500244140625|cri_loss: 0.01168060302734375|unsuper_loss: 0.0
average reward score: -4.5
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.84%) |Others=0.23 (8.85%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 620|ppo_ep: 1|act_loss: -0.0011425018310546875|cri_loss: 0.00040268898010253906|unsuper_loss: 0.0
average reward score: -3.52734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.40%) |Training time=0.80s (31.72%) |Others=0.22 (8.89%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 621|ppo_ep: 1|act_loss: 0.0155029296875|cri_loss: 0.0032711029052734375|unsuper_loss: 0.0
average reward score: -5.140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.85%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 622|ppo_ep: 1|act_loss: -0.042724609375|cri_loss: 0.027496337890625|unsuper_loss: 0.0
average reward score: -4.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 623|ppo_ep: 1|act_loss: 0.0227813720703125|cri_loss: 0.002086639404296875|unsuper_loss: 0.0
average reward score: -3.197265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.84%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 624|ppo_ep: 1|act_loss: -0.0032520294189453125|cri_loss: 0.0003867149353027344|unsuper_loss: 0.0
average reward score: -4.68359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.81s (31.76%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 625|ppo_ep: 1|act_loss: 0.00916290283203125|cri_loss: 0.0017423629760742188|unsuper_loss: 0.0
average reward score: -3.10546875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.60%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 626|ppo_ep: 1|act_loss: -0.0003085136413574219|cri_loss: 0.005645751953125|unsuper_loss: 0.0
average reward score: -2.931640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.60%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 627|ppo_ep: 1|act_loss: -0.003177642822265625|cri_loss: 0.0014371871948242188|unsuper_loss: 0.0
average reward score: -5.79296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.09%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.0007586479187011719|cri_loss: 0.0005908012390136719|unsuper_loss: 0.0
average reward score: -3.56640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.95%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
[2023-07-01 08:34:14,601] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=11, lr=[3.223029436261057e-06, 3.223029436261057e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:34:14,783] [INFO] [timer.py:215:stop] epoch=0/micro_step=630/global_step=630, RunningAvgSamplesPerSec=50.926843009517505, CurrSamplesPerSec=48.30053792926741, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:34:14,947] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=10, lr=[1.6612940643430136e-06, 1.6612940643430136e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 629|ppo_ep: 1|act_loss: 0.06500244140625|cri_loss: 0.02996826171875|unsuper_loss: 0.0
average reward score: -3.935546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.14%) |Training time=0.84s (32.93%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 630|ppo_ep: 1|act_loss: 0.01120758056640625|cri_loss: 0.0019474029541015625|unsuper_loss: 0.0
average reward score: -4.61328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.84%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 631|ppo_ep: 1|act_loss: 0.01428985595703125|cri_loss: 0.0036029815673828125|unsuper_loss: 0.0
average reward score: -5.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.93%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 632|ppo_ep: 1|act_loss: 0.003864288330078125|cri_loss: 0.00029730796813964844|unsuper_loss: 0.0
average reward score: -4.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.90%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 633|ppo_ep: 1|act_loss: -0.0056915283203125|cri_loss: 0.0002338886260986328|unsuper_loss: 0.0
average reward score: -5.984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.95%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 634|ppo_ep: 1|act_loss: 0.006618499755859375|cri_loss: 0.0003190040588378906|unsuper_loss: 0.0
average reward score: -5.23828125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.0169830322265625|cri_loss: 0.0018148422241210938|unsuper_loss: 0.0
average reward score: -4.13671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.67%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 636|ppo_ep: 1|act_loss: -0.0096435546875|cri_loss: 0.0003941059112548828|unsuper_loss: 0.0
average reward score: -4.30078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 637|ppo_ep: 1|act_loss: -0.04632568359375|cri_loss: 0.0241241455078125|unsuper_loss: 0.0
average reward score: -3.640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.64%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.024505615234375|cri_loss: 0.0047760009765625|unsuper_loss: 0.0
average reward score: -4.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.54%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
[2023-07-01 08:34:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=11, lr=[3.056723329025442e-06, 3.056723329025442e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:34:40,169] [INFO] [timer.py:215:stop] epoch=0/micro_step=640/global_step=640, RunningAvgSamplesPerSec=50.92260273955374, CurrSamplesPerSec=51.20208992905518, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:34:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=10, lr=[1.5752438497008405e-06, 1.5752438497008405e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 639|ppo_ep: 1|act_loss: -0.00301361083984375|cri_loss: 0.0005254745483398438|unsuper_loss: 0.0
average reward score: -4.51953125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.45%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 640|ppo_ep: 1|act_loss: 0.0145721435546875|cri_loss: 0.0082244873046875|unsuper_loss: 0.0
average reward score: -7.609375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.57%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 641|ppo_ep: 1|act_loss: 0.021484375|cri_loss: 0.0023136138916015625|unsuper_loss: 0.0
average reward score: -4.48046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.75%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 642|ppo_ep: 1|act_loss: 0.0064849853515625|cri_loss: 0.0008287429809570312|unsuper_loss: 0.0
average reward score: -4.3046875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.03%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 643|ppo_ep: 1|act_loss: 0.006534576416015625|cri_loss: 0.000553131103515625|unsuper_loss: 0.0
average reward score: -3.30859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.98%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.0166015625|cri_loss: 0.00087738037109375|unsuper_loss: 0.0
average reward score: -3.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 645|ppo_ep: 1|act_loss: -0.0167694091796875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0
average reward score: -5.46875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.07%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 646|ppo_ep: 1|act_loss: -0.041290283203125|cri_loss: 0.012176513671875|unsuper_loss: 0.0
average reward score: -3.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.05%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 647|ppo_ep: 1|act_loss: -0.0020999908447265625|cri_loss: 0.00032782554626464844|unsuper_loss: 0.0
average reward score: -4.26953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.04%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 648|ppo_ep: 1|act_loss: 0.0006113052368164062|cri_loss: 0.00011986494064331055|unsuper_loss: 0.0
average reward score: -4.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.93%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
[2023-07-01 08:35:05,402] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=11, lr=[2.8928099052326388e-06, 2.8928099052326388e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:05,582] [INFO] [timer.py:215:stop] epoch=0/micro_step=650/global_step=650, RunningAvgSamplesPerSec=50.9126533891848, CurrSamplesPerSec=50.71060198984486, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:05,748] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=10, lr=[1.490444937394879e-06, 1.490444937394879e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 649|ppo_ep: 1|act_loss: -0.0006499290466308594|cri_loss: 0.0006866455078125|unsuper_loss: 0.0
average reward score: -5.53125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.71%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 650|ppo_ep: 1|act_loss: -0.0176544189453125|cri_loss: 0.005168914794921875|unsuper_loss: 0.0
average reward score: -4.953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.71%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 651|ppo_ep: 1|act_loss: 0.01107025146484375|cri_loss: 0.001972198486328125|unsuper_loss: 0.0
average reward score: -2.615234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.80s (31.70%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 652|ppo_ep: 1|act_loss: -0.0716552734375|cri_loss: 0.04205322265625|unsuper_loss: 0.0
average reward score: -4.55078125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 653|ppo_ep: 1|act_loss: -0.00029969215393066406|cri_loss: 0.00037932395935058594|unsuper_loss: 0.0
average reward score: -4.4453125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.61%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 654|ppo_ep: 1|act_loss: 0.0179901123046875|cri_loss: 0.0026454925537109375|unsuper_loss: 0.0
average reward score: -4.171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.67%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 655|ppo_ep: 1|act_loss: 0.0028209686279296875|cri_loss: 0.00038552284240722656|unsuper_loss: 0.0
average reward score: -4.82421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.80%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 656|ppo_ep: 1|act_loss: -0.0028934478759765625|cri_loss: 0.0019588470458984375|unsuper_loss: 0.0
average reward score: -4.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.72%) |Training time=0.82s (32.31%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 657|ppo_ep: 1|act_loss: -0.008941650390625|cri_loss: 0.0005779266357421875|unsuper_loss: 0.0
average reward score: -3.626953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.74%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 658|ppo_ep: 1|act_loss: -0.0026035308837890625|cri_loss: 0.0007138252258300781|unsuper_loss: 0.0
average reward score: -4.234375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
[2023-07-01 08:35:30,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=11, lr=[2.7315109587577825e-06, 2.7315109587577825e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:30,962] [INFO] [timer.py:215:stop] epoch=0/micro_step=660/global_step=660, RunningAvgSamplesPerSec=50.90763828588423, CurrSamplesPerSec=49.94958363477759, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=10, lr=[1.407012070189524e-06, 1.407012070189524e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 659|ppo_ep: 1|act_loss: -0.003997802734375|cri_loss: 0.0002703666687011719|unsuper_loss: 0.0
average reward score: -2.380859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.82s (32.00%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 660|ppo_ep: 1|act_loss: -0.0269317626953125|cri_loss: 0.008026123046875|unsuper_loss: 0.0
average reward score: -4.421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.89%) |Training time=0.82s (32.12%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 661|ppo_ep: 1|act_loss: -0.0170135498046875|cri_loss: 0.0014743804931640625|unsuper_loss: 0.0
average reward score: -3.12109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.80%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 662|ppo_ep: 1|act_loss: 0.01515960693359375|cri_loss: 0.0032825469970703125|unsuper_loss: 0.0
average reward score: -3.9140625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.62%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 663|ppo_ep: 1|act_loss: -0.00861358642578125|cri_loss: 0.002689361572265625|unsuper_loss: 0.0
average reward score: -3.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (31.99%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 664|ppo_ep: 1|act_loss: 0.0018520355224609375|cri_loss: 0.0006785392761230469|unsuper_loss: 0.0
average reward score: -3.875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.74%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58
epoch: 0|step: 665|ppo_ep: 1|act_loss: 0.01678466796875|cri_loss: 0.0006666183471679688|unsuper_loss: 0.0
average reward score: -5.4921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.17%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 666|ppo_ep: 1|act_loss: 0.01038360595703125|cri_loss: 0.0011816024780273438|unsuper_loss: 0.0
average reward score: -3.482421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 667|ppo_ep: 1|act_loss: -0.005947113037109375|cri_loss: 0.0005092620849609375|unsuper_loss: 0.0
average reward score: -2.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.56%) |Training time=0.80s (31.52%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 668|ppo_ep: 1|act_loss: -0.020172119140625|cri_loss: 0.001651763916015625|unsuper_loss: 0.0
average reward score: -4.99609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.22%) |Training time=0.81s (31.79%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
[2023-07-01 08:35:56,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=11, lr=[2.573044745784934e-06, 2.573044745784934e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:35:56,387] [INFO] [timer.py:215:stop] epoch=0/micro_step=670/global_step=670, RunningAvgSamplesPerSec=50.899503712786164, CurrSamplesPerSec=50.331698331698334, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:35:56,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=10, lr=[1.3250581424317012e-06, 1.3250581424317012e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 669|ppo_ep: 1|act_loss: -0.00519561767578125|cri_loss: 0.00554656982421875|unsuper_loss: 0.0
average reward score: -3.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 670|ppo_ep: 1|act_loss: -0.016693115234375|cri_loss: 0.00327301025390625|unsuper_loss: 0.0
average reward score: -4.60546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.70%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 671|ppo_ep: 1|act_loss: -0.0176544189453125|cri_loss: 0.0012159347534179688|unsuper_loss: 0.0
average reward score: -4.28515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (32.00%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 672|ppo_ep: 1|act_loss: -0.004608154296875|cri_loss: 0.0005364418029785156|unsuper_loss: 0.0
average reward score: -3.412109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.46s (57.29%) |Training time=0.86s (33.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 673|ppo_ep: 1|act_loss: 0.0150604248046875|cri_loss: 0.00801849365234375|unsuper_loss: 0.0
average reward score: -3.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.90%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 674|ppo_ep: 1|act_loss: -0.00829315185546875|cri_loss: 0.0011472702026367188|unsuper_loss: 0.0
average reward score: -4.3984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.24%) |Training time=0.81s (31.78%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 675|ppo_ep: 1|act_loss: 0.004817962646484375|cri_loss: 0.0008697509765625|unsuper_loss: 0.0
average reward score: -5.7578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.75%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 676|ppo_ep: 1|act_loss: 0.005290985107421875|cri_loss: 0.0013189315795898438|unsuper_loss: 0.0
average reward score: -3.876953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.88%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 677|ppo_ep: 1|act_loss: 0.0023670196533203125|cri_loss: 0.0018939971923828125|unsuper_loss: 0.0
average reward score: -3.224609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.007465362548828125|cri_loss: 0.0005240440368652344|unsuper_loss: 0.0
average reward score: -4.48046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.10%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
[2023-07-01 08:36:21,625] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=11, lr=[2.4176256894811497e-06, 2.4176256894811497e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:36:21,807] [INFO] [timer.py:215:stop] epoch=0/micro_step=680/global_step=680, RunningAvgSamplesPerSec=50.88510715740166, CurrSamplesPerSec=50.29265861640763, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:36:21,974] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=10, lr=[1.24469404729171e-06, 1.24469404729171e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 679|ppo_ep: 1|act_loss: 0.0241851806640625|cri_loss: 0.0012350082397460938|unsuper_loss: 0.0
average reward score: -5.125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 680|ppo_ep: 1|act_loss: -0.00664520263671875|cri_loss: 0.0006580352783203125|unsuper_loss: 0.0
average reward score: -3.478515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.86%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 681|ppo_ep: 1|act_loss: 0.00940704345703125|cri_loss: 0.0007810592651367188|unsuper_loss: 0.0
average reward score: -3.75390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.61%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 682|ppo_ep: 1|act_loss: 0.00467681884765625|cri_loss: 0.0007023811340332031|unsuper_loss: 0.0
average reward score: -5.06640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.0011186599731445312|cri_loss: 0.0002536773681640625|unsuper_loss: 0.0
average reward score: -4.2421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.00%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 684|ppo_ep: 1|act_loss: -0.0073089599609375|cri_loss: 0.0005397796630859375|unsuper_loss: 0.0
average reward score: -3.705078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.84%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 685|ppo_ep: 1|act_loss: -0.030120849609375|cri_loss: 0.0026912689208984375|unsuper_loss: 0.0
average reward score: -4.69140625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.41%) |Training time=0.80s (31.66%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 686|ppo_ep: 1|act_loss: -0.0035877227783203125|cri_loss: 0.0005083084106445312|unsuper_loss: 0.0
average reward score: -4.41796875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.67%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 687|ppo_ep: 1|act_loss: -0.0120391845703125|cri_loss: 0.004276275634765625|unsuper_loss: 0.0
average reward score: -3.0703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (32.01%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 688|ppo_ep: 1|act_loss: 0.00687408447265625|cri_loss: 0.00045228004455566406|unsuper_loss: 0.0
average reward score: -5.22265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.80%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:36:47,009] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=11, lr=[2.265464089857071e-06, 2.265464089857071e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:36:47,187] [INFO] [timer.py:215:stop] epoch=0/micro_step=690/global_step=690, RunningAvgSamplesPerSec=50.88019675638302, CurrSamplesPerSec=51.0109932793722, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:36:47,351] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=10, lr=[1.1660285267119167e-06, 1.1660285267119167e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 689|ppo_ep: 1|act_loss: 0.010711669921875|cri_loss: 0.0005822181701660156|unsuper_loss: 0.0
average reward score: -5.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.60%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 690|ppo_ep: 1|act_loss: 0.004749298095703125|cri_loss: 0.0013265609741210938|unsuper_loss: 0.0
average reward score: -5.6171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.77%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 691|ppo_ep: 1|act_loss: 0.0156707763671875|cri_loss: 0.001941680908203125|unsuper_loss: 0.0
average reward score: -3.5546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.86%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 692|ppo_ep: 1|act_loss: -0.0091552734375|cri_loss: 0.002437591552734375|unsuper_loss: 0.0
average reward score: -2.82421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.12%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 693|ppo_ep: 1|act_loss: -0.0129852294921875|cri_loss: 0.002227783203125|unsuper_loss: 0.0
average reward score: -3.982421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.09%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 694|ppo_ep: 1|act_loss: -0.0082855224609375|cri_loss: 0.0012445449829101562|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 695|ppo_ep: 1|act_loss: -0.00780487060546875|cri_loss: 0.0009775161743164062|unsuper_loss: 0.0
average reward score: -4.7109375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.01%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 696|ppo_ep: 1|act_loss: -0.0004851818084716797|cri_loss: 0.000576019287109375|unsuper_loss: 0.0
average reward score: -3.759765625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.13%) |Training time=0.81s (31.91%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58
epoch: 0|step: 697|ppo_ep: 1|act_loss: -6.580352783203125e-05|cri_loss: 0.0011091232299804688|unsuper_loss: 0.0
average reward score: -4.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.55%) |Training time=0.83s (32.48%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 698|ppo_ep: 1|act_loss: -0.017791748046875|cri_loss: 0.0018568038940429688|unsuper_loss: 0.0
average reward score: -4.72265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.78%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
[2023-07-01 08:37:12,426] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=11, lr=[2.116765839206601e-06, 2.116765839206601e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:37:12,604] [INFO] [timer.py:215:stop] epoch=0/micro_step=700/global_step=700, RunningAvgSamplesPerSec=50.86950985719415, CurrSamplesPerSec=51.042187146445485, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:37:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=10, lr=[1.0891680242662836e-06, 1.0891680242662836e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.007610321044921875|cri_loss: 0.00084686279296875|unsuper_loss: 0.0
average reward score: -3.6171875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.40%) |Training time=0.80s (31.67%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 700|ppo_ep: 1|act_loss: 0.0019073486328125|cri_loss: 0.00036907196044921875|unsuper_loss: 0.0
average reward score: -5.52734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.80s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 701|ppo_ep: 1|act_loss: -0.0003910064697265625|cri_loss: 0.00396728515625|unsuper_loss: 0.0
average reward score: -3.80078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.82s (32.09%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 702|ppo_ep: 1|act_loss: -0.00685882568359375|cri_loss: 0.0008416175842285156|unsuper_loss: 0.0
average reward score: -3.650390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (32.05%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 703|ppo_ep: 1|act_loss: -0.0419921875|cri_loss: 0.043670654296875|unsuper_loss: 0.0
average reward score: -4.02734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.87%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 704|ppo_ep: 1|act_loss: 0.0086212158203125|cri_loss: 0.0016756057739257812|unsuper_loss: 0.0
average reward score: -5.3828125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.80%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 705|ppo_ep: 1|act_loss: 0.009490966796875|cri_loss: 0.0016946792602539062|unsuper_loss: 0.0
average reward score: -4.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.33%) |Training time=0.80s (31.69%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 706|ppo_ep: 1|act_loss: -0.007720947265625|cri_loss: 0.0004642009735107422|unsuper_loss: 0.0
average reward score: -4.6875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.57%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 707|ppo_ep: 1|act_loss: -0.00814056396484375|cri_loss: 0.0027980804443359375|unsuper_loss: 0.0
average reward score: -3.408203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.03%) |Training time=0.84s (33.04%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 708|ppo_ep: 1|act_loss: -0.0350341796875|cri_loss: 0.013916015625|unsuper_loss: 0.0
average reward score: -4.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.42%) |Training time=0.80s (31.65%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:37:37,811] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=11, lr=[1.971732143510771e-06, 1.971732143510771e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:37:37,994] [INFO] [timer.py:215:stop] epoch=0/micro_step=710/global_step=710, RunningAvgSamplesPerSec=50.86127151176703, CurrSamplesPerSec=50.825936586546234, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:37:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=10, lr=[1.0142165411298664e-06, 1.0142165411298664e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 709|ppo_ep: 1|act_loss: -0.00101470947265625|cri_loss: 0.00041413307189941406|unsuper_loss: 0.0
average reward score: -5.30859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.0036792755126953125|cri_loss: 0.0013628005981445312|unsuper_loss: 0.0
average reward score: -3.939453125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 711|ppo_ep: 1|act_loss: 0.0159912109375|cri_loss: 0.0019550323486328125|unsuper_loss: 0.0
average reward score: -4.71875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.88%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 712|ppo_ep: 1|act_loss: 0.01277923583984375|cri_loss: 0.0009074211120605469|unsuper_loss: 0.0
average reward score: -3.8125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.80%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 713|ppo_ep: 1|act_loss: -0.0029850006103515625|cri_loss: 0.00010502338409423828|unsuper_loss: 0.0
average reward score: -3.578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.91%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 714|ppo_ep: 1|act_loss: 0.00995635986328125|cri_loss: 0.001277923583984375|unsuper_loss: 0.0
average reward score: -4.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.74%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 715|ppo_ep: 1|act_loss: -0.01531219482421875|cri_loss: 0.0022068023681640625|unsuper_loss: 0.0
average reward score: -5.65625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.85%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 716|ppo_ep: 1|act_loss: -0.004917144775390625|cri_loss: 0.0011529922485351562|unsuper_loss: 0.0
average reward score: -3.8515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.92%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 717|ppo_ep: 1|act_loss: -0.00514984130859375|cri_loss: 0.0010976791381835938|unsuper_loss: 0.0
average reward score: -5.02734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.24%) |Training time=0.83s (32.87%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 718|ppo_ep: 1|act_loss: -0.001148223876953125|cri_loss: 0.000560760498046875|unsuper_loss: 0.0
average reward score: -4.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.75%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
[2023-07-01 08:38:03,223] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=11, lr=[1.830559250182685e-06, 1.830559250182685e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:38:03,401] [INFO] [timer.py:215:stop] epoch=0/micro_step=720/global_step=720, RunningAvgSamplesPerSec=50.851094873295665, CurrSamplesPerSec=49.903581317406946, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:38:03,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=10, lr=[9.412754953531664e-07, 9.412754953531664e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 719|ppo_ep: 1|act_loss: -0.00861358642578125|cri_loss: 0.0006313323974609375|unsuper_loss: 0.0
average reward score: -4.81640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.08%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.0033512115478515625|cri_loss: 0.00061798095703125|unsuper_loss: 0.0
average reward score: -4.5859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.04%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 721|ppo_ep: 1|act_loss: -0.02392578125|cri_loss: 0.00179290771484375|unsuper_loss: 0.0
average reward score: -3.888671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.16%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 722|ppo_ep: 1|act_loss: 0.000568389892578125|cri_loss: 0.00021326541900634766|unsuper_loss: 0.0
average reward score: -3.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.85%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 723|ppo_ep: 1|act_loss: 0.0014905929565429688|cri_loss: 0.0007548332214355469|unsuper_loss: 0.0
average reward score: -4.53125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.78%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 724|ppo_ep: 1|act_loss: -0.008270263671875|cri_loss: 0.00160980224609375|unsuper_loss: 0.0
average reward score: -4.4296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 725|ppo_ep: 1|act_loss: 0.00617218017578125|cri_loss: 0.0007119178771972656|unsuper_loss: 0.0
average reward score: -4.375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.92%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 726|ppo_ep: 1|act_loss: 0.005893707275390625|cri_loss: 0.0007605552673339844|unsuper_loss: 0.0
average reward score: -6.3515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 727|ppo_ep: 1|act_loss: -0.0147247314453125|cri_loss: 0.0007014274597167969|unsuper_loss: 0.0
average reward score: -5.3359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 728|ppo_ep: 1|act_loss: 0.00397491455078125|cri_loss: 0.0003597736358642578|unsuper_loss: 0.0
average reward score: -4.04296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.87%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
[2023-07-01 08:38:28,605] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=11, lr=[1.693438182522029e-06, 1.693438182522029e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:38:28,786] [INFO] [timer.py:215:stop] epoch=0/micro_step=730/global_step=730, RunningAvgSamplesPerSec=50.844022869884185, CurrSamplesPerSec=50.363465119088985, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:38:28,951] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=10, lr=[8.704435846317385e-07, 8.704435846317385e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 729|ppo_ep: 1|act_loss: -0.00041675567626953125|cri_loss: 0.0006022453308105469|unsuper_loss: 0.0
average reward score: -4.05078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.90%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 730|ppo_ep: 1|act_loss: 4.595518112182617e-05|cri_loss: 0.002178192138671875|unsuper_loss: 0.0
average reward score: -5.625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.30%) |Training time=0.81s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 731|ppo_ep: 1|act_loss: 0.0128021240234375|cri_loss: 0.001041412353515625|unsuper_loss: 0.0
average reward score: -3.54296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.75%) |Others=0.22 (8.87%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 732|ppo_ep: 1|act_loss: -0.01190185546875|cri_loss: 0.0007181167602539062|unsuper_loss: 0.0
average reward score: -4.59765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.80%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.007289886474609375|cri_loss: 0.0006923675537109375|unsuper_loss: 0.0
average reward score: -3.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.66%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 734|ppo_ep: 1|act_loss: -0.0007567405700683594|cri_loss: 0.000946044921875|unsuper_loss: 0.0
average reward score: -4.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.59%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 735|ppo_ep: 1|act_loss: -0.00862884521484375|cri_loss: 0.0006604194641113281|unsuper_loss: 0.0
average reward score: -3.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.85%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 736|ppo_ep: 1|act_loss: -0.00714111328125|cri_loss: 0.0013666152954101562|unsuper_loss: 0.0
average reward score: -3.96875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.52%) |Training time=0.80s (31.46%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 737|ppo_ep: 1|act_loss: -0.01313018798828125|cri_loss: 0.0015802383422851562|unsuper_loss: 0.0
average reward score: -3.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.48%) |Training time=0.80s (31.55%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 738|ppo_ep: 1|act_loss: -0.007770538330078125|cri_loss: 0.0007648468017578125|unsuper_loss: 0.0
average reward score: -4.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.25%) |Training time=0.81s (31.76%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
[2023-07-01 08:38:53,998] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=11, lr=[1.5605544812383717e-06, 1.5605544812383717e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:38:54,181] [INFO] [timer.py:215:stop] epoch=0/micro_step=740/global_step=740, RunningAvgSamplesPerSec=50.84138854080732, CurrSamplesPerSec=49.77470021379526, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:38:54,347] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=10, lr=[8.018166527567672e-07, 8.018166527567672e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.0006642341613769531|cri_loss: 0.0006761550903320312|unsuper_loss: 0.0
average reward score: -4.125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.03%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58
epoch: 0|step: 740|ppo_ep: 1|act_loss: -0.019683837890625|cri_loss: 0.0009579658508300781|unsuper_loss: 0.0
average reward score: -4.09375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.80%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 741|ppo_ep: 1|act_loss: 0.01412200927734375|cri_loss: 0.008697509765625|unsuper_loss: 0.0
average reward score: -4.5
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.75%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 742|ppo_ep: 1|act_loss: -0.0027027130126953125|cri_loss: 0.0007004737854003906|unsuper_loss: 0.0
average reward score: -5.11328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.81%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 743|ppo_ep: 1|act_loss: -0.0011072158813476562|cri_loss: 0.0011377334594726562|unsuper_loss: 0.0
average reward score: -4.46875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.08%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 744|ppo_ep: 1|act_loss: -0.010162353515625|cri_loss: 0.0020160675048828125|unsuper_loss: 0.0
average reward score: -3.71875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.67%) |Training time=0.82s (32.42%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 745|ppo_ep: 1|act_loss: 0.001995086669921875|cri_loss: 0.0003790855407714844|unsuper_loss: 0.0
average reward score: -5.296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.87%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 746|ppo_ep: 1|act_loss: -0.007068634033203125|cri_loss: 0.0006475448608398438|unsuper_loss: 0.0
average reward score: -4.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.84%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 747|ppo_ep: 1|act_loss: -0.01476287841796875|cri_loss: 0.0010480880737304688|unsuper_loss: 0.0
average reward score: -5.875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.80s (31.63%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 748|ppo_ep: 1|act_loss: -0.01001739501953125|cri_loss: 0.0020313262939453125|unsuper_loss: 0.0
average reward score: -3.400390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.41%) |Training time=0.80s (31.63%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
[2023-07-01 08:39:19,397] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=11, lr=[1.432087953393078e-06, 1.432087953393078e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:39:19,575] [INFO] [timer.py:215:stop] epoch=0/micro_step=750/global_step=750, RunningAvgSamplesPerSec=50.83529230409125, CurrSamplesPerSec=50.3979212664654, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:39:19,740] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=10, lr=[7.354875599272929e-07, 7.354875599272929e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 749|ppo_ep: 1|act_loss: -0.006053924560546875|cri_loss: 0.00122833251953125|unsuper_loss: 0.0
average reward score: -4.73046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.84%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 750|ppo_ep: 1|act_loss: -0.0158843994140625|cri_loss: 0.00801849365234375|unsuper_loss: 0.0
average reward score: -4.4140625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.80s (31.65%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 751|ppo_ep: 1|act_loss: 0.0097198486328125|cri_loss: 0.00035858154296875|unsuper_loss: 0.0
average reward score: -3.943359375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.49%) |Training time=0.80s (31.53%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 752|ppo_ep: 1|act_loss: -0.0006928443908691406|cri_loss: 0.0007891654968261719|unsuper_loss: 0.0
average reward score: -4.296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.59%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 753|ppo_ep: 1|act_loss: 0.0091094970703125|cri_loss: 0.00044417381286621094|unsuper_loss: 0.0
average reward score: -4.78515625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.87%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 754|ppo_ep: 1|act_loss: -0.01537322998046875|cri_loss: 0.0004978179931640625|unsuper_loss: 0.0
average reward score: -4.9296875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.25%) |Training time=0.83s (32.79%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 755|ppo_ep: 1|act_loss: 0.009246826171875|cri_loss: 0.001781463623046875|unsuper_loss: 0.0
average reward score: -4.62890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 756|ppo_ep: 1|act_loss: -0.007114410400390625|cri_loss: 0.0013494491577148438|unsuper_loss: 0.0
average reward score: -4.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.82%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 757|ppo_ep: 1|act_loss: -0.002605438232421875|cri_loss: 0.0009036064147949219|unsuper_loss: 0.0
average reward score: -4.49609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (31.97%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 758|ppo_ep: 1|act_loss: 0.004314422607421875|cri_loss: 0.0006737709045410156|unsuper_loss: 0.0
average reward score: -4.6875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.03%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
[2023-07-01 08:39:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=11, lr=[1.308212429099484e-06, 1.308212429099484e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:39:44,960] [INFO] [timer.py:215:stop] epoch=0/micro_step=760/global_step=760, RunningAvgSamplesPerSec=50.82850130253925, CurrSamplesPerSec=50.211905159171664, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:39:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=10, lr=[6.715460570995988e-07, 6.715460570995988e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 759|ppo_ep: 1|act_loss: 4.684925079345703e-05|cri_loss: 0.0006060600280761719|unsuper_loss: 0.0
average reward score: -5.1171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.94%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 760|ppo_ep: 1|act_loss: -0.001979827880859375|cri_loss: 0.0007104873657226562|unsuper_loss: 0.0
average reward score: -4.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.16%) |Training time=0.81s (31.92%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 761|ppo_ep: 1|act_loss: 0.008544921875|cri_loss: 0.01041412353515625|unsuper_loss: 0.0
average reward score: -5.0546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.11%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 762|ppo_ep: 1|act_loss: 0.005626678466796875|cri_loss: 0.00041794776916503906|unsuper_loss: 0.0
average reward score: -4.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.00%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 763|ppo_ep: 1|act_loss: 0.0004391670227050781|cri_loss: 0.0004982948303222656|unsuper_loss: 0.0
average reward score: -3.91796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 764|ppo_ep: 1|act_loss: -0.015869140625|cri_loss: 0.006748199462890625|unsuper_loss: 0.0
average reward score: -3.833984375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.77%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 765|ppo_ep: 1|act_loss: 0.0147857666015625|cri_loss: 0.0009660720825195312|unsuper_loss: 0.0
average reward score: -4.5625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.82%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 766|ppo_ep: 1|act_loss: 0.00940704345703125|cri_loss: 0.0006909370422363281|unsuper_loss: 0.0
average reward score: -3.091796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 767|ppo_ep: 1|act_loss: 0.024932861328125|cri_loss: 0.0034046173095703125|unsuper_loss: 0.0
average reward score: -4.703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.00765228271484375|cri_loss: 0.00034332275390625|unsuper_loss: 0.0
average reward score: -5.49609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.81s (31.79%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
[2023-07-01 08:40:10,166] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=11, lr=[1.1890955263106013e-06, 1.1890955263106013e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:40:10,344] [INFO] [timer.py:215:stop] epoch=0/micro_step=770/global_step=770, RunningAvgSamplesPerSec=50.82270925096035, CurrSamplesPerSec=50.63325765755162, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:40:10,510] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=10, lr=[6.100786645437481e-07, 6.100786645437481e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 769|ppo_ep: 1|act_loss: 0.0008969306945800781|cri_loss: 0.0002808570861816406|unsuper_loss: 0.0
average reward score: -4.953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.0043182373046875|cri_loss: 0.0003192424774169922|unsuper_loss: 0.0
average reward score: -5.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 771|ppo_ep: 1|act_loss: 0.007114410400390625|cri_loss: 0.0010290145874023438|unsuper_loss: 0.0
average reward score: -2.98828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.88%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 772|ppo_ep: 1|act_loss: 0.003932952880859375|cri_loss: 0.00020432472229003906|unsuper_loss: 0.0
average reward score: -6.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.94%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.0207061767578125|cri_loss: 0.0019989013671875|unsuper_loss: 0.0
average reward score: -6.3125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.00439453125|cri_loss: 0.0006737709045410156|unsuper_loss: 0.0
average reward score: -3.359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.78%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 775|ppo_ep: 1|act_loss: -0.00978851318359375|cri_loss: 0.0010223388671875|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (32.01%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 776|ppo_ep: 1|act_loss: -0.002338409423828125|cri_loss: 0.00022780895233154297|unsuper_loss: 0.0
average reward score: -5.55078125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.27%) |Training time=0.81s (31.76%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.007259368896484375|cri_loss: 0.0019664764404296875|unsuper_loss: 0.0
average reward score: -4.9375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.80s (31.69%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.01006317138671875|cri_loss: 0.0007495880126953125|unsuper_loss: 0.0
average reward score: -3.859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.80s (31.74%) |Others=0.23 (9.04%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
[2023-07-01 08:40:35,570] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=11, lr=[1.0748984240125836e-06, 1.0748984240125836e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:40:35,749] [INFO] [timer.py:215:stop] epoch=0/micro_step=780/global_step=780, RunningAvgSamplesPerSec=50.8188052236844, CurrSamplesPerSec=51.103794393412976, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:40:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=10, lr=[5.511685547716328e-07, 5.511685547716328e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 779|ppo_ep: 1|act_loss: -0.0020656585693359375|cri_loss: 0.000621795654296875|unsuper_loss: 0.0
average reward score: -3.849609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.60%) |Training time=0.80s (31.44%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 780|ppo_ep: 1|act_loss: 0.0164031982421875|cri_loss: 0.0018396377563476562|unsuper_loss: 0.0
average reward score: -4.9140625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.58%) |Training time=0.80s (31.51%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 781|ppo_ep: 1|act_loss: -0.0005102157592773438|cri_loss: 0.0007472038269042969|unsuper_loss: 0.0
average reward score: -3.876953125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.54%) |Training time=0.80s (31.54%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 782|ppo_ep: 1|act_loss: 0.01511383056640625|cri_loss: 0.007110595703125|unsuper_loss: 0.0
average reward score: -3.287109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.55%) |Training time=0.80s (31.55%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 783|ppo_ep: 1|act_loss: -0.011383056640625|cri_loss: 0.0003991127014160156|unsuper_loss: 0.0
average reward score: -6.77734375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.54%) |Training time=0.80s (31.55%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 784|ppo_ep: 1|act_loss: 0.004100799560546875|cri_loss: 0.0004754066467285156|unsuper_loss: 0.0
average reward score: -5.359375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.51%) |Training time=0.80s (31.59%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 785|ppo_ep: 1|act_loss: 0.010894775390625|cri_loss: 0.0007996559143066406|unsuper_loss: 0.0
average reward score: -4.60546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.90%) |Training time=0.84s (33.12%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 786|ppo_ep: 1|act_loss: -0.0063629150390625|cri_loss: 0.0010843276977539062|unsuper_loss: 0.0
average reward score: -6.421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.91%) |Training time=0.82s (32.13%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 787|ppo_ep: 1|act_loss: 0.01013946533203125|cri_loss: 0.00337982177734375|unsuper_loss: 0.0
average reward score: -4.49609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.98%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 788|ppo_ep: 1|act_loss: -0.005931854248046875|cri_loss: 0.0003879070281982422|unsuper_loss: 0.0
average reward score: -6.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.81s (31.93%) |Others=0.23 (9.02%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:41:00,990] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=11, lr=[9.657756441308542e-07, 9.657756441308542e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:41:01,173] [INFO] [timer.py:215:stop] epoch=0/micro_step=790/global_step=790, RunningAvgSamplesPerSec=50.81225974869159, CurrSamplesPerSec=50.43987305261852, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:41:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=10, lr=[4.948954399949105e-07, 4.948954399949105e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 789|ppo_ep: 1|act_loss: 0.0230560302734375|cri_loss: 0.00936126708984375|unsuper_loss: 0.0
average reward score: -3.810546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.86%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
[2023-07-01 08:41:03,879] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 790|ppo_ep: 1|act_loss: -0.009185791015625|cri_loss: 0.00069427490234375|unsuper_loss: 0.0
average reward score: -4.87109375
-------------------------------------------------------------------------------------
|E2E latency=2.50s |Gather latency=0.00s (0.00%) |Generate time=1.51s (60.35%) |Training time=0.81s (32.39%) |Others=0.18 (7.26%)|CurSamplesPerSec=12.81 |AvgSamplesPerSec=12.58
epoch: 0|step: 791|ppo_ep: 1|act_loss: -0.0019931793212890625|cri_loss: 0.001560211181640625|unsuper_loss: 0.0
average reward score: -4.86328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.26%) |Training time=0.81s (31.82%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 792|ppo_ep: 1|act_loss: -0.0013027191162109375|cri_loss: 0.0004451274871826172|unsuper_loss: 0.0
average reward score: -4.37109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.68%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 793|ppo_ep: 1|act_loss: -0.00081634521484375|cri_loss: 0.00010085105895996094|unsuper_loss: 0.0
average reward score: -4.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.33%) |Training time=0.80s (31.68%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 794|ppo_ep: 1|act_loss: 0.00760650634765625|cri_loss: 0.0010881423950195312|unsuper_loss: 0.0
average reward score: -3.1796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.07%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 795|ppo_ep: 1|act_loss: 0.041595458984375|cri_loss: 0.0131683349609375|unsuper_loss: 0.0
average reward score: -6.375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.88%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 796|ppo_ep: 1|act_loss: -0.01038360595703125|cri_loss: 0.0006246566772460938|unsuper_loss: 0.0
average reward score: -4.77734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.39%) |Training time=0.80s (31.69%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 797|ppo_ep: 1|act_loss: -0.0028705596923828125|cri_loss: 0.0006313323974609375|unsuper_loss: 0.0
average reward score: -3.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.88%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 798|ppo_ep: 1|act_loss: 0.01387786865234375|cri_loss: 0.0022983551025390625|unsuper_loss: 0.0
average reward score: -4.75390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.80s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:41:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=11, lr=[8.618748424440287e-07, 8.618748424440287e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:41:26,529] [INFO] [timer.py:215:stop] epoch=0/micro_step=800/global_step=800, RunningAvgSamplesPerSec=50.80810776358231, CurrSamplesPerSec=50.14485840245087, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:41:26,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=11, lr=[4.4656727587773506e-07, 4.4656727587773506e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 799|ppo_ep: 1|act_loss: -0.0024509429931640625|cri_loss: 0.00016570091247558594|unsuper_loss: 0.0
average reward score: -4.5
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.94%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.00510406494140625|cri_loss: 0.0002956390380859375|unsuper_loss: 0.0
average reward score: -4.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.73%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 801|ppo_ep: 1|act_loss: 0.01212310791015625|cri_loss: 0.0028591156005859375|unsuper_loss: 0.0
average reward score: -3.73828125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.42%) |Training time=0.80s (31.63%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.00460052490234375|cri_loss: 0.0011653900146484375|unsuper_loss: 0.0
average reward score: -5.00390625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.73%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 803|ppo_ep: 1|act_loss: -0.04107666015625|cri_loss: 0.019622802734375|unsuper_loss: 0.0
average reward score: -5.35546875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.34%) |Training time=0.81s (31.71%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 804|ppo_ep: 1|act_loss: -0.00540924072265625|cri_loss: 0.001010894775390625|unsuper_loss: 0.0
average reward score: -4.6328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.83%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58
epoch: 0|step: 805|ppo_ep: 1|act_loss: 0.0260162353515625|cri_loss: 0.003101348876953125|unsuper_loss: 0.0
average reward score: -4.1640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.28%) |Training time=0.81s (31.78%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 806|ppo_ep: 1|act_loss: -0.0008225440979003906|cri_loss: 0.003002166748046875|unsuper_loss: 0.0
average reward score: -3.19921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.81s (31.73%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 807|ppo_ep: 1|act_loss: -0.00559234619140625|cri_loss: 0.0009756088256835938|unsuper_loss: 0.0
average reward score: -4.20703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.32%) |Training time=0.80s (31.69%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.00569915771484375|cri_loss: 0.0011138916015625|unsuper_loss: 0.0
average reward score: -4.203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.44%) |Training time=0.81s (31.60%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
[2023-07-01 08:41:51,762] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=11, lr=[7.633366087885105e-07, 7.633366087885105e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:41:51,941] [INFO] [timer.py:215:stop] epoch=0/micro_step=810/global_step=810, RunningAvgSamplesPerSec=50.8059958537766, CurrSamplesPerSec=50.650301710637045, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:41:52,105] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=11, lr=[3.9551119626347693e-07, 3.9551119626347693e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 809|ppo_ep: 1|act_loss: 0.01210784912109375|cri_loss: 0.002460479736328125|unsuper_loss: 0.0
average reward score: -4.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.81s (31.72%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 810|ppo_ep: 1|act_loss: 0.00421142578125|cri_loss: 0.0005979537963867188|unsuper_loss: 0.0
average reward score: -6.25390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.58%) |Training time=0.80s (31.49%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 811|ppo_ep: 1|act_loss: -0.00463104248046875|cri_loss: 0.0008816719055175781|unsuper_loss: 0.0
average reward score: -4.390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.93%) |Training time=0.84s (33.15%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 812|ppo_ep: 1|act_loss: -0.0014476776123046875|cri_loss: 0.002285003662109375|unsuper_loss: 0.0
average reward score: -5.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.64%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 813|ppo_ep: 1|act_loss: -0.0010023117065429688|cri_loss: 0.0009264945983886719|unsuper_loss: 0.0
average reward score: -3.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 814|ppo_ep: 1|act_loss: 0.00547027587890625|cri_loss: 0.0005512237548828125|unsuper_loss: 0.0
average reward score: -2.65234375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.47%) |Training time=0.80s (31.59%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 815|ppo_ep: 1|act_loss: -0.0011491775512695312|cri_loss: 0.0001633167266845703|unsuper_loss: 0.0
average reward score: -4.66796875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.32%) |Training time=0.80s (31.71%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 816|ppo_ep: 1|act_loss: 0.00296783447265625|cri_loss: 0.00028014183044433594|unsuper_loss: 0.0
average reward score: -4.8984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.81%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 817|ppo_ep: 1|act_loss: -0.00771331787109375|cri_loss: 0.0005936622619628906|unsuper_loss: 0.0
average reward score: -3.818359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.10%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 818|ppo_ep: 1|act_loss: 0.01326751708984375|cri_loss: 0.002292633056640625|unsuper_loss: 0.0
average reward score: -4.2421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.19%) |Training time=0.81s (31.88%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
[2023-07-01 08:42:17,131] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=11, lr=[6.702942768241414e-07, 6.702942768241414e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:42:17,313] [INFO] [timer.py:215:stop] epoch=0/micro_step=820/global_step=820, RunningAvgSamplesPerSec=50.800591528592534, CurrSamplesPerSec=50.43989200824971, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:42:17,478] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=11, lr=[3.473027341057728e-07, 3.473027341057728e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 819|ppo_ep: 1|act_loss: -0.0131378173828125|cri_loss: 0.0010347366333007812|unsuper_loss: 0.0
average reward score: -4.12109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.85%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
[2023-07-01 08:42:20,008] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.0321044921875|cri_loss: 0.01514434814453125|unsuper_loss: 0.0
average reward score: -4.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.49s |Gather latency=0.00s (0.00%) |Generate time=1.50s (60.40%) |Training time=0.81s (32.41%) |Others=0.18 (7.18%)|CurSamplesPerSec=12.86 |AvgSamplesPerSec=12.58
epoch: 0|step: 821|ppo_ep: 1|act_loss: -0.01032257080078125|cri_loss: 0.006351470947265625|unsuper_loss: 0.0
average reward score: -3.16796875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.45s (57.05%) |Training time=0.86s (33.95%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 822|ppo_ep: 1|act_loss: 0.002361297607421875|cri_loss: 0.0009703636169433594|unsuper_loss: 0.0
average reward score: -5.15234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.09%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.0052490234375|cri_loss: 0.001323699951171875|unsuper_loss: 0.0
average reward score: -4.75
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.87%) |Training time=0.82s (32.18%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 824|ppo_ep: 1|act_loss: -0.0278167724609375|cri_loss: 0.0184173583984375|unsuper_loss: 0.0
average reward score: -5.17578125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.46%) |Training time=0.80s (31.57%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
[2023-07-01 08:42:32,324] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 825|ppo_ep: 1|act_loss: -0.0045013427734375|cri_loss: 0.00031685829162597656|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.35s |Gather latency=0.00s (0.00%) |Generate time=1.51s (64.15%) |Training time=0.62s (26.18%) |Others=0.23 (9.68%)|CurSamplesPerSec=13.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 826|ppo_ep: 1|act_loss: 0.01226806640625|cri_loss: 0.000789642333984375|unsuper_loss: 0.0
average reward score: -4.375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.97%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 827|ppo_ep: 1|act_loss: -0.007415771484375|cri_loss: 0.0002620220184326172|unsuper_loss: 0.0
average reward score: -3.84375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.93%) |Training time=0.82s (32.10%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 828|ppo_ep: 1|act_loss: 0.002338409423828125|cri_loss: 0.0010023117065429688|unsuper_loss: 0.0
average reward score: -4.01953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.82s (32.12%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:42:42,316] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=12, lr=[5.913593843626703e-07, 5.913593843626703e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:42:42,494] [INFO] [timer.py:215:stop] epoch=0/micro_step=830/global_step=830, RunningAvgSamplesPerSec=50.80620896325542, CurrSamplesPerSec=49.97083979759643, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:42:42,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=12, lr=[3.064038260946478e-07, 3.064038260946478e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.0083770751953125|cri_loss: 0.0008697509765625|unsuper_loss: 0.0
average reward score: -4.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.81s (32.08%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 830|ppo_ep: 1|act_loss: 0.012237548828125|cri_loss: 0.0024394989013671875|unsuper_loss: 0.0
average reward score: -4.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.96%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 831|ppo_ep: 1|act_loss: 0.0014257431030273438|cri_loss: 0.00044035911560058594|unsuper_loss: 0.0
average reward score: -4.24609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.77%) |Training time=0.84s (33.25%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 832|ppo_ep: 1|act_loss: 0.003421783447265625|cri_loss: 0.0006437301635742188|unsuper_loss: 0.0
average reward score: -5.921875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.82%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 833|ppo_ep: 1|act_loss: 0.005580902099609375|cri_loss: 0.0007596015930175781|unsuper_loss: 0.0
average reward score: -4.3359375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.22%) |Training time=0.81s (31.91%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 834|ppo_ep: 1|act_loss: -0.01206207275390625|cri_loss: 0.00634765625|unsuper_loss: 0.0
average reward score: -3.984375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.82%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 835|ppo_ep: 1|act_loss: 0.002063751220703125|cri_loss: 0.0020275115966796875|unsuper_loss: 0.0
average reward score: -4.0546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.11%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 836|ppo_ep: 1|act_loss: 0.0113983154296875|cri_loss: 0.0008807182312011719|unsuper_loss: 0.0
average reward score: -3.90625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.14%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 837|ppo_ep: 1|act_loss: -0.002044677734375|cri_loss: 0.00042629241943359375|unsuper_loss: 0.0
average reward score: -3.69921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.86%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.00036835670471191406|cri_loss: 0.0003173351287841797|unsuper_loss: 0.0
average reward score: -5.38671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.35%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
[2023-07-01 08:43:07,699] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=12, lr=[5.090998282460625e-07, 5.090998282460625e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:43:07,882] [INFO] [timer.py:215:stop] epoch=0/micro_step=840/global_step=840, RunningAvgSamplesPerSec=50.79660626428774, CurrSamplesPerSec=50.462269584820305, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:43:08,047] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=12, lr=[2.6378229442801163e-07, 2.6378229442801163e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 839|ppo_ep: 1|act_loss: 0.0015211105346679688|cri_loss: 0.0004911422729492188|unsuper_loss: 0.0
average reward score: -4.1875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.80%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 840|ppo_ep: 1|act_loss: 0.018218994140625|cri_loss: 0.0022373199462890625|unsuper_loss: 0.0
average reward score: -3.208984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.17%) |Training time=0.81s (31.89%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 841|ppo_ep: 1|act_loss: -0.0058441162109375|cri_loss: 0.000354766845703125|unsuper_loss: 0.0
average reward score: -2.947265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.81%) |Training time=0.82s (32.23%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.020782470703125|cri_loss: 0.0105743408203125|unsuper_loss: 0.0
average reward score: -4.51953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.47%) |Training time=0.83s (32.61%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 843|ppo_ep: 1|act_loss: -0.0130767822265625|cri_loss: 0.0010137557983398438|unsuper_loss: 0.0
average reward score: -5.4375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.94%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 844|ppo_ep: 1|act_loss: -0.0033321380615234375|cri_loss: 0.0001571178436279297|unsuper_loss: 0.0
average reward score: -3.626953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.35%) |Training time=0.80s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 845|ppo_ep: 1|act_loss: 0.0133819580078125|cri_loss: 0.0014791488647460938|unsuper_loss: 0.0
average reward score: -4.5390625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.77%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 846|ppo_ep: 1|act_loss: 0.007793426513671875|cri_loss: 0.00146484375|unsuper_loss: 0.0
average reward score: -3.703125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.84%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 847|ppo_ep: 1|act_loss: 0.00125885009765625|cri_loss: 0.0009150505065917969|unsuper_loss: 0.0
average reward score: -3.9921875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 848|ppo_ep: 1|act_loss: 0.0070648193359375|cri_loss: 0.0006608963012695312|unsuper_loss: 0.0
average reward score: -5.9609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.94%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:43:33,094] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=12, lr=[4.326801856742557e-07, 4.326801856742557e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:43:33,272] [INFO] [timer.py:215:stop] epoch=0/micro_step=850/global_step=850, RunningAvgSamplesPerSec=50.789375356614926, CurrSamplesPerSec=50.198571586723595, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:43:33,438] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=12, lr=[2.241866247016869e-07, 2.241866247016869e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 849|ppo_ep: 1|act_loss: -0.002407073974609375|cri_loss: 0.00022208690643310547|unsuper_loss: 0.0
average reward score: -4.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (32.00%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 850|ppo_ep: 1|act_loss: 0.01123809814453125|cri_loss: 0.0011377334594726562|unsuper_loss: 0.0
average reward score: -3.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.91%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 851|ppo_ep: 1|act_loss: -0.01062774658203125|cri_loss: 0.0006108283996582031|unsuper_loss: 0.0
average reward score: -4.453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.73%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 852|ppo_ep: 1|act_loss: -0.006153106689453125|cri_loss: 0.0008974075317382812|unsuper_loss: 0.0
average reward score: -3.8984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.76%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 853|ppo_ep: 1|act_loss: 0.0023746490478515625|cri_loss: 0.00080108642578125|unsuper_loss: 0.0
average reward score: -4.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.84%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 854|ppo_ep: 1|act_loss: 0.005146026611328125|cri_loss: 0.0017766952514648438|unsuper_loss: 0.0
average reward score: -7.328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.19%) |Training time=0.81s (31.87%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 855|ppo_ep: 1|act_loss: -0.005863189697265625|cri_loss: 0.00016427040100097656|unsuper_loss: 0.0
average reward score: -5.59375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.21%) |Training time=0.81s (31.81%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 856|ppo_ep: 1|act_loss: 0.0037174224853515625|cri_loss: 0.0004315376281738281|unsuper_loss: 0.0
average reward score: -5.390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.94%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 857|ppo_ep: 1|act_loss: 0.009246826171875|cri_loss: 0.0009007453918457031|unsuper_loss: 0.0
average reward score: -2.759765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.94%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 858|ppo_ep: 1|act_loss: 0.024627685546875|cri_loss: 0.0086517333984375|unsuper_loss: 0.0
average reward score: -5.80859375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.87%) |Training time=0.82s (32.17%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.58
[2023-07-01 08:43:58,505] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=12, lr=[3.6220386128776603e-07, 3.6220386128776603e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:43:58,686] [INFO] [timer.py:215:stop] epoch=0/micro_step=860/global_step=860, RunningAvgSamplesPerSec=50.78290749061954, CurrSamplesPerSec=49.63691729629178, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:43:58,852] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=12, lr=[1.876703944496197e-07, 1.876703944496197e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 859|ppo_ep: 1|act_loss: 0.017669677734375|cri_loss: 0.00274658203125|unsuper_loss: 0.0
average reward score: -5.40234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.13%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 860|ppo_ep: 1|act_loss: -0.0022125244140625|cri_loss: 0.0007147789001464844|unsuper_loss: 0.0
average reward score: -4.6796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.16%) |Others=0.23 (8.89%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 861|ppo_ep: 1|act_loss: 0.0033016204833984375|cri_loss: 0.0009131431579589844|unsuper_loss: 0.0
average reward score: -5.96484375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.38%) |Training time=0.80s (31.66%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.006336212158203125|cri_loss: 0.00039076805114746094|unsuper_loss: 0.0
average reward score: -4.375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.65%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 863|ppo_ep: 1|act_loss: -0.016876220703125|cri_loss: 0.0005512237548828125|unsuper_loss: 0.0
average reward score: -3.08984375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.93%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.006755828857421875|cri_loss: 0.00051116943359375|unsuper_loss: 0.0
average reward score: -5.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.81s (32.03%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 865|ppo_ep: 1|act_loss: 0.0220489501953125|cri_loss: 0.004367828369140625|unsuper_loss: 0.0
average reward score: -5.58203125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.25%) |Training time=0.81s (31.82%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 866|ppo_ep: 1|act_loss: -0.01568603515625|cri_loss: 0.0013265609741210938|unsuper_loss: 0.0
average reward score: -4.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.90%) |Training time=0.84s (33.18%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.020904541015625|cri_loss: 0.00174713134765625|unsuper_loss: 0.0
average reward score: -4.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.76%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 868|ppo_ep: 1|act_loss: 0.005001068115234375|cri_loss: 0.0016937255859375|unsuper_loss: 0.0
average reward score: -4.2734375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.95%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
[2023-07-01 08:44:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=12, lr=[2.9776621772821655e-07, 2.9776621772821655e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:44:24,070] [INFO] [timer.py:215:stop] epoch=0/micro_step=870/global_step=870, RunningAvgSamplesPerSec=50.7752857768912, CurrSamplesPerSec=50.22530220075268, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:44:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=12, lr=[1.542830143669516e-07, 1.542830143669516e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 869|ppo_ep: 1|act_loss: 0.0167388916015625|cri_loss: 0.002178192138671875|unsuper_loss: 0.0
average reward score: -4.32421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.93%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 870|ppo_ep: 1|act_loss: 0.020416259765625|cri_loss: 0.0031757354736328125|unsuper_loss: 0.0
average reward score: -6.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.89%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
epoch: 0|step: 871|ppo_ep: 1|act_loss: -0.001644134521484375|cri_loss: 0.0006380081176757812|unsuper_loss: 0.0
average reward score: -3.47265625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.05%) |Training time=0.82s (32.02%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.58
epoch: 0|step: 872|ppo_ep: 1|act_loss: -0.004741668701171875|cri_loss: 0.00035190582275390625|unsuper_loss: 0.0
average reward score: -3.298828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.90%) |Training time=0.82s (32.11%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.54 |AvgSamplesPerSec=12.58
epoch: 0|step: 873|ppo_ep: 1|act_loss: 0.012420654296875|cri_loss: 0.002422332763671875|unsuper_loss: 0.0
average reward score: -3.5234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.00%) |Training time=0.81s (31.99%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 874|ppo_ep: 1|act_loss: 0.0034027099609375|cri_loss: 0.0008091926574707031|unsuper_loss: 0.0
average reward score: -3.046875
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.80s (31.75%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.58
epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.0013265609741210938|cri_loss: 0.0019321441650390625|unsuper_loss: 0.0
average reward score: -4.76953125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.90%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 876|ppo_ep: 1|act_loss: -0.01454925537109375|cri_loss: 0.0009756088256835938|unsuper_loss: 0.0
average reward score: -3.947265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.74%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 877|ppo_ep: 1|act_loss: 0.01018524169921875|cri_loss: 0.0010223388671875|unsuper_loss: 0.0
average reward score: -3.416015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.91%) |Training time=0.82s (32.14%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 878|ppo_ep: 1|act_loss: 0.0021953582763671875|cri_loss: 0.0004718303680419922|unsuper_loss: 0.0
average reward score: -6.7890625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.06%) |Training time=0.81s (31.96%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.58
[2023-07-01 08:44:49,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=12, lr=[2.3945444660163493e-07, 2.3945444660163493e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:44:49,475] [INFO] [timer.py:215:stop] epoch=0/micro_step=880/global_step=880, RunningAvgSamplesPerSec=50.769397429373015, CurrSamplesPerSec=50.89804140475208, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:44:49,641] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=12, lr=[1.240696614516243e-07, 1.240696614516243e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 879|ppo_ep: 1|act_loss: 0.00214385986328125|cri_loss: 0.0010223388671875|unsuper_loss: 0.0
average reward score: -4.98046875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.37%) |Training time=0.80s (31.70%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 880|ppo_ep: 1|act_loss: 0.00743865966796875|cri_loss: 0.0005936622619628906|unsuper_loss: 0.0
average reward score: -4.19140625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.50%) |Training time=0.80s (31.57%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.58
epoch: 0|step: 881|ppo_ep: 1|act_loss: -0.00836181640625|cri_loss: 0.009185791015625|unsuper_loss: 0.0
average reward score: -2.8359375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.78%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 882|ppo_ep: 1|act_loss: -0.00812530517578125|cri_loss: 0.0003821849822998047|unsuper_loss: 0.0
average reward score: -4.09765625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.18%) |Training time=0.81s (31.88%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 883|ppo_ep: 1|act_loss: -0.0146636962890625|cri_loss: 0.002384185791015625|unsuper_loss: 0.0
average reward score: -4.5546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.38%) |Training time=0.81s (31.71%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 884|ppo_ep: 1|act_loss: 0.0037288665771484375|cri_loss: 0.0004286766052246094|unsuper_loss: 0.0
average reward score: -7.01171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.12%) |Training time=0.81s (31.97%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 885|ppo_ep: 1|act_loss: -0.006732940673828125|cri_loss: 0.0052947998046875|unsuper_loss: 0.0
average reward score: -4.6640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.34%) |Training time=0.80s (31.68%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 886|ppo_ep: 1|act_loss: -0.0012540817260742188|cri_loss: 0.0004513263702392578|unsuper_loss: 0.0
average reward score: -5.1171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.14%) |Training time=0.81s (31.84%) |Others=0.23 (9.03%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 887|ppo_ep: 1|act_loss: -0.003856658935546875|cri_loss: 0.0034160614013671875|unsuper_loss: 0.0
average reward score: -3.458984375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.81s (31.99%) |Others=0.23 (9.05%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 888|ppo_ep: 1|act_loss: 0.0005216598510742188|cri_loss: 9.97781753540039e-05|unsuper_loss: 0.0
average reward score: -6.09375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.08%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
[2023-07-01 08:45:14,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=12, lr=[1.8734745049808622e-07, 1.8734745049808622e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:45:14,879] [INFO] [timer.py:215:stop] epoch=0/micro_step=890/global_step=890, RunningAvgSamplesPerSec=50.76462231494663, CurrSamplesPerSec=49.69917014829651, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:45:15,044] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=12, lr=[9.707121787465607e-08, 9.707121787465607e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 889|ppo_ep: 1|act_loss: 0.0171966552734375|cri_loss: 0.0038604736328125|unsuper_loss: 0.0
average reward score: -5.51171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.14%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 890|ppo_ep: 1|act_loss: 0.039337158203125|cri_loss: 0.008758544921875|unsuper_loss: 0.0
average reward score: -4.7265625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.24%) |Training time=0.81s (31.79%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.58
epoch: 0|step: 891|ppo_ep: 1|act_loss: 0.002552032470703125|cri_loss: 0.00045037269592285156|unsuper_loss: 0.0
average reward score: -4.2109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.01%) |Training time=0.81s (32.02%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.58
epoch: 0|step: 892|ppo_ep: 1|act_loss: 0.007457733154296875|cri_loss: 0.0010309219360351562|unsuper_loss: 0.0
average reward score: -5.6171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.75%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 893|ppo_ep: 1|act_loss: 0.007785797119140625|cri_loss: 0.0020313262939453125|unsuper_loss: 0.0
average reward score: -4.9453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.08%) |Training time=0.81s (32.00%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.58
epoch: 0|step: 894|ppo_ep: 1|act_loss: 0.03369140625|cri_loss: 0.005992889404296875|unsuper_loss: 0.0
average reward score: -3.333984375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.43%) |Training time=0.80s (31.65%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.58
epoch: 0|step: 895|ppo_ep: 1|act_loss: 0.0099334716796875|cri_loss: 0.0010776519775390625|unsuper_loss: 0.0
average reward score: -5.23828125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.13%) |Training time=0.81s (31.93%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.58
epoch: 0|step: 896|ppo_ep: 1|act_loss: 0.032257080078125|cri_loss: 0.0107879638671875|unsuper_loss: 0.0
average reward score: -3.556640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.26%) |Training time=0.81s (31.77%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.58
epoch: 0|step: 897|ppo_ep: 1|act_loss: 0.004215240478515625|cri_loss: 0.0020122528076171875|unsuper_loss: 0.0
average reward score: -6.625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.40%) |Training time=0.80s (31.68%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 898|ppo_ep: 1|act_loss: -0.006298065185546875|cri_loss: 0.0010805130004882812|unsuper_loss: 0.0
average reward score: -4.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.41%) |Training time=0.80s (31.71%) |Others=0.22 (8.88%)|CurSamplesPerSec=12.65 |AvgSamplesPerSec=12.59
[2023-07-01 08:45:40,082] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=12, lr=[1.4151573622732538e-07, 1.4151573622732538e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:45:40,260] [INFO] [timer.py:215:stop] epoch=0/micro_step=900/global_step=900, RunningAvgSamplesPerSec=50.76217522860796, CurrSamplesPerSec=50.62653490921244, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:45:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=12, lr=[7.332421566182663e-08, 7.332421566182663e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 899|ppo_ep: 1|act_loss: 0.004848480224609375|cri_loss: 0.00037550926208496094|unsuper_loss: 0.0
average reward score: -4.27734375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.29%) |Training time=0.81s (31.79%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59
epoch: 0|step: 900|ppo_ep: 1|act_loss: 0.0011148452758789062|cri_loss: 0.00035572052001953125|unsuper_loss: 0.0
average reward score: -5.921875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.96%) |Training time=0.82s (32.07%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59
epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.00963592529296875|cri_loss: 0.0016508102416992188|unsuper_loss: 0.0
average reward score: -4.36328125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.79%) |Training time=0.82s (32.22%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59
epoch: 0|step: 902|ppo_ep: 1|act_loss: -1.7344951629638672e-05|cri_loss: 0.0008778572082519531|unsuper_loss: 0.0
average reward score: -4.42578125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.18%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59
epoch: 0|step: 903|ppo_ep: 1|act_loss: -0.006374359130859375|cri_loss: 0.0014057159423828125|unsuper_loss: 0.0
average reward score: -3.521484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 904|ppo_ep: 1|act_loss: 0.0097808837890625|cri_loss: 0.0011816024780273438|unsuper_loss: 0.0
average reward score: -4.6875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (31.93%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
epoch: 0|step: 905|ppo_ep: 1|act_loss: 0.004337310791015625|cri_loss: 0.0012025833129882812|unsuper_loss: 0.0
average reward score: -5.5078125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.10%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59
epoch: 0|step: 906|ppo_ep: 1|act_loss: -0.0035610198974609375|cri_loss: 0.0003063678741455078|unsuper_loss: 0.0
average reward score: -3.875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.88%) |Training time=0.82s (32.17%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59
epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.001010894775390625|cri_loss: 0.00040459632873535156|unsuper_loss: 0.0
average reward score: -5.0390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.04%) |Training time=0.81s (32.05%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 908|ppo_ep: 1|act_loss: 0.03399658203125|cri_loss: 0.010284423828125|unsuper_loss: 0.0
average reward score: -3.48828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.81s (32.07%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
[2023-07-01 08:46:05,508] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=12, lr=[1.0202131941489858e-07, 1.0202131941489858e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:46:05,687] [INFO] [timer.py:215:stop] epoch=0/micro_step=910/global_step=910, RunningAvgSamplesPerSec=50.75345093850162, CurrSamplesPerSec=50.96021034446754, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:46:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=12, lr=[5.2860787261605485e-08, 5.2860787261605485e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 909|ppo_ep: 1|act_loss: -0.00942230224609375|cri_loss: 0.0011739730834960938|unsuper_loss: 0.0
average reward score: -4.11328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.45%) |Training time=0.80s (31.60%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 910|ppo_ep: 1|act_loss: 4.76837158203125e-05|cri_loss: 0.0003173351287841797|unsuper_loss: 0.0
average reward score: -4.015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 911|ppo_ep: 1|act_loss: 0.0053863525390625|cri_loss: 0.000133514404296875|unsuper_loss: 0.0
average reward score: -4.31640625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.15%) |Training time=0.81s (31.97%) |Others=0.23 (8.87%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59
epoch: 0|step: 912|ppo_ep: 1|act_loss: 0.002010345458984375|cri_loss: 0.0004036426544189453|unsuper_loss: 0.0
average reward score: -5.0234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.86%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 913|ppo_ep: 1|act_loss: 0.01419830322265625|cri_loss: 0.0026874542236328125|unsuper_loss: 0.0
average reward score: -6.05859375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.93%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 914|ppo_ep: 1|act_loss: -0.0244293212890625|cri_loss: 0.00307464599609375|unsuper_loss: 0.0
average reward score: -5.015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.11%) |Training time=0.81s (31.96%) |Others=0.23 (8.92%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.006687164306640625|cri_loss: 0.00048828125|unsuper_loss: 0.0
average reward score: -2.95703125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.99%) |Training time=0.82s (32.07%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59
epoch: 0|step: 916|ppo_ep: 1|act_loss: 0.01415252685546875|cri_loss: 0.001434326171875|unsuper_loss: 0.0
average reward score: -3.84375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.90%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 917|ppo_ep: 1|act_loss: -0.004100799560546875|cri_loss: 0.0016088485717773438|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.13%) |Training time=0.81s (31.92%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 918|ppo_ep: 1|act_loss: -0.0036163330078125|cri_loss: 0.0006704330444335938|unsuper_loss: 0.0
average reward score: -4.64453125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.16%) |Training time=0.81s (31.83%) |Others=0.23 (9.01%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
[2023-07-01 08:46:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=12, lr=[6.891764058781328e-08, 6.891764058781328e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:46:31,094] [INFO] [timer.py:215:stop] epoch=0/micro_step=920/global_step=920, RunningAvgSamplesPerSec=50.74695745130744, CurrSamplesPerSec=49.34110677850644, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:46:31,260] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=12, lr=[3.5708622066224494e-08, 3.5708622066224494e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 919|ppo_ep: 1|act_loss: 0.004119873046875|cri_loss: 0.0008497238159179688|unsuper_loss: 0.0
average reward score: -4.56640625
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.49s (58.68%) |Training time=0.82s (32.31%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59
epoch: 0|step: 920|ppo_ep: 1|act_loss: 0.00536346435546875|cri_loss: 0.0008797645568847656|unsuper_loss: 0.0
average reward score: -3.58203125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.88%) |Training time=0.82s (32.14%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59
epoch: 0|step: 921|ppo_ep: 1|act_loss: 0.001399993896484375|cri_loss: 0.001434326171875|unsuper_loss: 0.0
average reward score: -5.1015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.10%) |Training time=0.81s (31.95%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59
epoch: 0|step: 922|ppo_ep: 1|act_loss: -0.00794219970703125|cri_loss: 0.0005536079406738281|unsuper_loss: 0.0
average reward score: -3.990234375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.28%) |Training time=0.81s (31.79%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 923|ppo_ep: 1|act_loss: -0.0015001296997070312|cri_loss: 0.0007758140563964844|unsuper_loss: 0.0
average reward score: -3.361328125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.20%) |Training time=0.81s (31.84%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
epoch: 0|step: 924|ppo_ep: 1|act_loss: 0.006153106689453125|cri_loss: 0.00045418739318847656|unsuper_loss: 0.0
average reward score: -4.60546875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.36%) |Training time=0.80s (31.69%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 925|ppo_ep: 1|act_loss: -0.0109405517578125|cri_loss: 0.001224517822265625|unsuper_loss: 0.0
average reward score: -4.4609375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.29%) |Training time=0.81s (31.81%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 926|ppo_ep: 1|act_loss: -0.0303497314453125|cri_loss: 0.02313232421875|unsuper_loss: 0.0
average reward score: -3.689453125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.31%) |Training time=0.80s (31.76%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.59
epoch: 0|step: 927|ppo_ep: 1|act_loss: -0.004962921142578125|cri_loss: 0.000957489013671875|unsuper_loss: 0.0
average reward score: -4.859375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.30%) |Training time=0.80s (31.72%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59
epoch: 0|step: 928|ppo_ep: 1|act_loss: -0.00824737548828125|cri_loss: 0.00037479400634765625|unsuper_loss: 0.0
average reward score: -5.828125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.15%) |Training time=0.81s (31.92%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
[2023-07-01 08:46:56,300] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=12, lr=[4.2249492863304246e-08, 4.2249492863304246e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:46:56,478] [INFO] [timer.py:215:stop] epoch=0/micro_step=930/global_step=930, RunningAvgSamplesPerSec=50.74082119613501, CurrSamplesPerSec=48.05202967948661, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:46:56,644] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=12, lr=[2.1890928944717228e-08, 2.1890928944717228e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.0035228729248046875|cri_loss: 0.00042128562927246094|unsuper_loss: 0.0
average reward score: -5.26171875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.47s (57.99%) |Training time=0.84s (33.08%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 930|ppo_ep: 1|act_loss: -0.007450103759765625|cri_loss: 0.0005202293395996094|unsuper_loss: 0.0
average reward score: -3.787109375
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.20%) |Training time=0.81s (31.89%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59
epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.0017108917236328125|cri_loss: 0.0004513263702392578|unsuper_loss: 0.0
average reward score: -4.46484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.87%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59
epoch: 0|step: 932|ppo_ep: 1|act_loss: 0.00266265869140625|cri_loss: 0.0014085769653320312|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.98%) |Training time=0.82s (32.02%) |Others=0.23 (9.00%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59
epoch: 0|step: 933|ppo_ep: 1|act_loss: -0.0191192626953125|cri_loss: 0.0049591064453125|unsuper_loss: 0.0
average reward score: -3.88671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.97%) |Training time=0.82s (32.07%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.57 |AvgSamplesPerSec=12.59
epoch: 0|step: 934|ppo_ep: 1|act_loss: 0.004848480224609375|cri_loss: 0.00034737586975097656|unsuper_loss: 0.0
average reward score: -4.71484375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.94%) |Others=0.23 (8.97%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.002735137939453125|cri_loss: 0.00042557716369628906|unsuper_loss: 0.0
average reward score: -6.6796875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.27%) |Training time=0.81s (31.79%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 936|ppo_ep: 1|act_loss: -0.00638580322265625|cri_loss: 0.0005917549133300781|unsuper_loss: 0.0
average reward score: -3.703125
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.92%) |Training time=0.82s (32.17%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59
epoch: 0|step: 937|ppo_ep: 1|act_loss: -0.0035610198974609375|cri_loss: 0.000263214111328125|unsuper_loss: 0.0
average reward score: -5.13671875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.95%) |Training time=0.82s (32.11%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.56 |AvgSamplesPerSec=12.59
epoch: 0|step: 938|ppo_ep: 1|act_loss: 0.04254150390625|cri_loss: 0.01024627685546875|unsuper_loss: 0.0
average reward score: -4.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.94%) |Training time=0.82s (32.12%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.58 |AvgSamplesPerSec=12.59
[2023-07-01 08:47:21,724] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=12, lr=[2.205296133854851e-08, 2.205296133854851e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:47:21,905] [INFO] [timer.py:215:stop] epoch=0/micro_step=940/global_step=940, RunningAvgSamplesPerSec=50.731099212683596, CurrSamplesPerSec=48.306048579676634, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:47:22,069] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=12, lr=[1.142640483862617e-08, 1.142640483862617e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 939|ppo_ep: 1|act_loss: -0.0245208740234375|cri_loss: 0.0036296844482421875|unsuper_loss: 0.0
average reward score: -6.44921875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.17%) |Training time=0.84s (32.95%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
epoch: 0|step: 940|ppo_ep: 1|act_loss: 0.01275634765625|cri_loss: 0.0016412734985351562|unsuper_loss: 0.0
average reward score: -4.328125
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.51%) |Training time=0.80s (31.54%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.64 |AvgSamplesPerSec=12.59
epoch: 0|step: 941|ppo_ep: 1|act_loss: -0.0200958251953125|cri_loss: 0.002391815185546875|unsuper_loss: 0.0
average reward score: -6.6015625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.36%) |Training time=0.80s (31.68%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59
epoch: 0|step: 942|ppo_ep: 1|act_loss: -0.0009608268737792969|cri_loss: 0.00025773048400878906|unsuper_loss: 0.0
average reward score: -3.0625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.31%) |Training time=0.81s (31.76%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 943|ppo_ep: 1|act_loss: -0.00675201416015625|cri_loss: 0.00072479248046875|unsuper_loss: 0.0
average reward score: -5.109375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.09%) |Training time=0.81s (31.97%) |Others=0.23 (8.93%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 944|ppo_ep: 1|act_loss: 0.0015001296997070312|cri_loss: 0.0011434555053710938|unsuper_loss: 0.0
average reward score: -4.53515625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.21%) |Training time=0.81s (31.91%) |Others=0.23 (8.88%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59
epoch: 0|step: 945|ppo_ep: 1|act_loss: -0.002216339111328125|cri_loss: 0.0010166168212890625|unsuper_loss: 0.0
average reward score: -4.390625
-------------------------------------------------------------------------------------
|E2E latency=2.53s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.23%) |Training time=0.81s (31.87%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.63 |AvgSamplesPerSec=12.59
epoch: 0|step: 946|ppo_ep: 1|act_loss: -0.0008983612060546875|cri_loss: 0.0008797645568847656|unsuper_loss: 0.0
average reward score: -5.04296875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.88%) |Others=0.23 (8.96%)|CurSamplesPerSec=12.62 |AvgSamplesPerSec=12.59
epoch: 0|step: 947|ppo_ep: 1|act_loss: 0.004314422607421875|cri_loss: 0.00016379356384277344|unsuper_loss: 0.0
average reward score: -4.6953125
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.17%) |Training time=0.81s (31.89%) |Others=0.23 (8.94%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 948|ppo_ep: 1|act_loss: -0.00235748291015625|cri_loss: 0.0015211105346679688|unsuper_loss: 0.0
average reward score: -5.07421875
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (58.98%) |Training time=0.82s (32.12%) |Others=0.23 (8.90%)|CurSamplesPerSec=12.59 |AvgSamplesPerSec=12.59
[2023-07-01 08:47:47,089] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=12, lr=[8.355374263348676e-09, 8.355374263348676e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-07-01 08:47:47,270] [INFO] [timer.py:215:stop] epoch=0/micro_step=950/global_step=950, RunningAvgSamplesPerSec=50.72813976244511, CurrSamplesPerSec=50.11948595103353, MemAllocated=12.09GB, MaxMemAllocated=21.86GB
[2023-07-01 08:47:47,435] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=12, lr=[4.329209462874961e-09, 4.329209462874961e-09], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 949|ppo_ep: 1|act_loss: -0.0032787322998046875|cri_loss: 0.0004875659942626953|unsuper_loss: 0.0
average reward score: -4.50390625
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.50s (59.07%) |Training time=0.81s (32.01%) |Others=0.23 (8.91%)|CurSamplesPerSec=12.61 |AvgSamplesPerSec=12.59
epoch: 0|step: 950|ppo_ep: 1|act_loss: -5.3882598876953125e-05|cri_loss: 0.0008039474487304688|unsuper_loss: 0.0
average reward score: -4.34375
-------------------------------------------------------------------------------------
|E2E latency=2.54s |Gather latency=0.00s (0.00%) |Generate time=1.48s (58.35%) |Training time=0.83s (32.66%) |Others=0.23 (8.98%)|CurSamplesPerSec=12.60 |AvgSamplesPerSec=12.59
epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.00273895263671875|cri_loss: 0.0008635520935058594|unsuper_loss: 0.0
average reward score: -4.04296875
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.52s (59.35%) |Training time=0.81s (31.70%) |Others=0.23 (8.95%)|CurSamplesPerSec=12.53 |AvgSamplesPerSec=12.59
epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.03765869140625|cri_loss: 0.01186370849609375|unsuper_loss: 0.0
average reward score: -4.15234375
-------------------------------------------------------------------------------------
|E2E latency=2.55s |Gather latency=0.00s (0.00%) |Generate time=1.51s (59.39%) |Training time=0.81s (31.62%) |Others=0.23 (8.99%)|CurSamplesPerSec=12.55 |AvgSamplesPerSec=12.59
epoch: 0|step: 953|ppo_ep: 1|act_loss: -0.0016870498657226562|cri_loss: 0.001628875732421875|unsuper_loss: 0.0
average reward score: -3.5234375
-------------------------------------------------------------------------------------
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
saving model ...
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5958 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5960 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5956 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5957 exits successfully.
[2023-07-01 08:48:00,350] [INFO] [launch.py:346:main] Process 5962 exits successfully.
[2023-07-01 08:48:01,351] [INFO] [launch.py:346:main] Process 5959 exits successfully.
[2023-07-01 08:48:01,351] [INFO] [launch.py:346:main] Process 5961 exits successfully.
[2023-07-01 08:48:09,360] [INFO] [launch.py:346:main] Process 5955 exits successfully.