[2023-09-21 12:24:17,814][72173] Saving configuration to ./train_dir/Walker/config.json...
[2023-09-21 12:24:17,880][72173] Rollout worker 0 uses device cpu
[2023-09-21 12:24:17,881][72173] Rollout worker 1 uses device cpu
[2023-09-21 12:24:17,881][72173] Rollout worker 2 uses device cpu
[2023-09-21 12:24:17,882][72173] Rollout worker 3 uses device cpu
[2023-09-21 12:24:17,882][72173] Rollout worker 4 uses device cpu
[2023-09-21 12:24:17,883][72173] Rollout worker 5 uses device cpu
[2023-09-21 12:24:17,883][72173] Rollout worker 6 uses device cpu
[2023-09-21 12:24:17,884][72173] Rollout worker 7 uses device cpu
[2023-09-21 12:24:17,884][72173] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-21 12:24:17,930][72173] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:24:17,930][72173] InferenceWorker_p0-w0: min num requests: 2
[2023-09-21 12:24:17,954][72173] Starting all processes...
[2023-09-21 12:24:17,955][72173] Starting process learner_proc0
[2023-09-21 12:24:17,958][72173] Starting all processes...
[2023-09-21 12:24:17,961][72173] Starting process inference_proc0-0
[2023-09-21 12:24:17,962][72173] Starting process rollout_proc0
[2023-09-21 12:24:17,962][72173] Starting process rollout_proc1
[2023-09-21 12:24:17,962][72173] Starting process rollout_proc2
[2023-09-21 12:24:17,962][72173] Starting process rollout_proc3
[2023-09-21 12:24:17,963][72173] Starting process rollout_proc4
[2023-09-21 12:24:17,963][72173] Starting process rollout_proc5
[2023-09-21 12:24:17,963][72173] Starting process rollout_proc6
[2023-09-21 12:24:17,963][72173] Starting process rollout_proc7
[2023-09-21 12:24:19,739][72824] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-21 12:24:19,747][72810] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-21 12:24:19,748][72809] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:24:19,748][72809] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-21 12:24:19,765][72823] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-21 12:24:19,766][72809] Num visible devices: 1
[2023-09-21 12:24:19,794][72815] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-21 12:24:19,802][72822] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-21 12:24:19,813][72813] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-21 12:24:19,816][72816] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-21 12:24:19,977][72817] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-21 12:24:20,023][72796] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:24:20,023][72796] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-21 12:24:20,042][72796] Num visible devices: 1
[2023-09-21 12:24:20,064][72796] Starting seed is not provided
[2023-09-21 12:24:20,064][72796] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:24:20,065][72796] Initializing actor-critic model on device cuda:0
[2023-09-21 12:24:20,065][72796] RunningMeanStd input shape: (17,)
[2023-09-21 12:24:20,065][72796] RunningMeanStd input shape: (1,)
[2023-09-21 12:24:20,145][72796] Created Actor Critic model with architecture:
[2023-09-21 12:24:20,146][72796] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=6, bias=True)
  )
)
[2023-09-21 12:24:20,704][72796] Using optimizer
[2023-09-21 12:24:20,705][72796] No checkpoints found
[2023-09-21 12:24:20,705][72796] Did not load from checkpoint, starting from scratch!
[2023-09-21 12:24:20,705][72796] Initialized policy 0 weights for model version 0
[2023-09-21 12:24:20,706][72796] LearnerWorker_p0 finished initialization!
[2023-09-21 12:24:20,707][72796] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 12:24:21,239][72809] RunningMeanStd input shape: (17,)
[2023-09-21 12:24:21,239][72809] RunningMeanStd input shape: (1,)
[2023-09-21 12:24:21,271][72173] Inference worker 0-0 is ready!
[2023-09-21 12:24:21,272][72173] All inference workers are ready! Signal rollout workers to start!
[2023-09-21 12:24:21,365][72817] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,366][72817] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,373][72824] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,374][72824] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,378][72815] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,379][72810] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,379][72815] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,379][72810] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,380][72822] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,380][72822] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,384][72817] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,385][72813] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,386][72813] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,392][72824] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,397][72815] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,398][72810] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,404][72813] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,411][72822] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,420][72817] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,424][72823] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,424][72816] Decorrelating experience for 0 frames...
[2023-09-21 12:24:21,424][72823] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,424][72816] Decorrelating experience for 64 frames...
[2023-09-21 12:24:21,428][72824] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,433][72815] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,435][72810] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,441][72813] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,450][72822] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,457][72816] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,457][72823] Decorrelating experience for 128 frames...
[2023-09-21 12:24:21,489][72817] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,496][72824] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,502][72815] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,504][72810] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,510][72813] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,518][72822] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,521][72816] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,521][72823] Decorrelating experience for 192 frames...
[2023-09-21 12:24:21,562][72817] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,566][72824] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,574][72815] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,575][72810] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,582][72813] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,589][72822] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,635][72823] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,635][72816] Decorrelating experience for 256 frames...
[2023-09-21 12:24:21,650][72817] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,654][72824] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,664][72815] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,672][72810] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,673][72813] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,691][72822] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,705][72823] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,705][72816] Decorrelating experience for 320 frames...
[2023-09-21 12:24:21,760][72824] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,760][72817] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,773][72815] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,779][72810] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,783][72813] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,792][72823] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,796][72816] Decorrelating experience for 384 frames...
[2023-09-21 12:24:21,798][72822] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,899][72823] Decorrelating experience for 448 frames...
[2023-09-21 12:24:21,903][72816] Decorrelating experience for 448 frames...
[2023-09-21 12:24:24,157][72173] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4096. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:24:24,158][72173] Avg episode reward: [(0, '1.696')]
[2023-09-21 12:24:28,134][72809] Updated weights for policy 0, policy_version 80 (0.0016)
[2023-09-21 12:24:29,157][72173] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 49152. Throughput: 0: 5971.2. Samples: 29856. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:24:29,157][72173] Avg episode reward: [(0, '254.178')]
[2023-09-21 12:24:29,209][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000000104_53248.pth...
[2023-09-21 12:24:31,890][72809] Updated weights for policy 0, policy_version 160 (0.0015)
[2023-09-21 12:24:34,157][72173] Fps is (10 sec: 9830.2, 60 sec: 9830.2, 300 sec: 9830.2). Total num frames: 102400. Throughput: 0: 9698.2. Samples: 96984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:24:34,158][72173] Avg episode reward: [(0, '353.703')]
[2023-09-21 12:24:34,201][72796] Saving new best policy, reward=353.703!
[2023-09-21 12:24:35,639][72809] Updated weights for policy 0, policy_version 240 (0.0014)
[2023-09-21 12:24:37,921][72173] Heartbeat connected on Batcher_0
[2023-09-21 12:24:37,926][72173] Heartbeat connected on LearnerWorker_p0
[2023-09-21 12:24:37,932][72173] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-21 12:24:37,937][72173] Heartbeat connected on RolloutWorker_w0
[2023-09-21 12:24:37,940][72173] Heartbeat connected on RolloutWorker_w1
[2023-09-21 12:24:37,942][72173] Heartbeat connected on RolloutWorker_w2
[2023-09-21 12:24:37,945][72173] Heartbeat connected on RolloutWorker_w3
[2023-09-21 12:24:37,948][72173] Heartbeat connected on RolloutWorker_w4
[2023-09-21 12:24:37,949][72173] Heartbeat connected on RolloutWorker_w5
[2023-09-21 12:24:37,954][72173] Heartbeat connected on RolloutWorker_w6
[2023-09-21 12:24:37,957][72173] Heartbeat connected on RolloutWorker_w7
[2023-09-21 12:24:39,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10376.4, 300 sec: 10376.4). Total num frames: 159744. Throughput: 0: 8738.1. Samples: 131072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:24:39,158][72173] Avg episode reward: [(0, '359.691')]
[2023-09-21 12:24:39,159][72796] Saving new best policy, reward=359.691!
[2023-09-21 12:24:39,330][72809] Updated weights for policy 0, policy_version 320 (0.0015)
[2023-09-21 12:24:43,194][72809] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-09-21 12:24:44,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10444.7, 300 sec: 10444.7). Total num frames: 212992. Throughput: 0: 9821.7. Samples: 196436. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:24:44,158][72173] Avg episode reward: [(0, '386.264')]
[2023-09-21 12:24:44,163][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000000416_212992.pth...
[2023-09-21 12:24:44,172][72796] Saving new best policy, reward=386.264!
[2023-09-21 12:24:46,972][72809] Updated weights for policy 0, policy_version 480 (0.0014)
[2023-09-21 12:24:49,157][72173] Fps is (10 sec: 10649.5, 60 sec: 10485.7, 300 sec: 10485.7). Total num frames: 266240. Throughput: 0: 10442.6. Samples: 261068. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:24:49,158][72173] Avg episode reward: [(0, '486.102')]
[2023-09-21 12:24:49,159][72796] Saving new best policy, reward=486.102!
[2023-09-21 12:24:50,799][72809] Updated weights for policy 0, policy_version 560 (0.0013)
[2023-09-21 12:24:54,157][72173] Fps is (10 sec: 10649.8, 60 sec: 10513.0, 300 sec: 10513.0). Total num frames: 319488. Throughput: 0: 9755.4. Samples: 292664. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:24:54,158][72173] Avg episode reward: [(0, '690.231')]
[2023-09-21 12:24:54,159][72796] Saving new best policy, reward=690.231!
[2023-09-21 12:24:54,706][72809] Updated weights for policy 0, policy_version 640 (0.0013)
[2023-09-21 12:24:58,582][72809] Updated weights for policy 0, policy_version 720 (0.0013)
[2023-09-21 12:24:59,157][72173] Fps is (10 sec: 10649.6, 60 sec: 10532.5, 300 sec: 10532.5). Total num frames: 372736. Throughput: 0: 10158.1. Samples: 355536. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:24:59,158][72173] Avg episode reward: [(0, '849.437')]
[2023-09-21 12:24:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000000728_372736.pth...
[2023-09-21 12:24:59,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000000104_53248.pth
[2023-09-21 12:24:59,173][72796] Saving new best policy, reward=849.437!
[2023-09-21 12:25:02,358][72809] Updated weights for policy 0, policy_version 800 (0.0010)
[2023-09-21 12:25:04,157][72173] Fps is (10 sec: 10649.7, 60 sec: 10547.2, 300 sec: 10547.2). Total num frames: 425984. Throughput: 0: 10469.5. Samples: 418780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:04,158][72173] Avg episode reward: [(0, '1006.822')]
[2023-09-21 12:25:04,158][72796] Saving new best policy, reward=1006.822!
[2023-09-21 12:25:06,246][72809] Updated weights for policy 0, policy_version 880 (0.0012)
[2023-09-21 12:25:09,157][72173] Fps is (10 sec: 10649.6, 60 sec: 10558.5, 300 sec: 10558.5). Total num frames: 479232. Throughput: 0: 10018.5. Samples: 450836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:09,158][72173] Avg episode reward: [(0, '907.185')]
[2023-09-21 12:25:09,919][72809] Updated weights for policy 0, policy_version 960 (0.0014)
[2023-09-21 12:25:13,569][72809] Updated weights for policy 0, policy_version 1040 (0.0014)
[2023-09-21 12:25:14,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 536576. Throughput: 0: 10824.7. Samples: 516968. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:25:14,158][72173] Avg episode reward: [(0, '969.459')]
[2023-09-21 12:25:14,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000001048_536576.pth...
[2023-09-21 12:25:14,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000000416_212992.pth
[2023-09-21 12:25:17,581][72809] Updated weights for policy 0, policy_version 1120 (0.0014)
[2023-09-21 12:25:19,157][72173] Fps is (10 sec: 10649.6, 60 sec: 10575.1, 300 sec: 10575.1). Total num frames: 585728. Throughput: 0: 10771.4. Samples: 581696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:19,158][72173] Avg episode reward: [(0, '1166.586')]
[2023-09-21 12:25:19,166][72796] Saving new best policy, reward=1166.586!
[2023-09-21 12:25:21,584][72809] Updated weights for policy 0, policy_version 1200 (0.0012)
[2023-09-21 12:25:24,157][72173] Fps is (10 sec: 10649.8, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 643072. Throughput: 0: 10672.8. Samples: 611344. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:25:24,158][72173] Avg episode reward: [(0, '1277.066')]
[2023-09-21 12:25:24,158][72796] Saving new best policy, reward=1277.066!
[2023-09-21 12:25:24,969][72809] Updated weights for policy 0, policy_version 1280 (0.0014)
[2023-09-21 12:25:28,671][72809] Updated weights for policy 0, policy_version 1360 (0.0015)
[2023-09-21 12:25:29,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10854.4, 300 sec: 10712.6). Total num frames: 700416. Throughput: 0: 10802.7. Samples: 682556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:29,158][72173] Avg episode reward: [(0, '1234.020')]
[2023-09-21 12:25:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000001368_700416.pth...
[2023-09-21 12:25:29,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000000728_372736.pth
[2023-09-21 12:25:32,574][72809] Updated weights for policy 0, policy_version 1440 (0.0015)
[2023-09-21 12:25:34,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10854.4, 300 sec: 10708.1). Total num frames: 753664. Throughput: 0: 10764.6. Samples: 745472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:34,158][72173] Avg episode reward: [(0, '1477.758')]
[2023-09-21 12:25:34,159][72796] Saving new best policy, reward=1477.758!
[2023-09-21 12:25:36,372][72809] Updated weights for policy 0, policy_version 1520 (0.0015)
[2023-09-21 12:25:39,157][72173] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10704.2). Total num frames: 806912. Throughput: 0: 10790.1. Samples: 778220. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:39,158][72173] Avg episode reward: [(0, '1355.504')]
[2023-09-21 12:25:39,977][72809] Updated weights for policy 0, policy_version 1600 (0.0012)
[2023-09-21 12:25:43,903][72809] Updated weights for policy 0, policy_version 1680 (0.0014)
[2023-09-21 12:25:44,157][72173] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10700.8). Total num frames: 860160. Throughput: 0: 10846.4. Samples: 843624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:44,158][72173] Avg episode reward: [(0, '1413.832')]
[2023-09-21 12:25:44,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000001680_860160.pth...
[2023-09-21 12:25:44,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000001048_536576.pth
[2023-09-21 12:25:47,773][72809] Updated weights for policy 0, policy_version 1760 (0.0014)
[2023-09-21 12:25:49,157][72173] Fps is (10 sec: 10649.6, 60 sec: 10786.2, 300 sec: 10697.8). Total num frames: 913408. Throughput: 0: 10859.9. Samples: 907476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:49,158][72173] Avg episode reward: [(0, '1585.449')]
[2023-09-21 12:25:49,159][72796] Saving new best policy, reward=1585.449!
[2023-09-21 12:25:51,775][72809] Updated weights for policy 0, policy_version 1840 (0.0013)
[2023-09-21 12:25:54,157][72173] Fps is (10 sec: 10649.8, 60 sec: 10786.1, 300 sec: 10695.1). Total num frames: 966656. Throughput: 0: 10823.3. Samples: 937880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:54,158][72173] Avg episode reward: [(0, '1465.601')]
[2023-09-21 12:25:55,307][72809] Updated weights for policy 0, policy_version 1920 (0.0014)
[2023-09-21 12:25:59,114][72809] Updated weights for policy 0, policy_version 2000 (0.0012)
[2023-09-21 12:25:59,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10854.4, 300 sec: 10735.8). Total num frames: 1024000. Throughput: 0: 10822.6. Samples: 1003988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:25:59,158][72173] Avg episode reward: [(0, '1600.423')]
[2023-09-21 12:25:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000002000_1024000.pth...
[2023-09-21 12:25:59,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000001368_700416.pth
[2023-09-21 12:25:59,173][72796] Saving new best policy, reward=1600.423!
[2023-09-21 12:26:02,952][72809] Updated weights for policy 0, policy_version 2080 (0.0012)
[2023-09-21 12:26:04,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10854.4, 300 sec: 10731.5). Total num frames: 1077248. Throughput: 0: 10831.7. Samples: 1069120. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:04,158][72173] Avg episode reward: [(0, '2035.980')]
[2023-09-21 12:26:04,159][72796] Saving new best policy, reward=2035.980!
[2023-09-21 12:26:06,877][72809] Updated weights for policy 0, policy_version 2160 (0.0013)
[2023-09-21 12:26:09,157][72173] Fps is (10 sec: 10240.1, 60 sec: 10786.2, 300 sec: 10688.6). Total num frames: 1126400. Throughput: 0: 10857.2. Samples: 1099920. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:26:09,158][72173] Avg episode reward: [(0, '2386.214')]
[2023-09-21 12:26:09,189][72796] Saving new best policy, reward=2386.214!
[2023-09-21 12:26:10,642][72809] Updated weights for policy 0, policy_version 2240 (0.0012)
[2023-09-21 12:26:14,157][72173] Fps is (10 sec: 10649.4, 60 sec: 10786.1, 300 sec: 10724.1). Total num frames: 1183744. Throughput: 0: 10689.3. Samples: 1163576. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:26:14,158][72173] Avg episode reward: [(0, '2264.691')]
[2023-09-21 12:26:14,188][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000002320_1187840.pth...
[2023-09-21 12:26:14,189][72809] Updated weights for policy 0, policy_version 2320 (0.0012)
[2023-09-21 12:26:14,191][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000001680_860160.pth
[2023-09-21 12:26:17,891][72809] Updated weights for policy 0, policy_version 2400 (0.0013)
[2023-09-21 12:26:19,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10756.4). Total num frames: 1241088. Throughput: 0: 10833.1. Samples: 1232960. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:26:19,158][72173] Avg episode reward: [(0, '2373.178')]
[2023-09-21 12:26:21,585][72809] Updated weights for policy 0, policy_version 2480 (0.0013)
[2023-09-21 12:26:24,157][72173] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 10786.1). Total num frames: 1298432. Throughput: 0: 10837.8. Samples: 1265920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:24,158][72173] Avg episode reward: [(0, '2391.000')]
[2023-09-21 12:26:24,158][72796] Saving new best policy, reward=2391.000!
[2023-09-21 12:26:25,326][72809] Updated weights for policy 0, policy_version 2560 (0.0014)
[2023-09-21 12:26:28,975][72809] Updated weights for policy 0, policy_version 2640 (0.0011)
[2023-09-21 12:26:29,157][72173] Fps is (10 sec: 11059.4, 60 sec: 10854.5, 300 sec: 10780.7). Total num frames: 1351680. Throughput: 0: 10878.1. Samples: 1333136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:29,157][72173] Avg episode reward: [(0, '2446.952')]
[2023-09-21 12:26:29,161][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000002640_1351680.pth...
[2023-09-21 12:26:29,169][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000002000_1024000.pth
[2023-09-21 12:26:29,169][72796] Saving new best policy, reward=2446.952!
[2023-09-21 12:26:32,785][72809] Updated weights for policy 0, policy_version 2720 (0.0011)
[2023-09-21 12:26:34,157][72173] Fps is (10 sec: 10649.4, 60 sec: 10854.4, 300 sec: 10775.6). Total num frames: 1404928. Throughput: 0: 10924.3. Samples: 1399072. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:26:34,158][72173] Avg episode reward: [(0, '2870.375')]
[2023-09-21 12:26:34,186][72796] Saving new best policy, reward=2870.375!
[2023-09-21 12:26:36,272][72809] Updated weights for policy 0, policy_version 2800 (0.0013)
[2023-09-21 12:26:39,157][72173] Fps is (10 sec: 11468.7, 60 sec: 10990.9, 300 sec: 10831.6). Total num frames: 1466368. Throughput: 0: 11017.7. Samples: 1433676. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:26:39,158][72173] Avg episode reward: [(0, '2673.550')]
[2023-09-21 12:26:39,948][72809] Updated weights for policy 0, policy_version 2880 (0.0014)
[2023-09-21 12:26:43,617][72809] Updated weights for policy 0, policy_version 2960 (0.0013)
[2023-09-21 12:26:44,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10990.9, 300 sec: 10825.1). Total num frames: 1519616. Throughput: 0: 11073.2. Samples: 1502280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:44,158][72173] Avg episode reward: [(0, '2854.637')]
[2023-09-21 12:26:44,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000002968_1519616.pth...
[2023-09-21 12:26:44,173][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000002320_1187840.pth
[2023-09-21 12:26:47,248][72809] Updated weights for policy 0, policy_version 3040 (0.0013)
[2023-09-21 12:26:49,157][72173] Fps is (10 sec: 11059.0, 60 sec: 11059.2, 300 sec: 10847.3). Total num frames: 1576960. Throughput: 0: 11126.3. Samples: 1569804. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:49,158][72173] Avg episode reward: [(0, '2887.009')]
[2023-09-21 12:26:49,159][72796] Saving new best policy, reward=2887.009!
[2023-09-21 12:26:50,795][72809] Updated weights for policy 0, policy_version 3120 (0.0013)
[2023-09-21 12:26:54,157][72173] Fps is (10 sec: 11469.0, 60 sec: 11127.5, 300 sec: 10868.1). Total num frames: 1634304. Throughput: 0: 11184.6. Samples: 1603228. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:26:54,158][72173] Avg episode reward: [(0, '2833.777')]
[2023-09-21 12:26:54,339][72809] Updated weights for policy 0, policy_version 3200 (0.0013)
[2023-09-21 12:26:58,301][72809] Updated weights for policy 0, policy_version 3280 (0.0012)
[2023-09-21 12:26:59,157][72173] Fps is (10 sec: 11059.4, 60 sec: 11059.3, 300 sec: 10861.0). Total num frames: 1687552. Throughput: 0: 11215.0. Samples: 1668248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:26:59,157][72173] Avg episode reward: [(0, '2891.480')]
[2023-09-21 12:26:59,162][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000003296_1687552.pth...
[2023-09-21 12:26:59,168][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000002640_1351680.pth
[2023-09-21 12:26:59,169][72796] Saving new best policy, reward=2891.480!
[2023-09-21 12:27:02,035][72809] Updated weights for policy 0, policy_version 3360 (0.0009)
[2023-09-21 12:27:04,157][72173] Fps is (10 sec: 10649.4, 60 sec: 11059.2, 300 sec: 10854.4). Total num frames: 1740800. Throughput: 0: 11187.8. Samples: 1736412. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:27:04,158][72173] Avg episode reward: [(0, '2984.772')]
[2023-09-21 12:27:04,197][72796] Saving new best policy, reward=2984.772!
[2023-09-21 12:27:05,807][72809] Updated weights for policy 0, policy_version 3440 (0.0013)
[2023-09-21 12:27:09,157][72173] Fps is (10 sec: 10649.2, 60 sec: 11127.4, 300 sec: 10848.2). Total num frames: 1794048. Throughput: 0: 11127.0. Samples: 1766636. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 12:27:09,158][72173] Avg episode reward: [(0, '3116.162')]
[2023-09-21 12:27:09,193][72796] Saving new best policy, reward=3116.162!
[2023-09-21 12:27:09,481][72809] Updated weights for policy 0, policy_version 3520 (0.0013)
[2023-09-21 12:27:13,311][72809] Updated weights for policy 0, policy_version 3600 (0.0014)
[2023-09-21 12:27:14,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11127.5, 300 sec: 10866.4). Total num frames: 1851392. Throughput: 0: 11127.5. Samples: 1833876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:14,158][72173] Avg episode reward: [(0, '2963.569')]
[2023-09-21 12:27:14,167][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000003616_1851392.pth...
[2023-09-21 12:27:14,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000002968_1519616.pth
[2023-09-21 12:27:17,448][72809] Updated weights for policy 0, policy_version 3680 (0.0014)
[2023-09-21 12:27:19,157][72173] Fps is (10 sec: 10650.0, 60 sec: 10991.0, 300 sec: 10836.9). Total num frames: 1900544. Throughput: 0: 10963.3. Samples: 1892416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:19,157][72173] Avg episode reward: [(0, '2786.239')]
[2023-09-21 12:27:21,576][72809] Updated weights for policy 0, policy_version 3760 (0.0013)
[2023-09-21 12:27:24,157][72173] Fps is (10 sec: 9830.5, 60 sec: 10854.4, 300 sec: 10808.9). Total num frames: 1949696. Throughput: 0: 10864.6. Samples: 1922584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:24,158][72173] Avg episode reward: [(0, '2847.858')]
[2023-09-21 12:27:25,210][72809] Updated weights for policy 0, policy_version 3840 (0.0013)
[2023-09-21 12:27:28,620][72809] Updated weights for policy 0, policy_version 3920 (0.0015)
[2023-09-21 12:27:29,157][72173] Fps is (10 sec: 11058.8, 60 sec: 10990.9, 300 sec: 10848.9). Total num frames: 2011136. Throughput: 0: 10854.2. Samples: 1990720. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 12:27:29,158][72173] Avg episode reward: [(0, '2943.824')]
[2023-09-21 12:27:29,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000003928_2011136.pth...
[2023-09-21 12:27:29,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000003296_1687552.pth
[2023-09-21 12:27:32,398][72809] Updated weights for policy 0, policy_version 4000 (0.0011)
[2023-09-21 12:27:34,157][72173] Fps is (10 sec: 11468.9, 60 sec: 10991.0, 300 sec: 10843.6). Total num frames: 2064384. Throughput: 0: 10845.2. Samples: 2057836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:34,158][72173] Avg episode reward: [(0, '3234.689')]
[2023-09-21 12:27:34,159][72796] Saving new best policy, reward=3234.689!
[2023-09-21 12:27:36,198][72809] Updated weights for policy 0, policy_version 4080 (0.0014)
[2023-09-21 12:27:39,157][72173] Fps is (10 sec: 11059.3, 60 sec: 10922.6, 300 sec: 10859.6). Total num frames: 2121728. Throughput: 0: 10823.3. Samples: 2090280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:39,158][72173] Avg episode reward: [(0, '3395.795')]
[2023-09-21 12:27:39,159][72796] Saving new best policy, reward=3395.795!
[2023-09-21 12:27:39,660][72809] Updated weights for policy 0, policy_version 4160 (0.0014)
[2023-09-21 12:27:43,467][72809] Updated weights for policy 0, policy_version 4240 (0.0016)
[2023-09-21 12:27:44,157][72173] Fps is (10 sec: 11468.6, 60 sec: 10990.9, 300 sec: 10874.9). Total num frames: 2179072. Throughput: 0: 10893.8. Samples: 2158472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:44,158][72173] Avg episode reward: [(0, '3485.166')]
[2023-09-21 12:27:44,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000004256_2179072.pth...
[2023-09-21 12:27:44,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000003616_1851392.pth
[2023-09-21 12:27:44,173][72796] Saving new best policy, reward=3485.166!
[2023-09-21 12:27:47,108][72809] Updated weights for policy 0, policy_version 4320 (0.0015)
[2023-09-21 12:27:49,157][72173] Fps is (10 sec: 11059.4, 60 sec: 10922.7, 300 sec: 10869.4). Total num frames: 2232320. Throughput: 0: 10879.3. Samples: 2225976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:49,158][72173] Avg episode reward: [(0, '3033.393')]
[2023-09-21 12:27:50,789][72809] Updated weights for policy 0, policy_version 4400 (0.0014)
[2023-09-21 12:27:54,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10922.6, 300 sec: 10883.6). Total num frames: 2289664. Throughput: 0: 10961.3. Samples: 2259896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:27:54,158][72173] Avg episode reward: [(0, '2911.105')]
[2023-09-21 12:27:54,247][72809] Updated weights for policy 0, policy_version 4480 (0.0013)
[2023-09-21 12:27:58,081][72809] Updated weights for policy 0, policy_version 4560 (0.0016)
[2023-09-21 12:27:59,157][72173] Fps is (10 sec: 11058.9, 60 sec: 10922.6, 300 sec: 10878.2). Total num frames: 2342912. Throughput: 0: 10947.8. Samples: 2326528. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 12:27:59,158][72173] Avg episode reward: [(0, '3278.382')]
[2023-09-21 12:27:59,185][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000004584_2347008.pth...
[2023-09-21 12:27:59,188][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000003928_2011136.pth
[2023-09-21 12:28:01,622][72809] Updated weights for policy 0, policy_version 4640 (0.0015)
[2023-09-21 12:28:04,157][72173] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10910.2). Total num frames: 2404352. Throughput: 0: 11194.2. Samples: 2396160. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:28:04,158][72173] Avg episode reward: [(0, '3379.472')]
[2023-09-21 12:28:05,167][72809] Updated weights for policy 0, policy_version 4720 (0.0014)
[2023-09-21 12:28:08,585][72809] Updated weights for policy 0, policy_version 4800 (0.0013)
[2023-09-21 12:28:09,157][72173] Fps is (10 sec: 11878.5, 60 sec: 11127.5, 300 sec: 10922.7). Total num frames: 2461696. Throughput: 0: 11253.5. Samples: 2428992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:28:09,158][72173] Avg episode reward: [(0, '3148.234')]
[2023-09-21 12:28:12,287][72809] Updated weights for policy 0, policy_version 4880 (0.0013)
[2023-09-21 12:28:14,158][72173] Fps is (10 sec: 11468.3, 60 sec: 11127.4, 300 sec: 10934.5). Total num frames: 2519040. Throughput: 0: 11286.7. Samples: 2498624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 12:28:14,158][72173] Avg episode reward: [(0, '2797.582')]
[2023-09-21 12:28:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000004920_2519040.pth...
[2023-09-21 12:28:14,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000004256_2179072.pth
[2023-09-21 12:28:15,917][72809] Updated weights for policy 0, policy_version 4960 (0.0014)
[2023-09-21 12:28:19,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11263.9, 300 sec: 10945.9). Total num frames: 2576384. Throughput: 0: 11340.7. Samples: 2568172. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 12:28:19,158][72173] Avg episode reward: [(0, '3412.071')]
[2023-09-21 12:28:19,436][72809] Updated weights for policy 0, policy_version 5040 (0.0014)
[2023-09-21 12:28:22,979][72809] Updated weights for policy 0, policy_version 5120 (0.0014)
[2023-09-21 12:28:24,157][72173] Fps is (10 sec: 11469.5, 60 sec: 11400.6, 300 sec: 10956.8). Total num frames: 2633728. Throughput: 0: 11439.5. Samples: 2605056. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 12:28:24,158][72173] Avg episode reward: [(0, '2897.346')]
[2023-09-21 12:28:26,369][72809] Updated weights for policy 0, policy_version 5200 (0.0013)
[2023-09-21 12:28:29,157][72173] Fps is (10 sec: 11878.4, 60 sec: 11400.5, 300 sec: 10984.0). Total num frames: 2695168. Throughput: 0: 11471.6. Samples: 2674696. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 12:28:29,158][72173] Avg episode reward: [(0, '3276.340')]
[2023-09-21 12:28:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000005264_2695168.pth...
[2023-09-21 12:28:29,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000004584_2347008.pth [2023-09-21 12:28:29,640][72809] Updated weights for policy 0, policy_version 5280 (0.0014) [2023-09-21 12:28:33,366][72809] Updated weights for policy 0, policy_version 5360 (0.0012) [2023-09-21 12:28:34,157][72173] Fps is (10 sec: 11468.6, 60 sec: 11400.5, 300 sec: 10977.3). Total num frames: 2748416. Throughput: 0: 11518.7. Samples: 2744320. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:28:34,158][72173] Avg episode reward: [(0, '3366.656')] [2023-09-21 12:28:37,074][72809] Updated weights for policy 0, policy_version 5440 (0.0010) [2023-09-21 12:28:39,157][72173] Fps is (10 sec: 11059.5, 60 sec: 11400.6, 300 sec: 10986.9). Total num frames: 2805760. Throughput: 0: 11472.0. Samples: 2776132. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:28:39,158][72173] Avg episode reward: [(0, '3406.860')] [2023-09-21 12:28:40,759][72809] Updated weights for policy 0, policy_version 5520 (0.0014) [2023-09-21 12:28:44,157][72173] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 10996.2). Total num frames: 2863104. Throughput: 0: 11470.4. Samples: 2842696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:28:44,158][72173] Avg episode reward: [(0, '3189.175')] [2023-09-21 12:28:44,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000005592_2863104.pth... [2023-09-21 12:28:44,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000004920_2519040.pth [2023-09-21 12:28:44,467][72809] Updated weights for policy 0, policy_version 5600 (0.0014) [2023-09-21 12:28:47,889][72809] Updated weights for policy 0, policy_version 5680 (0.0013) [2023-09-21 12:28:49,157][72173] Fps is (10 sec: 11878.0, 60 sec: 11537.0, 300 sec: 11020.5). Total num frames: 2924544. Throughput: 0: 11494.9. Samples: 2913432. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 12:28:49,159][72173] Avg episode reward: [(0, '3030.694')] [2023-09-21 12:28:51,205][72809] Updated weights for policy 0, policy_version 5760 (0.0014) [2023-09-21 12:28:54,157][72173] Fps is (10 sec: 11469.0, 60 sec: 11468.8, 300 sec: 11013.7). Total num frames: 2977792. Throughput: 0: 11575.8. Samples: 2949904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:28:54,158][72173] Avg episode reward: [(0, '3734.266')] [2023-09-21 12:28:54,159][72796] Saving new best policy, reward=3734.266! [2023-09-21 12:28:55,061][72809] Updated weights for policy 0, policy_version 5840 (0.0015) [2023-09-21 12:28:58,965][72809] Updated weights for policy 0, policy_version 5920 (0.0014) [2023-09-21 12:28:59,157][72173] Fps is (10 sec: 10649.7, 60 sec: 11468.8, 300 sec: 11007.1). Total num frames: 3031040. Throughput: 0: 11452.4. Samples: 3013976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:28:59,158][72173] Avg episode reward: [(0, '3658.285')] [2023-09-21 12:28:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000005920_3031040.pth... [2023-09-21 12:28:59,173][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000005264_2695168.pth [2023-09-21 12:29:02,850][72809] Updated weights for policy 0, policy_version 6000 (0.0014) [2023-09-21 12:29:04,157][72173] Fps is (10 sec: 10649.7, 60 sec: 11332.3, 300 sec: 11000.7). Total num frames: 3084288. Throughput: 0: 11313.1. Samples: 3077256. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:29:04,157][72173] Avg episode reward: [(0, '3650.124')] [2023-09-21 12:29:06,458][72809] Updated weights for policy 0, policy_version 6080 (0.0012) [2023-09-21 12:29:09,157][72173] Fps is (10 sec: 10649.8, 60 sec: 11264.0, 300 sec: 10994.5). Total num frames: 3137536. Throughput: 0: 11276.6. Samples: 3112504. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:09,158][72173] Avg episode reward: [(0, '3634.345')] [2023-09-21 12:29:10,169][72809] Updated weights for policy 0, policy_version 6160 (0.0012) [2023-09-21 12:29:14,069][72809] Updated weights for policy 0, policy_version 6240 (0.0015) [2023-09-21 12:29:14,157][72173] Fps is (10 sec: 11058.9, 60 sec: 11264.1, 300 sec: 11002.7). Total num frames: 3194880. Throughput: 0: 11121.8. Samples: 3175176. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-21 12:29:14,158][72173] Avg episode reward: [(0, '3626.259')] [2023-09-21 12:29:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000006240_3194880.pth... [2023-09-21 12:29:14,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000005592_2863104.pth [2023-09-21 12:29:17,862][72809] Updated weights for policy 0, policy_version 6320 (0.0014) [2023-09-21 12:29:19,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11195.8, 300 sec: 10996.7). Total num frames: 3248128. Throughput: 0: 11026.6. Samples: 3240516. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:19,158][72173] Avg episode reward: [(0, '3449.863')] [2023-09-21 12:29:21,596][72809] Updated weights for policy 0, policy_version 6400 (0.0014) [2023-09-21 12:29:24,157][72173] Fps is (10 sec: 11059.5, 60 sec: 11195.7, 300 sec: 11038.4). Total num frames: 3305472. Throughput: 0: 11053.6. Samples: 3273544. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:29:24,158][72173] Avg episode reward: [(0, '3980.523')] [2023-09-21 12:29:24,159][72796] Saving new best policy, reward=3980.523! [2023-09-21 12:29:25,138][72809] Updated weights for policy 0, policy_version 6480 (0.0012) [2023-09-21 12:29:28,663][72809] Updated weights for policy 0, policy_version 6560 (0.0014) [2023-09-21 12:29:29,157][72173] Fps is (10 sec: 11468.5, 60 sec: 11127.5, 300 sec: 11052.3). Total num frames: 3362816. Throughput: 0: 11095.4. Samples: 3341988. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:29,158][72173] Avg episode reward: [(0, '3745.197')] [2023-09-21 12:29:29,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000006568_3362816.pth... [2023-09-21 12:29:29,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000005920_3031040.pth [2023-09-21 12:29:32,634][72809] Updated weights for policy 0, policy_version 6640 (0.0014) [2023-09-21 12:29:34,157][72173] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 3411968. Throughput: 0: 10972.3. Samples: 3407180. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:34,158][72173] Avg episode reward: [(0, '3859.761')] [2023-09-21 12:29:36,641][72809] Updated weights for policy 0, policy_version 6720 (0.0015) [2023-09-21 12:29:39,157][72173] Fps is (10 sec: 10240.1, 60 sec: 10990.9, 300 sec: 11024.5). Total num frames: 3465216. Throughput: 0: 10819.4. Samples: 3436780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:39,158][72173] Avg episode reward: [(0, '3784.775')] [2023-09-21 12:29:40,353][72809] Updated weights for policy 0, policy_version 6800 (0.0014) [2023-09-21 12:29:44,157][72173] Fps is (10 sec: 10649.2, 60 sec: 10922.6, 300 sec: 11024.5). Total num frames: 3518464. Throughput: 0: 10846.4. Samples: 3502068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:44,159][72173] Avg episode reward: [(0, '3833.411')] [2023-09-21 12:29:44,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000006872_3518464.pth... [2023-09-21 12:29:44,178][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000006240_3194880.pth [2023-09-21 12:29:44,450][72809] Updated weights for policy 0, policy_version 6880 (0.0014) [2023-09-21 12:29:48,574][72809] Updated weights for policy 0, policy_version 6960 (0.0011) [2023-09-21 12:29:49,157][72173] Fps is (10 sec: 10240.0, 60 sec: 10717.9, 300 sec: 11010.6). Total num frames: 3567616. Throughput: 0: 10761.8. 
Samples: 3561540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:29:49,158][72173] Avg episode reward: [(0, '3654.702')] [2023-09-21 12:29:52,332][72809] Updated weights for policy 0, policy_version 7040 (0.0013) [2023-09-21 12:29:54,157][72173] Fps is (10 sec: 10240.2, 60 sec: 10717.9, 300 sec: 11010.6). Total num frames: 3620864. Throughput: 0: 10661.1. Samples: 3592256. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:29:54,158][72173] Avg episode reward: [(0, '3177.531')] [2023-09-21 12:29:56,326][72809] Updated weights for policy 0, policy_version 7120 (0.0014) [2023-09-21 12:29:59,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10786.1, 300 sec: 11024.5). Total num frames: 3678208. Throughput: 0: 10750.4. Samples: 3658944. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:29:59,159][72173] Avg episode reward: [(0, '3663.007')] [2023-09-21 12:29:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000007184_3678208.pth... [2023-09-21 12:29:59,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000006568_3362816.pth [2023-09-21 12:29:59,711][72809] Updated weights for policy 0, policy_version 7200 (0.0014) [2023-09-21 12:30:03,404][72809] Updated weights for policy 0, policy_version 7280 (0.0015) [2023-09-21 12:30:04,157][72173] Fps is (10 sec: 11468.7, 60 sec: 10854.4, 300 sec: 11038.4). Total num frames: 3735552. Throughput: 0: 10807.6. Samples: 3726860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:04,158][72173] Avg episode reward: [(0, '3488.674')] [2023-09-21 12:30:06,899][72809] Updated weights for policy 0, policy_version 7360 (0.0015) [2023-09-21 12:30:09,157][72173] Fps is (10 sec: 11059.4, 60 sec: 10854.4, 300 sec: 11024.5). Total num frames: 3788800. Throughput: 0: 10842.3. Samples: 3761448. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:09,158][72173] Avg episode reward: [(0, '3183.476')] [2023-09-21 12:30:10,772][72809] Updated weights for policy 0, policy_version 7440 (0.0014) [2023-09-21 12:30:14,060][72809] Updated weights for policy 0, policy_version 7520 (0.0014) [2023-09-21 12:30:14,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 11066.1). Total num frames: 3850240. Throughput: 0: 10819.3. Samples: 3828856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:14,158][72173] Avg episode reward: [(0, '2730.022')] [2023-09-21 12:30:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000007520_3850240.pth... [2023-09-21 12:30:14,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000006872_3518464.pth [2023-09-21 12:30:17,815][72809] Updated weights for policy 0, policy_version 7600 (0.0011) [2023-09-21 12:30:19,157][72173] Fps is (10 sec: 11468.9, 60 sec: 10922.6, 300 sec: 11052.3). Total num frames: 3903488. Throughput: 0: 10860.3. Samples: 3895896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:19,158][72173] Avg episode reward: [(0, '3449.347')] [2023-09-21 12:30:21,619][72809] Updated weights for policy 0, policy_version 7680 (0.0012) [2023-09-21 12:30:24,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10922.6, 300 sec: 11052.3). Total num frames: 3960832. Throughput: 0: 10937.9. Samples: 3928988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:24,158][72173] Avg episode reward: [(0, '4009.460')] [2023-09-21 12:30:24,159][72796] Saving new best policy, reward=4009.460! [2023-09-21 12:30:24,934][72809] Updated weights for policy 0, policy_version 7760 (0.0013) [2023-09-21 12:30:28,865][72809] Updated weights for policy 0, policy_version 7840 (0.0014) [2023-09-21 12:30:29,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10854.4, 300 sec: 11052.3). Total num frames: 4014080. Throughput: 0: 11015.4. Samples: 3997760. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:30:29,158][72173] Avg episode reward: [(0, '3712.979')] [2023-09-21 12:30:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000007840_4014080.pth... [2023-09-21 12:30:29,173][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000007184_3678208.pth [2023-09-21 12:30:32,411][72809] Updated weights for policy 0, policy_version 7920 (0.0015) [2023-09-21 12:30:34,157][72173] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 11080.0). Total num frames: 4075520. Throughput: 0: 11239.3. Samples: 4067308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:34,158][72173] Avg episode reward: [(0, '3265.598')] [2023-09-21 12:30:35,919][72809] Updated weights for policy 0, policy_version 8000 (0.0012) [2023-09-21 12:30:39,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 11080.0). Total num frames: 4128768. Throughput: 0: 11287.0. Samples: 4100172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:30:39,158][72173] Avg episode reward: [(0, '3397.797')] [2023-09-21 12:30:39,524][72809] Updated weights for policy 0, policy_version 8080 (0.0014) [2023-09-21 12:30:43,324][72809] Updated weights for policy 0, policy_version 8160 (0.0014) [2023-09-21 12:30:44,157][72173] Fps is (10 sec: 11059.3, 60 sec: 11127.5, 300 sec: 11093.9). Total num frames: 4186112. Throughput: 0: 11262.4. Samples: 4165748. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:30:44,158][72173] Avg episode reward: [(0, '3460.354')] [2023-09-21 12:30:44,163][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000008176_4186112.pth... [2023-09-21 12:30:44,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000007520_3850240.pth [2023-09-21 12:30:46,728][72809] Updated weights for policy 0, policy_version 8240 (0.0013) [2023-09-21 12:30:49,157][72173] Fps is (10 sec: 11468.9, 60 sec: 11264.0, 300 sec: 11107.8). Total num frames: 4243456. Throughput: 0: 11346.9. 
Samples: 4237468. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:30:49,158][72173] Avg episode reward: [(0, '3265.655')] [2023-09-21 12:30:50,499][72809] Updated weights for policy 0, policy_version 8320 (0.0013) [2023-09-21 12:30:54,157][72173] Fps is (10 sec: 11059.3, 60 sec: 11264.0, 300 sec: 11093.9). Total num frames: 4296704. Throughput: 0: 11257.7. Samples: 4268040. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:30:54,157][72173] Avg episode reward: [(0, '3448.783')] [2023-09-21 12:30:54,527][72809] Updated weights for policy 0, policy_version 8400 (0.0014) [2023-09-21 12:30:58,551][72809] Updated weights for policy 0, policy_version 8480 (0.0011) [2023-09-21 12:30:59,157][72173] Fps is (10 sec: 10240.1, 60 sec: 11127.5, 300 sec: 11080.0). Total num frames: 4345856. Throughput: 0: 11126.3. Samples: 4329536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:30:59,158][72173] Avg episode reward: [(0, '3605.290')] [2023-09-21 12:30:59,162][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000008488_4345856.pth... [2023-09-21 12:30:59,169][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000007840_4014080.pth [2023-09-21 12:31:02,396][72809] Updated weights for policy 0, policy_version 8560 (0.0014) [2023-09-21 12:31:04,157][72173] Fps is (10 sec: 10239.7, 60 sec: 11059.2, 300 sec: 11093.9). Total num frames: 4399104. Throughput: 0: 11056.5. Samples: 4393440. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:31:04,158][72173] Avg episode reward: [(0, '3810.161')] [2023-09-21 12:31:06,171][72809] Updated weights for policy 0, policy_version 8640 (0.0014) [2023-09-21 12:31:09,157][72173] Fps is (10 sec: 11058.9, 60 sec: 11127.4, 300 sec: 11093.9). Total num frames: 4456448. Throughput: 0: 11027.2. Samples: 4425212. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:09,158][72173] Avg episode reward: [(0, '3606.083')] [2023-09-21 12:31:09,961][72809] Updated weights for policy 0, policy_version 8720 (0.0015) [2023-09-21 12:31:13,521][72809] Updated weights for policy 0, policy_version 8800 (0.0012) [2023-09-21 12:31:14,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11080.0). Total num frames: 4509696. Throughput: 0: 10936.3. Samples: 4489896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:14,158][72173] Avg episode reward: [(0, '3480.801')] [2023-09-21 12:31:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000008808_4509696.pth... [2023-09-21 12:31:14,168][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000008176_4186112.pth [2023-09-21 12:31:17,514][72809] Updated weights for policy 0, policy_version 8880 (0.0013) [2023-09-21 12:31:19,157][72173] Fps is (10 sec: 10240.2, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 4558848. Throughput: 0: 10833.5. Samples: 4554816. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:31:19,158][72173] Avg episode reward: [(0, '3470.813')] [2023-09-21 12:31:21,760][72809] Updated weights for policy 0, policy_version 8960 (0.0015) [2023-09-21 12:31:24,157][72173] Fps is (10 sec: 10240.1, 60 sec: 10854.4, 300 sec: 11052.2). Total num frames: 4612096. Throughput: 0: 10738.6. Samples: 4583408. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:31:24,158][72173] Avg episode reward: [(0, '3511.822')] [2023-09-21 12:31:25,548][72809] Updated weights for policy 0, policy_version 9040 (0.0012) [2023-09-21 12:31:28,953][72809] Updated weights for policy 0, policy_version 9120 (0.0013) [2023-09-21 12:31:29,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10922.7, 300 sec: 11066.1). Total num frames: 4669440. Throughput: 0: 10737.3. Samples: 4648928. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:31:29,158][72173] Avg episode reward: [(0, '3621.338')] [2023-09-21 12:31:29,167][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000009120_4669440.pth... [2023-09-21 12:31:29,175][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000008488_4345856.pth [2023-09-21 12:31:32,902][72809] Updated weights for policy 0, policy_version 9200 (0.0014) [2023-09-21 12:31:34,157][72173] Fps is (10 sec: 11059.3, 60 sec: 10786.1, 300 sec: 11038.4). Total num frames: 4722688. Throughput: 0: 10600.8. Samples: 4714504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:34,158][72173] Avg episode reward: [(0, '3683.539')] [2023-09-21 12:31:36,834][72809] Updated weights for policy 0, policy_version 9280 (0.0012) [2023-09-21 12:31:39,157][72173] Fps is (10 sec: 10240.2, 60 sec: 10717.9, 300 sec: 11024.5). Total num frames: 4771840. Throughput: 0: 10644.2. Samples: 4747028. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:39,158][72173] Avg episode reward: [(0, '3754.367')] [2023-09-21 12:31:40,878][72809] Updated weights for policy 0, policy_version 9360 (0.0014) [2023-09-21 12:31:44,157][72173] Fps is (10 sec: 10239.8, 60 sec: 10649.6, 300 sec: 11010.6). Total num frames: 4825088. Throughput: 0: 10649.6. Samples: 4808768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:44,158][72173] Avg episode reward: [(0, '3926.364')] [2023-09-21 12:31:44,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000009424_4825088.pth... [2023-09-21 12:31:44,175][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000008808_4509696.pth [2023-09-21 12:31:44,795][72809] Updated weights for policy 0, policy_version 9440 (0.0013) [2023-09-21 12:31:48,364][72809] Updated weights for policy 0, policy_version 9520 (0.0014) [2023-09-21 12:31:49,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10649.6, 300 sec: 11010.6). Total num frames: 4882432. Throughput: 0: 10673.7. 
Samples: 4873756. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:31:49,158][72173] Avg episode reward: [(0, '3926.191')] [2023-09-21 12:31:51,967][72809] Updated weights for policy 0, policy_version 9600 (0.0015) [2023-09-21 12:31:54,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10717.8, 300 sec: 11024.5). Total num frames: 4939776. Throughput: 0: 10738.1. Samples: 4908424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:54,158][72173] Avg episode reward: [(0, '3933.235')] [2023-09-21 12:31:55,338][72809] Updated weights for policy 0, policy_version 9680 (0.0013) [2023-09-21 12:31:59,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10786.1, 300 sec: 11024.5). Total num frames: 4993024. Throughput: 0: 10816.0. Samples: 4976612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:31:59,158][72173] Avg episode reward: [(0, '3981.464')] [2023-09-21 12:31:59,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000009752_4993024.pth... [2023-09-21 12:31:59,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000009120_4669440.pth [2023-09-21 12:31:59,329][72809] Updated weights for policy 0, policy_version 9760 (0.0013) [2023-09-21 12:32:02,954][72809] Updated weights for policy 0, policy_version 9840 (0.0012) [2023-09-21 12:32:04,157][72173] Fps is (10 sec: 11059.3, 60 sec: 10854.4, 300 sec: 11038.4). Total num frames: 5050368. Throughput: 0: 10834.3. Samples: 5042360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:04,158][72173] Avg episode reward: [(0, '4156.763')] [2023-09-21 12:32:04,159][72796] Saving new best policy, reward=4156.763! [2023-09-21 12:32:06,553][72809] Updated weights for policy 0, policy_version 9920 (0.0013) [2023-09-21 12:32:09,157][72173] Fps is (10 sec: 11468.6, 60 sec: 10854.4, 300 sec: 11038.4). Total num frames: 5107712. Throughput: 0: 10936.5. Samples: 5075552. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:09,158][72173] Avg episode reward: [(0, '4159.556')] [2023-09-21 12:32:09,159][72796] Saving new best policy, reward=4159.556! [2023-09-21 12:32:10,066][72809] Updated weights for policy 0, policy_version 10000 (0.0011) [2023-09-21 12:32:13,819][72809] Updated weights for policy 0, policy_version 10080 (0.0012) [2023-09-21 12:32:14,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10854.4, 300 sec: 11052.2). Total num frames: 5160960. Throughput: 0: 11000.4. Samples: 5143944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:14,158][72173] Avg episode reward: [(0, '4116.420')] [2023-09-21 12:32:14,203][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000010088_5165056.pth... [2023-09-21 12:32:14,211][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000009424_4825088.pth [2023-09-21 12:32:17,498][72809] Updated weights for policy 0, policy_version 10160 (0.0011) [2023-09-21 12:32:19,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11080.0). Total num frames: 5218304. Throughput: 0: 11029.9. Samples: 5210852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:19,158][72173] Avg episode reward: [(0, '4109.910')] [2023-09-21 12:32:20,984][72809] Updated weights for policy 0, policy_version 10240 (0.0014) [2023-09-21 12:32:24,157][72173] Fps is (10 sec: 11878.7, 60 sec: 11127.5, 300 sec: 11080.0). Total num frames: 5279744. Throughput: 0: 11110.0. Samples: 5246976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:24,158][72173] Avg episode reward: [(0, '4125.858')] [2023-09-21 12:32:24,402][72809] Updated weights for policy 0, policy_version 10320 (0.0011) [2023-09-21 12:32:28,055][72809] Updated weights for policy 0, policy_version 10400 (0.0014) [2023-09-21 12:32:29,158][72173] Fps is (10 sec: 11878.2, 60 sec: 11127.4, 300 sec: 11093.9). Total num frames: 5337088. Throughput: 0: 11276.6. Samples: 5316216. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:32:29,159][72173] Avg episode reward: [(0, '4108.518')] [2023-09-21 12:32:29,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000010424_5337088.pth... [2023-09-21 12:32:29,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000009752_4993024.pth [2023-09-21 12:32:31,473][72809] Updated weights for policy 0, policy_version 10480 (0.0012) [2023-09-21 12:32:34,157][72173] Fps is (10 sec: 11878.2, 60 sec: 11264.0, 300 sec: 11107.8). Total num frames: 5398528. Throughput: 0: 11443.7. Samples: 5388724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:34,158][72173] Avg episode reward: [(0, '4266.598')] [2023-09-21 12:32:34,159][72796] Saving new best policy, reward=4266.598! [2023-09-21 12:32:34,871][72809] Updated weights for policy 0, policy_version 10560 (0.0014) [2023-09-21 12:32:38,731][72809] Updated weights for policy 0, policy_version 10640 (0.0014) [2023-09-21 12:32:39,157][72173] Fps is (10 sec: 11469.3, 60 sec: 11332.3, 300 sec: 11093.9). Total num frames: 5451776. Throughput: 0: 11434.5. Samples: 5422972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:39,158][72173] Avg episode reward: [(0, '4283.773')] [2023-09-21 12:32:39,159][72796] Saving new best policy, reward=4283.773! [2023-09-21 12:32:42,295][72809] Updated weights for policy 0, policy_version 10720 (0.0014) [2023-09-21 12:32:44,158][72173] Fps is (10 sec: 11058.8, 60 sec: 11400.5, 300 sec: 11107.8). Total num frames: 5509120. Throughput: 0: 11379.7. Samples: 5488704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:44,159][72173] Avg episode reward: [(0, '4110.350')] [2023-09-21 12:32:44,194][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000010768_5513216.pth... 
[2023-09-21 12:32:44,198][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000010088_5165056.pth [2023-09-21 12:32:45,524][72809] Updated weights for policy 0, policy_version 10800 (0.0012) [2023-09-21 12:32:48,980][72809] Updated weights for policy 0, policy_version 10880 (0.0013) [2023-09-21 12:32:49,157][72173] Fps is (10 sec: 11878.1, 60 sec: 11468.8, 300 sec: 11121.7). Total num frames: 5570560. Throughput: 0: 11590.6. Samples: 5563940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:49,158][72173] Avg episode reward: [(0, '4038.217')] [2023-09-21 12:32:52,329][72809] Updated weights for policy 0, policy_version 10960 (0.0013) [2023-09-21 12:32:54,157][72173] Fps is (10 sec: 12288.6, 60 sec: 11537.1, 300 sec: 11149.5). Total num frames: 5632000. Throughput: 0: 11638.8. Samples: 5599296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:54,158][72173] Avg episode reward: [(0, '4059.889')] [2023-09-21 12:32:55,916][72809] Updated weights for policy 0, policy_version 11040 (0.0012) [2023-09-21 12:32:59,157][72173] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11135.6). Total num frames: 5689344. Throughput: 0: 11638.1. Samples: 5667660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:32:59,158][72173] Avg episode reward: [(0, '4068.416')] [2023-09-21 12:32:59,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000011112_5689344.pth... [2023-09-21 12:32:59,173][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000010424_5337088.pth [2023-09-21 12:32:59,506][72809] Updated weights for policy 0, policy_version 11120 (0.0013) [2023-09-21 12:33:03,540][72809] Updated weights for policy 0, policy_version 11200 (0.0013) [2023-09-21 12:33:04,157][72173] Fps is (10 sec: 10649.4, 60 sec: 11468.8, 300 sec: 11107.8). Total num frames: 5738496. Throughput: 0: 11589.4. Samples: 5732372. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:04,158][72173] Avg episode reward: [(0, '4059.637')] [2023-09-21 12:33:07,490][72809] Updated weights for policy 0, policy_version 11280 (0.0013) [2023-09-21 12:33:09,157][72173] Fps is (10 sec: 10240.1, 60 sec: 11400.5, 300 sec: 11093.9). Total num frames: 5791744. Throughput: 0: 11465.8. Samples: 5762940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:09,158][72173] Avg episode reward: [(0, '3999.917')] [2023-09-21 12:33:11,322][72809] Updated weights for policy 0, policy_version 11360 (0.0013) [2023-09-21 12:33:14,157][72173] Fps is (10 sec: 10649.5, 60 sec: 11400.5, 300 sec: 11080.0). Total num frames: 5844992. Throughput: 0: 11346.0. Samples: 5826784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:14,158][72173] Avg episode reward: [(0, '4107.427')] [2023-09-21 12:33:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000011416_5844992.pth... [2023-09-21 12:33:14,173][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000010768_5513216.pth [2023-09-21 12:33:15,214][72809] Updated weights for policy 0, policy_version 11440 (0.0014) [2023-09-21 12:33:18,856][72809] Updated weights for policy 0, policy_version 11520 (0.0013) [2023-09-21 12:33:19,157][72173] Fps is (10 sec: 10649.6, 60 sec: 11332.3, 300 sec: 11066.1). Total num frames: 5898240. Throughput: 0: 11211.9. Samples: 5893260. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:33:19,158][72173] Avg episode reward: [(0, '4225.593')] [2023-09-21 12:33:22,503][72809] Updated weights for policy 0, policy_version 11600 (0.0013) [2023-09-21 12:33:24,157][72173] Fps is (10 sec: 11059.4, 60 sec: 11264.0, 300 sec: 11052.3). Total num frames: 5955584. Throughput: 0: 11153.2. Samples: 5924868. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:33:24,158][72173] Avg episode reward: [(0, '4248.899')] [2023-09-21 12:33:26,091][72809] Updated weights for policy 0, policy_version 11680 (0.0013) [2023-09-21 12:33:29,157][72173] Fps is (10 sec: 11468.9, 60 sec: 11264.1, 300 sec: 11066.1). Total num frames: 6012928. Throughput: 0: 11253.5. Samples: 5995108. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:33:29,158][72173] Avg episode reward: [(0, '4125.944')] [2023-09-21 12:33:29,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000011744_6012928.pth... [2023-09-21 12:33:29,167][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000011112_5689344.pth [2023-09-21 12:33:29,565][72809] Updated weights for policy 0, policy_version 11760 (0.0014) [2023-09-21 12:33:33,314][72809] Updated weights for policy 0, policy_version 11840 (0.0014) [2023-09-21 12:33:34,157][72173] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11066.1). Total num frames: 6070272. Throughput: 0: 11070.0. Samples: 6062088. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:33:34,158][72173] Avg episode reward: [(0, '4105.687')] [2023-09-21 12:33:37,181][72809] Updated weights for policy 0, policy_version 11920 (0.0012) [2023-09-21 12:33:39,157][72173] Fps is (10 sec: 11058.9, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 6123520. Throughput: 0: 11009.2. Samples: 6094712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:39,159][72173] Avg episode reward: [(0, '4217.479')] [2023-09-21 12:33:41,007][72809] Updated weights for policy 0, policy_version 12000 (0.0015) [2023-09-21 12:33:44,157][72173] Fps is (10 sec: 10649.4, 60 sec: 11127.5, 300 sec: 11024.5). Total num frames: 6176768. Throughput: 0: 10881.9. Samples: 6157348. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:44,158][72173] Avg episode reward: [(0, '4128.644')] [2023-09-21 12:33:44,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000012064_6176768.pth... [2023-09-21 12:33:44,177][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000011416_5844992.pth [2023-09-21 12:33:44,648][72809] Updated weights for policy 0, policy_version 12080 (0.0014) [2023-09-21 12:33:48,086][72809] Updated weights for policy 0, policy_version 12160 (0.0014) [2023-09-21 12:33:49,157][72173] Fps is (10 sec: 11059.5, 60 sec: 11059.2, 300 sec: 11038.4). Total num frames: 6234112. Throughput: 0: 11036.9. Samples: 6229032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:33:49,158][72173] Avg episode reward: [(0, '4127.320')] [2023-09-21 12:33:51,828][72809] Updated weights for policy 0, policy_version 12240 (0.0014) [2023-09-21 12:33:54,157][72173] Fps is (10 sec: 11469.0, 60 sec: 10990.9, 300 sec: 11052.3). Total num frames: 6291456. Throughput: 0: 11093.5. Samples: 6262148. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 12:33:54,158][72173] Avg episode reward: [(0, '4172.459')] [2023-09-21 12:33:55,374][72809] Updated weights for policy 0, policy_version 12320 (0.0014) [2023-09-21 12:33:58,794][72809] Updated weights for policy 0, policy_version 12400 (0.0013) [2023-09-21 12:33:59,158][72173] Fps is (10 sec: 11468.3, 60 sec: 10990.9, 300 sec: 11066.1). Total num frames: 6348800. Throughput: 0: 11237.6. Samples: 6332480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 12:33:59,159][72173] Avg episode reward: [(0, '4229.379')] [2023-09-21 12:33:59,198][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000012408_6352896.pth... 
[2023-09-21 12:33:59,201][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000011744_6012928.pth [2023-09-21 12:34:02,664][72809] Updated weights for policy 0, policy_version 12480 (0.0014) [2023-09-21 12:34:04,157][72173] Fps is (10 sec: 11468.8, 60 sec: 11127.5, 300 sec: 11080.0). Total num frames: 6406144. Throughput: 0: 11217.0. Samples: 6398024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:04,158][72173] Avg episode reward: [(0, '4056.286')] [2023-09-21 12:34:06,035][72809] Updated weights for policy 0, policy_version 12560 (0.0013) [2023-09-21 12:34:09,157][72173] Fps is (10 sec: 11059.3, 60 sec: 11127.4, 300 sec: 11066.1). Total num frames: 6459392. Throughput: 0: 11329.9. Samples: 6434716. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:09,159][72173] Avg episode reward: [(0, '3761.835')] [2023-09-21 12:34:09,928][72809] Updated weights for policy 0, policy_version 12640 (0.0012) [2023-09-21 12:34:13,397][72809] Updated weights for policy 0, policy_version 12720 (0.0014) [2023-09-21 12:34:14,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 11093.9). Total num frames: 6520832. Throughput: 0: 11173.5. Samples: 6497916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:14,158][72173] Avg episode reward: [(0, '3880.908')] [2023-09-21 12:34:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000012736_6520832.pth... [2023-09-21 12:34:14,174][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000012064_6176768.pth [2023-09-21 12:34:16,909][72809] Updated weights for policy 0, policy_version 12800 (0.0015) [2023-09-21 12:34:19,157][72173] Fps is (10 sec: 11469.3, 60 sec: 11264.0, 300 sec: 11080.0). Total num frames: 6574080. Throughput: 0: 11248.1. Samples: 6568252. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:19,158][72173] Avg episode reward: [(0, '4096.026')] [2023-09-21 12:34:20,735][72809] Updated weights for policy 0, policy_version 12880 (0.0012) [2023-09-21 12:34:24,157][72173] Fps is (10 sec: 11059.4, 60 sec: 11264.0, 300 sec: 11080.0). Total num frames: 6631424. Throughput: 0: 11239.0. Samples: 6600464. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 12:34:24,158][72173] Avg episode reward: [(0, '4005.082')] [2023-09-21 12:34:24,457][72809] Updated weights for policy 0, policy_version 12960 (0.0012) [2023-09-21 12:34:27,919][72809] Updated weights for policy 0, policy_version 13040 (0.0013) [2023-09-21 12:34:29,158][72173] Fps is (10 sec: 11468.3, 60 sec: 11263.9, 300 sec: 11107.8). Total num frames: 6688768. Throughput: 0: 11359.8. Samples: 6668540. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-21 12:34:29,159][72173] Avg episode reward: [(0, '3914.467')] [2023-09-21 12:34:29,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000013064_6688768.pth... [2023-09-21 12:34:29,174][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000012408_6352896.pth [2023-09-21 12:34:31,467][72809] Updated weights for policy 0, policy_version 13120 (0.0012) [2023-09-21 12:34:34,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 11121.7). Total num frames: 6746112. Throughput: 0: 11392.4. Samples: 6741692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:34,158][72173] Avg episode reward: [(0, '4028.631')] [2023-09-21 12:34:34,841][72809] Updated weights for policy 0, policy_version 13200 (0.0012) [2023-09-21 12:34:38,606][72809] Updated weights for policy 0, policy_version 13280 (0.0013) [2023-09-21 12:34:39,157][72173] Fps is (10 sec: 11469.1, 60 sec: 11332.3, 300 sec: 11135.6). Total num frames: 6803456. Throughput: 0: 11393.3. Samples: 6774848. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:39,158][72173] Avg episode reward: [(0, '4071.803')] [2023-09-21 12:34:42,394][72809] Updated weights for policy 0, policy_version 13360 (0.0014) [2023-09-21 12:34:44,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 11163.3). Total num frames: 6860800. Throughput: 0: 11264.8. Samples: 6839396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:44,158][72173] Avg episode reward: [(0, '4097.820')] [2023-09-21 12:34:44,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000013400_6860800.pth... [2023-09-21 12:34:44,174][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000012736_6520832.pth [2023-09-21 12:34:45,513][72809] Updated weights for policy 0, policy_version 13440 (0.0013) [2023-09-21 12:34:48,725][72809] Updated weights for policy 0, policy_version 13520 (0.0013) [2023-09-21 12:34:49,157][72173] Fps is (10 sec: 12288.0, 60 sec: 11537.0, 300 sec: 11205.0). Total num frames: 6926336. Throughput: 0: 11558.2. Samples: 6918144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:34:49,158][72173] Avg episode reward: [(0, '3737.011')] [2023-09-21 12:34:52,731][72809] Updated weights for policy 0, policy_version 13600 (0.0015) [2023-09-21 12:34:54,157][72173] Fps is (10 sec: 11878.6, 60 sec: 11468.8, 300 sec: 11191.1). Total num frames: 6979584. Throughput: 0: 11447.1. Samples: 6949832. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:34:54,158][72173] Avg episode reward: [(0, '3698.712')] [2023-09-21 12:34:56,137][72809] Updated weights for policy 0, policy_version 13680 (0.0013) [2023-09-21 12:34:59,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11468.8, 300 sec: 11191.1). Total num frames: 7036928. Throughput: 0: 11545.4. Samples: 7017460. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:34:59,158][72173] Avg episode reward: [(0, '3830.300')] [2023-09-21 12:34:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000013744_7036928.pth... [2023-09-21 12:34:59,170][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000013064_6688768.pth [2023-09-21 12:34:59,606][72809] Updated weights for policy 0, policy_version 13760 (0.0013) [2023-09-21 12:35:03,370][72809] Updated weights for policy 0, policy_version 13840 (0.0013) [2023-09-21 12:35:04,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 11191.1). Total num frames: 7090176. Throughput: 0: 11507.0. Samples: 7086068. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:35:04,158][72173] Avg episode reward: [(0, '4068.571')] [2023-09-21 12:35:07,053][72809] Updated weights for policy 0, policy_version 13920 (0.0014) [2023-09-21 12:35:09,157][72173] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11191.1). Total num frames: 7151616. Throughput: 0: 11503.0. Samples: 7118100. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-21 12:35:09,158][72173] Avg episode reward: [(0, '4306.333')] [2023-09-21 12:35:09,159][72796] Saving new best policy, reward=4306.333! [2023-09-21 12:35:10,180][72809] Updated weights for policy 0, policy_version 14000 (0.0012) [2023-09-21 12:35:13,649][72809] Updated weights for policy 0, policy_version 14080 (0.0014) [2023-09-21 12:35:14,157][72173] Fps is (10 sec: 12288.2, 60 sec: 11537.1, 300 sec: 11218.9). Total num frames: 7213056. Throughput: 0: 11652.4. Samples: 7192892. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:35:14,158][72173] Avg episode reward: [(0, '3780.295')] [2023-09-21 12:35:14,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000014088_7213056.pth... 
[2023-09-21 12:35:14,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000013400_6860800.pth [2023-09-21 12:35:17,362][72809] Updated weights for policy 0, policy_version 14160 (0.0012) [2023-09-21 12:35:19,157][72173] Fps is (10 sec: 11469.0, 60 sec: 11537.1, 300 sec: 11205.0). Total num frames: 7266304. Throughput: 0: 11499.7. Samples: 7259176. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:35:19,158][72173] Avg episode reward: [(0, '3808.011')] [2023-09-21 12:35:21,490][72809] Updated weights for policy 0, policy_version 14240 (0.0011) [2023-09-21 12:35:24,157][72173] Fps is (10 sec: 10649.6, 60 sec: 11468.8, 300 sec: 11205.0). Total num frames: 7319552. Throughput: 0: 11428.5. Samples: 7289128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:24,158][72173] Avg episode reward: [(0, '3985.170')] [2023-09-21 12:35:25,151][72809] Updated weights for policy 0, policy_version 14320 (0.0013) [2023-09-21 12:35:28,961][72809] Updated weights for policy 0, policy_version 14400 (0.0014) [2023-09-21 12:35:29,157][72173] Fps is (10 sec: 10649.3, 60 sec: 11400.6, 300 sec: 11177.2). Total num frames: 7372800. Throughput: 0: 11489.3. Samples: 7356416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:29,158][72173] Avg episode reward: [(0, '3921.898')] [2023-09-21 12:35:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000014400_7372800.pth... [2023-09-21 12:35:29,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000013744_7036928.pth [2023-09-21 12:35:32,622][72809] Updated weights for policy 0, policy_version 14480 (0.0013) [2023-09-21 12:35:34,157][72173] Fps is (10 sec: 11059.1, 60 sec: 11400.5, 300 sec: 11191.1). Total num frames: 7430144. Throughput: 0: 11197.2. Samples: 7422016. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:34,158][72173] Avg episode reward: [(0, '3850.574')] [2023-09-21 12:35:36,263][72809] Updated weights for policy 0, policy_version 14560 (0.0009) [2023-09-21 12:35:39,157][72173] Fps is (10 sec: 11059.1, 60 sec: 11332.2, 300 sec: 11177.2). Total num frames: 7483392. Throughput: 0: 11222.1. Samples: 7454828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:39,159][72173] Avg episode reward: [(0, '3929.894')] [2023-09-21 12:35:40,148][72809] Updated weights for policy 0, policy_version 14640 (0.0016) [2023-09-21 12:35:43,760][72809] Updated weights for policy 0, policy_version 14720 (0.0015) [2023-09-21 12:35:44,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11177.2). Total num frames: 7540736. Throughput: 0: 11111.3. Samples: 7517468. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:44,158][72173] Avg episode reward: [(0, '4208.995')] [2023-09-21 12:35:44,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000014728_7540736.pth... [2023-09-21 12:35:44,167][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000014088_7213056.pth [2023-09-21 12:35:47,497][72809] Updated weights for policy 0, policy_version 14800 (0.0011) [2023-09-21 12:35:49,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11127.4, 300 sec: 11177.2). Total num frames: 7593984. Throughput: 0: 11104.9. Samples: 7585792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:49,158][72173] Avg episode reward: [(0, '4126.367')] [2023-09-21 12:35:51,466][72809] Updated weights for policy 0, policy_version 14880 (0.0011) [2023-09-21 12:35:54,157][72173] Fps is (10 sec: 10240.1, 60 sec: 11059.2, 300 sec: 11177.2). Total num frames: 7643136. Throughput: 0: 11112.5. Samples: 7618160. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:54,158][72173] Avg episode reward: [(0, '4129.227')] [2023-09-21 12:35:55,434][72809] Updated weights for policy 0, policy_version 14960 (0.0015) [2023-09-21 12:35:58,994][72809] Updated weights for policy 0, policy_version 15040 (0.0013) [2023-09-21 12:35:59,157][72173] Fps is (10 sec: 10649.9, 60 sec: 11059.2, 300 sec: 11191.1). Total num frames: 7700480. Throughput: 0: 10827.8. Samples: 7680144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:35:59,158][72173] Avg episode reward: [(0, '3801.150')] [2023-09-21 12:35:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000015040_7700480.pth... [2023-09-21 12:35:59,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000014400_7372800.pth [2023-09-21 12:36:02,736][72809] Updated weights for policy 0, policy_version 15120 (0.0015) [2023-09-21 12:36:04,157][72173] Fps is (10 sec: 11468.9, 60 sec: 11127.5, 300 sec: 11191.1). Total num frames: 7757824. Throughput: 0: 10886.6. Samples: 7749072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:04,158][72173] Avg episode reward: [(0, '3491.404')] [2023-09-21 12:36:06,171][72809] Updated weights for policy 0, policy_version 15200 (0.0014) [2023-09-21 12:36:09,157][72173] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 11205.0). Total num frames: 7815168. Throughput: 0: 10995.7. Samples: 7783936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:09,158][72173] Avg episode reward: [(0, '3765.112')] [2023-09-21 12:36:09,775][72809] Updated weights for policy 0, policy_version 15280 (0.0014) [2023-09-21 12:36:13,532][72809] Updated weights for policy 0, policy_version 15360 (0.0014) [2023-09-21 12:36:14,157][72173] Fps is (10 sec: 11058.9, 60 sec: 10922.6, 300 sec: 11218.9). Total num frames: 7868416. Throughput: 0: 11013.7. Samples: 7852032. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:14,158][72173] Avg episode reward: [(0, '3889.817')] [2023-09-21 12:36:14,164][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000015368_7868416.pth... [2023-09-21 12:36:14,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000014728_7540736.pth [2023-09-21 12:36:17,481][72809] Updated weights for policy 0, policy_version 15440 (0.0013) [2023-09-21 12:36:19,157][72173] Fps is (10 sec: 10649.5, 60 sec: 10922.6, 300 sec: 11218.9). Total num frames: 7921664. Throughput: 0: 10962.7. Samples: 7915336. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:36:19,158][72173] Avg episode reward: [(0, '3945.192')] [2023-09-21 12:36:21,232][72809] Updated weights for policy 0, policy_version 15520 (0.0014) [2023-09-21 12:36:24,157][72173] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 11205.0). Total num frames: 7974912. Throughput: 0: 10932.6. Samples: 7946796. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:36:24,159][72173] Avg episode reward: [(0, '3991.000')] [2023-09-21 12:36:24,939][72809] Updated weights for policy 0, policy_version 15600 (0.0014) [2023-09-21 12:36:28,707][72809] Updated weights for policy 0, policy_version 15680 (0.0009) [2023-09-21 12:36:29,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11218.9). Total num frames: 8032256. Throughput: 0: 10983.1. Samples: 8011708. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:36:29,158][72173] Avg episode reward: [(0, '3975.821')] [2023-09-21 12:36:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000015688_8032256.pth... [2023-09-21 12:36:29,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000015040_7700480.pth [2023-09-21 12:36:32,690][72809] Updated weights for policy 0, policy_version 15760 (0.0013) [2023-09-21 12:36:34,157][72173] Fps is (10 sec: 10649.9, 60 sec: 10854.4, 300 sec: 11218.9). Total num frames: 8081408. 
Throughput: 0: 10886.8. Samples: 8075696. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:36:34,158][72173] Avg episode reward: [(0, '4060.384')] [2023-09-21 12:36:36,789][72809] Updated weights for policy 0, policy_version 15840 (0.0013) [2023-09-21 12:36:39,157][72173] Fps is (10 sec: 10240.2, 60 sec: 10854.5, 300 sec: 11218.9). Total num frames: 8134656. Throughput: 0: 10837.4. Samples: 8105844. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-21 12:36:39,158][72173] Avg episode reward: [(0, '4258.299')] [2023-09-21 12:36:40,483][72809] Updated weights for policy 0, policy_version 15920 (0.0014) [2023-09-21 12:36:44,079][72809] Updated weights for policy 0, policy_version 16000 (0.0014) [2023-09-21 12:36:44,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10854.4, 300 sec: 11218.9). Total num frames: 8192000. Throughput: 0: 10927.5. Samples: 8171884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:44,158][72173] Avg episode reward: [(0, '4255.140')] [2023-09-21 12:36:44,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000016000_8192000.pth... [2023-09-21 12:36:44,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000015368_7868416.pth [2023-09-21 12:36:47,413][72809] Updated weights for policy 0, policy_version 16080 (0.0013) [2023-09-21 12:36:49,157][72173] Fps is (10 sec: 11878.3, 60 sec: 10991.0, 300 sec: 11232.8). Total num frames: 8253440. Throughput: 0: 11013.2. Samples: 8244668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:49,158][72173] Avg episode reward: [(0, '4108.615')] [2023-09-21 12:36:51,028][72809] Updated weights for policy 0, policy_version 16160 (0.0012) [2023-09-21 12:36:54,157][72173] Fps is (10 sec: 11469.2, 60 sec: 11059.2, 300 sec: 11232.8). Total num frames: 8306688. Throughput: 0: 10958.8. Samples: 8277080. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:54,157][72173] Avg episode reward: [(0, '4163.110')] [2023-09-21 12:36:54,673][72809] Updated weights for policy 0, policy_version 16240 (0.0012) [2023-09-21 12:36:58,454][72809] Updated weights for policy 0, policy_version 16320 (0.0014) [2023-09-21 12:36:59,157][72173] Fps is (10 sec: 11058.8, 60 sec: 11059.1, 300 sec: 11232.7). Total num frames: 8364032. Throughput: 0: 10922.8. Samples: 8343560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:36:59,158][72173] Avg episode reward: [(0, '4148.485')] [2023-09-21 12:36:59,166][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000016336_8364032.pth... [2023-09-21 12:36:59,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000015688_8032256.pth [2023-09-21 12:37:02,003][72809] Updated weights for policy 0, policy_version 16400 (0.0012) [2023-09-21 12:37:04,157][72173] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11218.9). Total num frames: 8417280. Throughput: 0: 11008.1. Samples: 8410700. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:04,158][72173] Avg episode reward: [(0, '4101.246')] [2023-09-21 12:37:05,542][72809] Updated weights for policy 0, policy_version 16480 (0.0014) [2023-09-21 12:37:09,032][72809] Updated weights for policy 0, policy_version 16560 (0.0014) [2023-09-21 12:37:09,157][72173] Fps is (10 sec: 11469.4, 60 sec: 11059.2, 300 sec: 11246.7). Total num frames: 8478720. Throughput: 0: 11092.4. Samples: 8445952. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:09,158][72173] Avg episode reward: [(0, '4097.107')] [2023-09-21 12:37:12,512][72809] Updated weights for policy 0, policy_version 16640 (0.0014) [2023-09-21 12:37:14,157][72173] Fps is (10 sec: 11878.1, 60 sec: 11127.5, 300 sec: 11246.6). Total num frames: 8536064. Throughput: 0: 11237.1. Samples: 8517380. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:14,158][72173] Avg episode reward: [(0, '4193.353')] [2023-09-21 12:37:14,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000016672_8536064.pth... [2023-09-21 12:37:14,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000016000_8192000.pth [2023-09-21 12:37:16,308][72809] Updated weights for policy 0, policy_version 16720 (0.0014) [2023-09-21 12:37:19,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11127.5, 300 sec: 11218.9). Total num frames: 8589312. Throughput: 0: 11248.2. Samples: 8581864. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:37:19,158][72173] Avg episode reward: [(0, '4311.623')] [2023-09-21 12:37:19,158][72796] Saving new best policy, reward=4311.623! [2023-09-21 12:37:19,985][72809] Updated weights for policy 0, policy_version 16800 (0.0014) [2023-09-21 12:37:23,480][72809] Updated weights for policy 0, policy_version 16880 (0.0013) [2023-09-21 12:37:24,157][72173] Fps is (10 sec: 11059.3, 60 sec: 11195.8, 300 sec: 11218.9). Total num frames: 8646656. Throughput: 0: 11382.5. Samples: 8618056. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:37:24,158][72173] Avg episode reward: [(0, '3967.590')] [2023-09-21 12:37:27,096][72809] Updated weights for policy 0, policy_version 16960 (0.0013) [2023-09-21 12:37:29,157][72173] Fps is (10 sec: 11468.5, 60 sec: 11195.7, 300 sec: 11205.0). Total num frames: 8704000. Throughput: 0: 11433.2. Samples: 8686376. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:29,158][72173] Avg episode reward: [(0, '3743.872')] [2023-09-21 12:37:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000017000_8704000.pth... 
[2023-09-21 12:37:29,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000016336_8364032.pth [2023-09-21 12:37:31,047][72809] Updated weights for policy 0, policy_version 17040 (0.0013) [2023-09-21 12:37:34,157][72173] Fps is (10 sec: 11059.1, 60 sec: 11264.0, 300 sec: 11205.0). Total num frames: 8757248. Throughput: 0: 11208.1. Samples: 8749036. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:34,158][72173] Avg episode reward: [(0, '3819.956')] [2023-09-21 12:37:34,677][72809] Updated weights for policy 0, policy_version 17120 (0.0014) [2023-09-21 12:37:38,294][72809] Updated weights for policy 0, policy_version 17200 (0.0014) [2023-09-21 12:37:39,157][72173] Fps is (10 sec: 11059.3, 60 sec: 11332.2, 300 sec: 11205.0). Total num frames: 8814592. Throughput: 0: 11243.6. Samples: 8783044. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:37:39,158][72173] Avg episode reward: [(0, '3800.435')] [2023-09-21 12:37:41,969][72809] Updated weights for policy 0, policy_version 17280 (0.0014) [2023-09-21 12:37:44,157][72173] Fps is (10 sec: 11468.8, 60 sec: 11332.3, 300 sec: 11191.1). Total num frames: 8871936. Throughput: 0: 11284.7. Samples: 8851368. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:37:44,158][72173] Avg episode reward: [(0, '4084.261')] [2023-09-21 12:37:44,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000017328_8871936.pth... [2023-09-21 12:37:44,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000016672_8536064.pth [2023-09-21 12:37:45,390][72809] Updated weights for policy 0, policy_version 17360 (0.0014) [2023-09-21 12:37:48,638][72809] Updated weights for policy 0, policy_version 17440 (0.0014) [2023-09-21 12:37:49,157][72173] Fps is (10 sec: 11878.4, 60 sec: 11332.3, 300 sec: 11191.1). Total num frames: 8933376. Throughput: 0: 11444.9. Samples: 8925720. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-21 12:37:49,158][72173] Avg episode reward: [(0, '3957.647')] [2023-09-21 12:37:52,587][72809] Updated weights for policy 0, policy_version 17520 (0.0012) [2023-09-21 12:37:54,157][72173] Fps is (10 sec: 11059.2, 60 sec: 11263.9, 300 sec: 11163.3). Total num frames: 8982528. Throughput: 0: 11374.4. Samples: 8957804. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:37:54,158][72173] Avg episode reward: [(0, '3898.653')] [2023-09-21 12:37:56,400][72809] Updated weights for policy 0, policy_version 17600 (0.0012) [2023-09-21 12:37:59,157][72173] Fps is (10 sec: 10649.5, 60 sec: 11264.0, 300 sec: 11191.1). Total num frames: 9039872. Throughput: 0: 11230.1. Samples: 9022732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:37:59,158][72173] Avg episode reward: [(0, '4109.551')] [2023-09-21 12:37:59,162][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000017656_9039872.pth... [2023-09-21 12:37:59,171][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000017000_8704000.pth [2023-09-21 12:38:00,101][72809] Updated weights for policy 0, policy_version 17680 (0.0014) [2023-09-21 12:38:03,565][72809] Updated weights for policy 0, policy_version 17760 (0.0015) [2023-09-21 12:38:04,157][72173] Fps is (10 sec: 11469.0, 60 sec: 11332.3, 300 sec: 11205.0). Total num frames: 9097216. Throughput: 0: 11271.8. Samples: 9089096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:04,158][72173] Avg episode reward: [(0, '4133.410')] [2023-09-21 12:38:07,475][72809] Updated weights for policy 0, policy_version 17840 (0.0014) [2023-09-21 12:38:09,157][72173] Fps is (10 sec: 11059.4, 60 sec: 11195.7, 300 sec: 11205.0). Total num frames: 9150464. Throughput: 0: 11180.4. Samples: 9121172. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:09,158][72173] Avg episode reward: [(0, '4159.218')] [2023-09-21 12:38:11,099][72809] Updated weights for policy 0, policy_version 17920 (0.0013) [2023-09-21 12:38:14,157][72173] Fps is (10 sec: 10649.3, 60 sec: 11127.4, 300 sec: 11205.0). Total num frames: 9203712. Throughput: 0: 11133.7. Samples: 9187392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:14,158][72173] Avg episode reward: [(0, '4026.603')] [2023-09-21 12:38:14,210][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000017984_9207808.pth... [2023-09-21 12:38:14,214][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000017328_8871936.pth [2023-09-21 12:38:14,955][72809] Updated weights for policy 0, policy_version 18000 (0.0014) [2023-09-21 12:38:18,762][72809] Updated weights for policy 0, policy_version 18080 (0.0014) [2023-09-21 12:38:19,157][72173] Fps is (10 sec: 11059.1, 60 sec: 11195.7, 300 sec: 11205.0). Total num frames: 9261056. Throughput: 0: 11174.9. Samples: 9251904. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:38:19,158][72173] Avg episode reward: [(0, '4034.045')] [2023-09-21 12:38:22,499][72809] Updated weights for policy 0, policy_version 18160 (0.0012) [2023-09-21 12:38:24,157][72173] Fps is (10 sec: 11059.7, 60 sec: 11127.5, 300 sec: 11191.1). Total num frames: 9314304. Throughput: 0: 11168.9. Samples: 9285640. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:38:24,157][72173] Avg episode reward: [(0, '4071.806')] [2023-09-21 12:38:26,276][72809] Updated weights for policy 0, policy_version 18240 (0.0014) [2023-09-21 12:38:29,157][72173] Fps is (10 sec: 10649.4, 60 sec: 11059.2, 300 sec: 11177.2). Total num frames: 9367552. Throughput: 0: 11096.0. Samples: 9350688. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:38:29,158][72173] Avg episode reward: [(0, '4129.264')] [2023-09-21 12:38:29,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000018296_9367552.pth... [2023-09-21 12:38:29,168][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000017656_9039872.pth [2023-09-21 12:38:30,184][72809] Updated weights for policy 0, policy_version 18320 (0.0013) [2023-09-21 12:38:33,945][72809] Updated weights for policy 0, policy_version 18400 (0.0012) [2023-09-21 12:38:34,157][72173] Fps is (10 sec: 10649.3, 60 sec: 11059.2, 300 sec: 11177.2). Total num frames: 9420800. Throughput: 0: 10839.1. Samples: 9413480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:34,158][72173] Avg episode reward: [(0, '4051.399')] [2023-09-21 12:38:37,935][72809] Updated weights for policy 0, policy_version 18480 (0.0013) [2023-09-21 12:38:39,157][72173] Fps is (10 sec: 10240.4, 60 sec: 10922.7, 300 sec: 11163.3). Total num frames: 9469952. Throughput: 0: 10819.0. Samples: 9444656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:39,158][72173] Avg episode reward: [(0, '3758.242')] [2023-09-21 12:38:42,083][72809] Updated weights for policy 0, policy_version 18560 (0.0013) [2023-09-21 12:38:44,157][72173] Fps is (10 sec: 10240.1, 60 sec: 10854.4, 300 sec: 11149.4). Total num frames: 9523200. Throughput: 0: 10734.2. Samples: 9505772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:44,158][72173] Avg episode reward: [(0, '3838.419')] [2023-09-21 12:38:44,163][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000018600_9523200.pth... [2023-09-21 12:38:44,169][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000017984_9207808.pth [2023-09-21 12:38:45,782][72809] Updated weights for policy 0, policy_version 18640 (0.0014) [2023-09-21 12:38:49,157][72173] Fps is (10 sec: 10649.4, 60 sec: 10717.9, 300 sec: 11135.6). Total num frames: 9576448. 
Throughput: 0: 10713.1. Samples: 9571188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:49,158][72173] Avg episode reward: [(0, '3895.762')] [2023-09-21 12:38:49,656][72809] Updated weights for policy 0, policy_version 18720 (0.0013) [2023-09-21 12:38:53,261][72809] Updated weights for policy 0, policy_version 18800 (0.0014) [2023-09-21 12:38:54,157][72173] Fps is (10 sec: 11059.3, 60 sec: 10854.4, 300 sec: 11135.6). Total num frames: 9633792. Throughput: 0: 10686.7. Samples: 9602072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:38:54,158][72173] Avg episode reward: [(0, '3881.774')] [2023-09-21 12:38:56,937][72809] Updated weights for policy 0, policy_version 18880 (0.0014) [2023-09-21 12:38:59,157][72173] Fps is (10 sec: 11059.0, 60 sec: 10786.1, 300 sec: 11121.7). Total num frames: 9687040. Throughput: 0: 10740.8. Samples: 9670728. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:38:59,158][72173] Avg episode reward: [(0, '3858.749')] [2023-09-21 12:38:59,165][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000018920_9687040.pth... [2023-09-21 12:38:59,172][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000018296_9367552.pth [2023-09-21 12:39:00,727][72809] Updated weights for policy 0, policy_version 18960 (0.0013) [2023-09-21 12:39:03,995][72809] Updated weights for policy 0, policy_version 19040 (0.0014) [2023-09-21 12:39:04,157][72173] Fps is (10 sec: 11468.8, 60 sec: 10854.4, 300 sec: 11149.5). Total num frames: 9748480. Throughput: 0: 10854.4. Samples: 9740352. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:39:04,158][72173] Avg episode reward: [(0, '3862.838')] [2023-09-21 12:39:07,639][72809] Updated weights for policy 0, policy_version 19120 (0.0015) [2023-09-21 12:39:09,157][72173] Fps is (10 sec: 11469.1, 60 sec: 10854.4, 300 sec: 11121.7). Total num frames: 9801728. Throughput: 0: 10862.3. Samples: 9774444. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-21 12:39:09,158][72173] Avg episode reward: [(0, '3780.776')] [2023-09-21 12:39:11,541][72809] Updated weights for policy 0, policy_version 19200 (0.0013) [2023-09-21 12:39:14,157][72173] Fps is (10 sec: 11058.7, 60 sec: 10922.6, 300 sec: 11135.5). Total num frames: 9859072. Throughput: 0: 10871.6. Samples: 9839912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:39:14,159][72173] Avg episode reward: [(0, '3626.003')] [2023-09-21 12:39:14,167][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000019256_9859072.pth... [2023-09-21 12:39:14,174][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000018600_9523200.pth [2023-09-21 12:39:14,846][72809] Updated weights for policy 0, policy_version 19280 (0.0013) [2023-09-21 12:39:18,976][72809] Updated weights for policy 0, policy_version 19360 (0.0012) [2023-09-21 12:39:19,157][72173] Fps is (10 sec: 11059.2, 60 sec: 10854.4, 300 sec: 11121.7). Total num frames: 9912320. Throughput: 0: 10943.9. Samples: 9905952. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-21 12:39:19,158][72173] Avg episode reward: [(0, '3655.511')] [2023-09-21 12:39:22,760][72809] Updated weights for policy 0, policy_version 19440 (0.0013) [2023-09-21 12:39:24,157][72173] Fps is (10 sec: 10649.9, 60 sec: 10854.4, 300 sec: 11107.8). Total num frames: 9965568. Throughput: 0: 10968.2. Samples: 9938228. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-21 12:39:24,158][72173] Avg episode reward: [(0, '3766.262')] [2023-09-21 12:39:26,502][72809] Updated weights for policy 0, policy_version 19520 (0.0015) [2023-09-21 12:39:27,664][72796] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000 [2023-09-21 12:39:27,666][72813] Stopping RolloutWorker_w4... [2023-09-21 12:39:27,666][72817] Stopping RolloutWorker_w1... [2023-09-21 12:39:27,666][72822] Stopping RolloutWorker_w5... [2023-09-21 12:39:27,666][72816] Stopping RolloutWorker_w2... 
[2023-09-21 12:39:27,666][72823] Stopping RolloutWorker_w6... [2023-09-21 12:39:27,666][72815] Stopping RolloutWorker_w3... [2023-09-21 12:39:27,666][72173] Component RolloutWorker_w4 stopped! [2023-09-21 12:39:27,666][72824] Stopping RolloutWorker_w7... [2023-09-21 12:39:27,666][72813] Loop rollout_proc4_evt_loop terminating... [2023-09-21 12:39:27,666][72810] Stopping RolloutWorker_w0... [2023-09-21 12:39:27,667][72817] Loop rollout_proc1_evt_loop terminating... [2023-09-21 12:39:27,667][72816] Loop rollout_proc2_evt_loop terminating... [2023-09-21 12:39:27,667][72822] Loop rollout_proc5_evt_loop terminating... [2023-09-21 12:39:27,667][72823] Loop rollout_proc6_evt_loop terminating... [2023-09-21 12:39:27,667][72815] Loop rollout_proc3_evt_loop terminating... [2023-09-21 12:39:27,667][72796] Stopping Batcher_0... [2023-09-21 12:39:27,667][72824] Loop rollout_proc7_evt_loop terminating... [2023-09-21 12:39:27,667][72810] Loop rollout_proc0_evt_loop terminating... [2023-09-21 12:39:27,667][72173] Component RolloutWorker_w1 stopped! [2023-09-21 12:39:27,667][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000019544_10006528.pth... [2023-09-21 12:39:27,668][72173] Component RolloutWorker_w5 stopped! [2023-09-21 12:39:27,668][72173] Component RolloutWorker_w2 stopped! [2023-09-21 12:39:27,669][72173] Component RolloutWorker_w6 stopped! [2023-09-21 12:39:27,669][72173] Component RolloutWorker_w3 stopped! [2023-09-21 12:39:27,670][72173] Component RolloutWorker_w7 stopped! [2023-09-21 12:39:27,670][72173] Component RolloutWorker_w0 stopped! [2023-09-21 12:39:27,671][72173] Component Batcher_0 stopped! [2023-09-21 12:39:27,668][72796] Loop batcher_evt_loop terminating... [2023-09-21 12:39:27,671][72796] Removing ./train_dir/Walker/checkpoint_p0/checkpoint_000018920_9687040.pth [2023-09-21 12:39:27,672][72796] Saving ./train_dir/Walker/checkpoint_p0/checkpoint_000019544_10006528.pth... [2023-09-21 12:39:27,675][72796] Stopping LearnerWorker_p0... 
[2023-09-21 12:39:27,675][72796] Loop learner_proc0_evt_loop terminating... [2023-09-21 12:39:27,675][72173] Component LearnerWorker_p0 stopped! [2023-09-21 12:39:27,735][72809] Weights refcount: 2 0 [2023-09-21 12:39:27,736][72809] Stopping InferenceWorker_p0-w0... [2023-09-21 12:39:27,736][72809] Loop inference_proc0-0_evt_loop terminating... [2023-09-21 12:39:27,736][72173] Component InferenceWorker_p0-w0 stopped! [2023-09-21 12:39:27,736][72173] Waiting for process learner_proc0 to stop... [2023-09-21 12:39:28,249][72173] Waiting for process inference_proc0-0 to join... [2023-09-21 12:39:28,279][72173] Waiting for process rollout_proc0 to join... [2023-09-21 12:39:28,280][72173] Waiting for process rollout_proc1 to join... [2023-09-21 12:39:28,280][72173] Waiting for process rollout_proc2 to join... [2023-09-21 12:39:28,281][72173] Waiting for process rollout_proc3 to join... [2023-09-21 12:39:28,282][72173] Waiting for process rollout_proc4 to join... [2023-09-21 12:39:28,282][72173] Waiting for process rollout_proc5 to join... [2023-09-21 12:39:28,283][72173] Waiting for process rollout_proc6 to join... [2023-09-21 12:39:28,283][72173] Waiting for process rollout_proc7 to join... 
[2023-09-21 12:39:28,284][72173] Batcher 0 profile tree view:
batching: 4.8519, releasing_batches: 3.5739
[2023-09-21 12:39:28,285][72173] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 219.4459
update_model: 11.3986
  weight_update: 0.0014
one_step: 0.0012
  handle_policy_step: 626.8253
    deserialize: 20.0357, stack: 3.9861, obs_to_device_normalize: 122.2233, forward: 310.1299, send_messages: 50.3941
    prepare_outputs: 84.8137
      to_cpu: 41.1698
[2023-09-21 12:39:28,285][72173] Learner 0 profile tree view:
misc: 0.0162, prepare_batch: 23.3182
train: 106.3903
  epoch_init: 0.0651, minibatch_init: 1.6871, losses_postprocess: 2.8172, kl_divergence: 1.3111, after_optimizer: 1.3979
  calculate_losses: 31.0489
    losses_init: 0.0623, forward_head: 3.5209, bptt_initial: 0.2061, bptt: 0.2268, tail: 11.7539, advantages_returns: 1.5846, losses: 11.7971
  update: 65.8504
    clip: 8.1129
[2023-09-21 12:39:28,286][72173] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 1.0205, enqueue_policy_requests: 27.9168, env_step: 357.7840, overhead: 44.0474, complete_rollouts: 0.6943
save_policy_outputs: 77.3473
  split_output_tensors: 26.9290
[2023-09-21 12:39:28,286][72173] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.0124, enqueue_policy_requests: 27.8256, env_step: 352.6212, overhead: 43.3919, complete_rollouts: 0.6843
save_policy_outputs: 75.8463
  split_output_tensors: 26.5837
[2023-09-21 12:39:28,287][72173] Loop Runner_EvtLoop terminating...
[2023-09-21 12:39:28,288][72173] Runner profile tree view:
main_loop: 910.3339
[2023-09-21 12:39:28,289][72173] Collected {0: 10006528}, FPS: 10992.2