diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,2255 @@
+[2023-09-19 19:40:05,557][52355] Saving configuration to ./train_dir/Pusher/config.json...
+[2023-09-19 19:40:05,673][52355] Rollout worker 0 uses device cpu
+[2023-09-19 19:40:05,673][52355] Rollout worker 1 uses device cpu
+[2023-09-19 19:40:05,674][52355] Rollout worker 2 uses device cpu
+[2023-09-19 19:40:05,674][52355] Rollout worker 3 uses device cpu
+[2023-09-19 19:40:05,674][52355] Rollout worker 4 uses device cpu
+[2023-09-19 19:40:05,674][52355] Rollout worker 5 uses device cpu
+[2023-09-19 19:40:05,675][52355] Rollout worker 6 uses device cpu
+[2023-09-19 19:40:05,675][52355] Rollout worker 7 uses device cpu
+[2023-09-19 19:40:05,675][52355] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
+[2023-09-19 19:40:05,713][52355] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:40:05,713][52355] InferenceWorker_p0-w0: min num requests: 2
+[2023-09-19 19:40:05,736][52355] Starting all processes...
+[2023-09-19 19:40:05,737][52355] Starting process learner_proc0
+[2023-09-19 19:40:05,739][52355] Starting all processes...
+[2023-09-19 19:40:05,742][52355] Starting process inference_proc0-0
+[2023-09-19 19:40:05,743][52355] Starting process rollout_proc0
+[2023-09-19 19:40:05,744][52355] Starting process rollout_proc1
+[2023-09-19 19:40:05,745][52355] Starting process rollout_proc2
+[2023-09-19 19:40:05,745][52355] Starting process rollout_proc3
+[2023-09-19 19:40:05,745][52355] Starting process rollout_proc4
+[2023-09-19 19:40:05,745][52355] Starting process rollout_proc5
+[2023-09-19 19:40:05,747][52355] Starting process rollout_proc6
+[2023-09-19 19:40:05,748][52355] Starting process rollout_proc7
+[2023-09-19 19:40:07,545][52804] Worker 2 uses CPU cores [8, 9, 10, 11]
+[2023-09-19 19:40:07,569][52795] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:40:07,569][52795] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-09-19 19:40:07,584][52803] Worker 3 uses CPU cores [12, 13, 14, 15]
+[2023-09-19 19:40:07,587][52795] Num visible devices: 1
+[2023-09-19 19:40:07,597][52779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:40:07,597][52779] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-09-19 19:40:07,603][52799] Worker 4 uses CPU cores [16, 17, 18, 19]
+[2023-09-19 19:40:07,628][52797] Worker 0 uses CPU cores [0, 1, 2, 3]
+[2023-09-19 19:40:07,636][52779] Num visible devices: 1
+[2023-09-19 19:40:07,655][52809] Worker 6 uses CPU cores [24, 25, 26, 27]
+[2023-09-19 19:40:07,664][52779] Starting seed is not provided
+[2023-09-19 19:40:07,665][52779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:40:07,665][52779] Initializing actor-critic model on device cuda:0
+[2023-09-19 19:40:07,665][52779] RunningMeanStd input shape: (23,)
+[2023-09-19 19:40:07,666][52779] RunningMeanStd input shape: (1,)
+[2023-09-19 19:40:07,681][52796] Worker 1 uses CPU cores [4, 5, 6, 7]
+[2023-09-19 19:40:07,699][52806] Worker 5 uses CPU cores [20, 21, 22, 23]
+[2023-09-19 19:40:07,704][52808] Worker 7 uses CPU cores [28, 29, 30, 31]
+[2023-09-19 19:40:07,767][52779] Created Actor Critic model with architecture:
+[2023-09-19 19:40:07,767][52779] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): MultiInputEncoder(
+    (encoders): ModuleDict(
+      (obs): MlpEncoder(
+        (mlp_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=Tanh)
+          (2): RecursiveScriptModule(original_name=Linear)
+          (3): RecursiveScriptModule(original_name=Tanh)
+        )
+      )
+    )
+  )
+  (core): ModelCoreIdentity()
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
+    (distribution_linear): Linear(in_features=64, out_features=7, bias=True)
+  )
+)
+[2023-09-19 19:40:08,307][52779] Using optimizer <class 'torch.optim.adam.Adam'>
+[2023-09-19 19:40:08,307][52779] No checkpoints found
+[2023-09-19 19:40:08,307][52779] Did not load from checkpoint, starting from scratch!
+[2023-09-19 19:40:08,308][52779] Initialized policy 0 weights for model version 0
+[2023-09-19 19:40:08,309][52779] LearnerWorker_p0 finished initialization!
+[2023-09-19 19:40:08,309][52779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:40:08,881][52795] RunningMeanStd input shape: (23,)
+[2023-09-19 19:40:08,883][52795] RunningMeanStd input shape: (1,)
+[2023-09-19 19:40:08,915][52355] Inference worker 0-0 is ready!
+[2023-09-19 19:40:08,915][52355] All inference workers are ready! Signal rollout workers to start!
+[2023-09-19 19:40:09,006][52808] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,006][52803] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,006][52808] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,006][52803] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,017][52806] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,018][52806] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,020][52808] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,020][52803] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,021][52809] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,022][52809] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,022][52804] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,023][52804] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,028][52796] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,029][52796] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,031][52797] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,031][52799] Decorrelating experience for 0 frames...
+[2023-09-19 19:40:09,032][52797] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,032][52799] Decorrelating experience for 64 frames...
+[2023-09-19 19:40:09,041][52806] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,042][52796] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,046][52797] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,046][52799] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,046][52808] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,046][52803] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,046][52809] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,048][52804] Decorrelating experience for 128 frames...
+[2023-09-19 19:40:09,068][52796] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,072][52797] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,072][52799] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,085][52806] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,093][52803] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,093][52808] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,094][52809] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,096][52804] Decorrelating experience for 192 frames...
+[2023-09-19 19:40:09,115][52796] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,120][52797] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,121][52799] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,145][52803] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,145][52808] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,165][52796] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,165][52806] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,171][52797] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,173][52799] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,179][52809] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,182][52804] Decorrelating experience for 256 frames...
+[2023-09-19 19:40:09,208][52803] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,209][52808] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,228][52796] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,236][52797] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,239][52799] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,240][52809] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,249][52806] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,250][52804] Decorrelating experience for 320 frames...
+[2023-09-19 19:40:09,282][52803] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,284][52808] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,303][52809] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,305][52796] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,311][52806] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,314][52797] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,316][52804] Decorrelating experience for 384 frames...
+[2023-09-19 19:40:09,317][52799] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,385][52809] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,386][52806] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:09,393][52804] Decorrelating experience for 448 frames...
+[2023-09-19 19:40:11,969][52355] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4096. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:40:11,969][52355] Avg episode reward: [(0, '-79.524')]
+[2023-09-19 19:40:15,618][52795] Updated weights for policy 0, policy_version 80 (0.0014)
+[2023-09-19 19:40:16,969][52355] Fps is (10 sec: 9830.0, 60 sec: 9830.0, 300 sec: 9830.0). Total num frames: 53248. Throughput: 0: 6577.3. Samples: 32888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:40:16,970][52355] Avg episode reward: [(0, '-106.562')]
+[2023-09-19 19:40:16,976][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000000104_53248.pth...
+[2023-09-19 19:40:19,177][52795] Updated weights for policy 0, policy_version 160 (0.0013)
+[2023-09-19 19:40:21,969][52355] Fps is (10 sec: 10649.7, 60 sec: 10649.7, 300 sec: 10649.7). Total num frames: 110592. Throughput: 0: 10164.9. Samples: 101648. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:40:21,969][52355] Avg episode reward: [(0, '-78.986')]
+[2023-09-19 19:40:21,970][52779] Saving new best policy, reward=-78.986!
+[2023-09-19 19:40:22,869][52795] Updated weights for policy 0, policy_version 240 (0.0014)
+[2023-09-19 19:40:25,706][52355] Heartbeat connected on Batcher_0
+[2023-09-19 19:40:25,709][52355] Heartbeat connected on LearnerWorker_p0
+[2023-09-19 19:40:25,717][52355] Heartbeat connected on InferenceWorker_p0-w0
+[2023-09-19 19:40:25,720][52355] Heartbeat connected on RolloutWorker_w0
+[2023-09-19 19:40:25,721][52355] Heartbeat connected on RolloutWorker_w1
+[2023-09-19 19:40:25,724][52355] Heartbeat connected on RolloutWorker_w2
+[2023-09-19 19:40:25,726][52355] Heartbeat connected on RolloutWorker_w3
+[2023-09-19 19:40:25,728][52355] Heartbeat connected on RolloutWorker_w4
+[2023-09-19 19:40:25,730][52355] Heartbeat connected on RolloutWorker_w5
+[2023-09-19 19:40:25,733][52355] Heartbeat connected on RolloutWorker_w6
+[2023-09-19 19:40:25,736][52355] Heartbeat connected on RolloutWorker_w7
+[2023-09-19 19:40:26,344][52795] Updated weights for policy 0, policy_version 320 (0.0014)
+[2023-09-19 19:40:26,969][52355] Fps is (10 sec: 11469.1, 60 sec: 10922.7, 300 sec: 10922.7). Total num frames: 167936. Throughput: 0: 11441.3. Samples: 171620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:40:26,969][52355] Avg episode reward: [(0, '-63.883')]
+[2023-09-19 19:40:26,970][52779] Saving new best policy, reward=-63.883!
+[2023-09-19 19:40:29,798][52795] Updated weights for policy 0, policy_version 400 (0.0015)
+[2023-09-19 19:40:31,969][52355] Fps is (10 sec: 11878.2, 60 sec: 11264.0, 300 sec: 11264.0). Total num frames: 229376. Throughput: 0: 10338.6. Samples: 206772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:40:31,969][52355] Avg episode reward: [(0, '-59.327')]
+[2023-09-19 19:40:31,976][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000000448_229376.pth...
+[2023-09-19 19:40:31,982][52779] Saving new best policy, reward=-59.327!
+[2023-09-19 19:40:33,380][52795] Updated weights for policy 0, policy_version 480 (0.0013)
+[2023-09-19 19:40:36,941][52795] Updated weights for policy 0, policy_version 560 (0.0010)
+[2023-09-19 19:40:36,969][52355] Fps is (10 sec: 11878.4, 60 sec: 11304.9, 300 sec: 11304.9). Total num frames: 286720. Throughput: 0: 11060.5. Samples: 276512. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:40:36,969][52355] Avg episode reward: [(0, '-52.272')]
+[2023-09-19 19:40:36,970][52779] Saving new best policy, reward=-52.272!
+[2023-09-19 19:40:40,721][52795] Updated weights for policy 0, policy_version 640 (0.0015)
+[2023-09-19 19:40:41,969][52355] Fps is (10 sec: 11059.3, 60 sec: 11195.8, 300 sec: 11195.8). Total num frames: 339968. Throughput: 0: 11347.3. Samples: 340416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:40:41,969][52355] Avg episode reward: [(0, '-49.400')]
+[2023-09-19 19:40:41,969][52779] Saving new best policy, reward=-49.400!
+[2023-09-19 19:40:44,133][52795] Updated weights for policy 0, policy_version 720 (0.0015)
+[2023-09-19 19:40:46,969][52355] Fps is (10 sec: 11059.1, 60 sec: 11234.7, 300 sec: 11234.7). Total num frames: 397312. Throughput: 0: 10807.9. Samples: 378276. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:40:46,969][52355] Avg episode reward: [(0, '-48.013')]
+[2023-09-19 19:40:46,974][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000000776_397312.pth...
+[2023-09-19 19:40:46,977][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000000104_53248.pth
+[2023-09-19 19:40:46,978][52779] Saving new best policy, reward=-48.013!
+[2023-09-19 19:40:47,783][52795] Updated weights for policy 0, policy_version 800 (0.0016)
+[2023-09-19 19:40:51,362][52795] Updated weights for policy 0, policy_version 880 (0.0013)
+[2023-09-19 19:40:51,969][52355] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 11264.0). Total num frames: 454656. Throughput: 0: 11161.6. Samples: 446464. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:40:51,969][52355] Avg episode reward: [(0, '-45.155')]
+[2023-09-19 19:40:51,970][52779] Saving new best policy, reward=-45.155!
+[2023-09-19 19:40:54,966][52795] Updated weights for policy 0, policy_version 960 (0.0013)
+[2023-09-19 19:40:56,968][52355] Fps is (10 sec: 11469.1, 60 sec: 11286.8, 300 sec: 11286.8). Total num frames: 512000. Throughput: 0: 11408.1. Samples: 513364. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:40:56,969][52355] Avg episode reward: [(0, '-44.323')]
+[2023-09-19 19:40:56,970][52779] Saving new best policy, reward=-44.323!
+[2023-09-19 19:40:58,573][52795] Updated weights for policy 0, policy_version 1040 (0.0011)
+[2023-09-19 19:41:01,969][52355] Fps is (10 sec: 11468.7, 60 sec: 11305.0, 300 sec: 11305.0). Total num frames: 569344. Throughput: 0: 11455.4. Samples: 548380. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:01,969][52355] Avg episode reward: [(0, '-43.599')]
+[2023-09-19 19:41:01,976][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000001112_569344.pth...
+[2023-09-19 19:41:01,984][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000000448_229376.pth
+[2023-09-19 19:41:01,985][52779] Saving new best policy, reward=-43.599!
+[2023-09-19 19:41:02,045][52795] Updated weights for policy 0, policy_version 1120 (0.0010)
+[2023-09-19 19:41:05,486][52795] Updated weights for policy 0, policy_version 1200 (0.0014)
+[2023-09-19 19:41:06,969][52355] Fps is (10 sec: 11468.6, 60 sec: 11319.8, 300 sec: 11319.8). Total num frames: 626688. Throughput: 0: 11490.8. Samples: 618736. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:06,969][52355] Avg episode reward: [(0, '-42.417')]
+[2023-09-19 19:41:06,978][52779] Saving new best policy, reward=-42.417!
+[2023-09-19 19:41:08,911][52795] Updated weights for policy 0, policy_version 1280 (0.0013)
+[2023-09-19 19:41:11,969][52355] Fps is (10 sec: 11878.5, 60 sec: 11400.5, 300 sec: 11400.5). Total num frames: 688128. Throughput: 0: 11495.5. Samples: 688916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:11,969][52355] Avg episode reward: [(0, '-42.012')]
+[2023-09-19 19:41:11,970][52779] Saving new best policy, reward=-42.012!
+[2023-09-19 19:41:12,621][52795] Updated weights for policy 0, policy_version 1360 (0.0015)
+[2023-09-19 19:41:16,013][52795] Updated weights for policy 0, policy_version 1440 (0.0015)
+[2023-09-19 19:41:16,969][52355] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 11405.8). Total num frames: 745472. Throughput: 0: 11517.4. Samples: 725056. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:41:16,969][52355] Avg episode reward: [(0, '-41.526')]
+[2023-09-19 19:41:16,974][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000001456_745472.pth...
+[2023-09-19 19:41:16,982][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000000776_397312.pth
+[2023-09-19 19:41:16,982][52779] Saving new best policy, reward=-41.526!
+[2023-09-19 19:41:19,560][52795] Updated weights for policy 0, policy_version 1520 (0.0013)
+[2023-09-19 19:41:21,969][52355] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11410.3). Total num frames: 802816. Throughput: 0: 11479.0. Samples: 793068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:21,969][52355] Avg episode reward: [(0, '-39.984')]
+[2023-09-19 19:41:21,970][52779] Saving new best policy, reward=-39.984!
+[2023-09-19 19:41:23,356][52795] Updated weights for policy 0, policy_version 1600 (0.0014)
+[2023-09-19 19:41:26,899][52795] Updated weights for policy 0, policy_version 1680 (0.0014)
+[2023-09-19 19:41:26,969][52355] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 11414.2). Total num frames: 860160. Throughput: 0: 11553.5. Samples: 860324. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:41:26,969][52355] Avg episode reward: [(0, '-39.825')]
+[2023-09-19 19:41:26,970][52779] Saving new best policy, reward=-39.825!
+[2023-09-19 19:41:30,450][52795] Updated weights for policy 0, policy_version 1760 (0.0014)
+[2023-09-19 19:41:31,969][52355] Fps is (10 sec: 11468.7, 60 sec: 11468.8, 300 sec: 11417.6). Total num frames: 917504. Throughput: 0: 11495.7. Samples: 895584. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:41:31,969][52355] Avg episode reward: [(0, '-39.396')]
+[2023-09-19 19:41:31,976][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000001792_917504.pth...
+[2023-09-19 19:41:31,983][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000001112_569344.pth
+[2023-09-19 19:41:31,984][52779] Saving new best policy, reward=-39.396!
+[2023-09-19 19:41:34,046][52795] Updated weights for policy 0, policy_version 1840 (0.0015)
+[2023-09-19 19:41:36,968][52355] Fps is (10 sec: 11469.0, 60 sec: 11468.8, 300 sec: 11420.6). Total num frames: 974848. Throughput: 0: 11517.0. Samples: 964728. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:41:36,969][52355] Avg episode reward: [(0, '-40.400')]
+[2023-09-19 19:41:37,528][52795] Updated weights for policy 0, policy_version 1920 (0.0014)
+[2023-09-19 19:41:41,060][52795] Updated weights for policy 0, policy_version 2000 (0.0010)
+[2023-09-19 19:41:41,968][52355] Fps is (10 sec: 11469.1, 60 sec: 11537.1, 300 sec: 11423.3). Total num frames: 1032192. Throughput: 0: 11584.0. Samples: 1034644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:41,969][52355] Avg episode reward: [(0, '-39.741')]
+[2023-09-19 19:41:44,612][52795] Updated weights for policy 0, policy_version 2080 (0.0014)
+[2023-09-19 19:41:46,969][52355] Fps is (10 sec: 11468.4, 60 sec: 11537.1, 300 sec: 11425.7). Total num frames: 1089536. Throughput: 0: 11570.7. Samples: 1069064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:46,969][52355] Avg episode reward: [(0, '-39.068')]
+[2023-09-19 19:41:46,976][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002128_1089536.pth...
+[2023-09-19 19:41:46,983][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000001456_745472.pth
+[2023-09-19 19:41:46,984][52779] Saving new best policy, reward=-39.068!
+[2023-09-19 19:41:48,029][52795] Updated weights for policy 0, policy_version 2160 (0.0015)
+[2023-09-19 19:41:51,619][52795] Updated weights for policy 0, policy_version 2240 (0.0014)
+[2023-09-19 19:41:51,969][52355] Fps is (10 sec: 11468.6, 60 sec: 11537.1, 300 sec: 11427.8). Total num frames: 1146880. Throughput: 0: 11566.1. Samples: 1139208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:41:51,969][52355] Avg episode reward: [(0, '-38.956')]
+[2023-09-19 19:41:51,999][52779] Saving new best policy, reward=-38.956!
+[2023-09-19 19:41:53,905][52355] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 52355], exiting...
+[2023-09-19 19:41:53,906][52809] Stopping RolloutWorker_w6...
+[2023-09-19 19:41:53,906][52799] Stopping RolloutWorker_w4...
+[2023-09-19 19:41:53,906][52796] Stopping RolloutWorker_w1...
+[2023-09-19 19:41:53,906][52808] Stopping RolloutWorker_w7...
+[2023-09-19 19:41:53,906][52355] Runner profile tree view:
+main_loop: 108.1699
+[2023-09-19 19:41:53,906][52797] Stopping RolloutWorker_w0...
+[2023-09-19 19:41:53,906][52806] Stopping RolloutWorker_w5...
+[2023-09-19 19:41:53,906][52809] Loop rollout_proc6_evt_loop terminating...
+[2023-09-19 19:41:53,906][52799] Loop rollout_proc4_evt_loop terminating...
+[2023-09-19 19:41:53,907][52796] Loop rollout_proc1_evt_loop terminating...
+[2023-09-19 19:41:53,906][52804] Stopping RolloutWorker_w2...
+[2023-09-19 19:41:53,907][52808] Loop rollout_proc7_evt_loop terminating...
+[2023-09-19 19:41:53,907][52797] Loop rollout_proc0_evt_loop terminating...
+[2023-09-19 19:41:53,906][52355] Collected {0: 1171456}, FPS: 10829.8
+[2023-09-19 19:41:53,907][52806] Loop rollout_proc5_evt_loop terminating...
+[2023-09-19 19:41:53,907][52804] Loop rollout_proc2_evt_loop terminating...
+[2023-09-19 19:41:53,907][52779] Stopping Batcher_0...
+[2023-09-19 19:41:53,907][52803] Stopping RolloutWorker_w3...
+[2023-09-19 19:41:53,908][52779] Loop batcher_evt_loop terminating...
+[2023-09-19 19:41:53,908][52803] Loop rollout_proc3_evt_loop terminating...
+[2023-09-19 19:41:53,909][52779] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002288_1171456.pth...
+[2023-09-19 19:41:53,916][52779] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000001792_917504.pth
+[2023-09-19 19:41:53,916][52779] Stopping LearnerWorker_p0...
+[2023-09-19 19:41:53,917][52779] Loop learner_proc0_evt_loop terminating...
+[2023-09-19 19:41:53,921][52795] Weights refcount: 2 0
+[2023-09-19 19:41:53,922][52795] Stopping InferenceWorker_p0-w0...
+[2023-09-19 19:41:53,922][52795] Loop inference_proc0-0_evt_loop terminating...
+[2023-09-19 19:43:47,377][67555] Saving configuration to ./train_dir/Pusher/config.json...
+[2023-09-19 19:43:47,412][67555] Rollout worker 0 uses device cpu
+[2023-09-19 19:43:47,413][67555] Rollout worker 1 uses device cpu
+[2023-09-19 19:43:47,414][67555] Rollout worker 2 uses device cpu
+[2023-09-19 19:43:47,414][67555] Rollout worker 3 uses device cpu
+[2023-09-19 19:43:47,415][67555] Rollout worker 4 uses device cpu
+[2023-09-19 19:43:47,416][67555] Rollout worker 5 uses device cpu
+[2023-09-19 19:43:47,416][67555] Rollout worker 6 uses device cpu
+[2023-09-19 19:43:47,417][67555] Rollout worker 7 uses device cpu
+[2023-09-19 19:43:47,417][67555] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
+[2023-09-19 19:43:47,468][67555] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:43:47,469][67555] InferenceWorker_p0-w0: min num requests: 1
+[2023-09-19 19:43:47,472][67555] Using GPUs [1] for process 1 (actually maps to GPUs [1])
+[2023-09-19 19:43:47,472][67555] InferenceWorker_p1-w0: min num requests: 1
+[2023-09-19 19:43:47,496][67555] Starting all processes...
+[2023-09-19 19:43:47,496][67555] Starting process learner_proc0
+[2023-09-19 19:43:47,499][67555] Starting process learner_proc1
+[2023-09-19 19:43:47,546][67555] Starting all processes...
+[2023-09-19 19:43:47,552][67555] Starting process inference_proc0-0
+[2023-09-19 19:43:47,553][67555] Starting process inference_proc1-0
+[2023-09-19 19:43:47,553][67555] Starting process rollout_proc0
+[2023-09-19 19:43:47,553][67555] Starting process rollout_proc1
+[2023-09-19 19:43:47,554][67555] Starting process rollout_proc2
+[2023-09-19 19:43:47,557][67555] Starting process rollout_proc3
+[2023-09-19 19:43:47,560][67555] Starting process rollout_proc4
+[2023-09-19 19:43:47,568][67555] Starting process rollout_proc5
+[2023-09-19 19:43:47,569][67555] Starting process rollout_proc6
+[2023-09-19 19:43:47,569][67555] Starting process rollout_proc7
+[2023-09-19 19:43:49,442][68284] Worker 2 uses CPU cores [8, 9, 10, 11]
+[2023-09-19 19:43:49,459][68289] Worker 4 uses CPU cores [16, 17, 18, 19]
+[2023-09-19 19:43:49,460][68201] Using GPUs [1] for process 1 (actually maps to GPUs [1])
+[2023-09-19 19:43:49,460][68201] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
+[2023-09-19 19:43:49,467][68291] Worker 5 uses CPU cores [20, 21, 22, 23]
+[2023-09-19 19:43:49,468][68200] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:43:49,469][68200] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-09-19 19:43:49,480][68201] Num visible devices: 1
+[2023-09-19 19:43:49,487][68200] Num visible devices: 1
+[2023-09-19 19:43:49,489][68280] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:43:49,490][68280] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-09-19 19:43:49,521][68290] Worker 6 uses CPU cores [24, 25, 26, 27]
+[2023-09-19 19:43:49,528][68201] Starting seed is not provided
+[2023-09-19 19:43:49,528][68201] Using GPUs [0] for process 1 (actually maps to GPUs [1])
+[2023-09-19 19:43:49,528][68201] Initializing actor-critic model on device cuda:0
+[2023-09-19 19:43:49,528][68201] RunningMeanStd input shape: (23,)
+[2023-09-19 19:43:49,529][68201] RunningMeanStd input shape: (1,)
+[2023-09-19 19:43:49,530][68200] Starting seed is not provided
+[2023-09-19 19:43:49,531][68200] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:43:49,531][68200] Initializing actor-critic model on device cuda:0
+[2023-09-19 19:43:49,531][68200] RunningMeanStd input shape: (23,)
+[2023-09-19 19:43:49,532][68200] RunningMeanStd input shape: (1,)
+[2023-09-19 19:43:49,538][68280] Num visible devices: 1
+[2023-09-19 19:43:49,562][68292] Worker 7 uses CPU cores [28, 29, 30, 31]
+[2023-09-19 19:43:49,580][68281] Using GPUs [1] for process 1 (actually maps to GPUs [1])
+[2023-09-19 19:43:49,581][68281] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
+[2023-09-19 19:43:49,583][68201] Created Actor Critic model with architecture:
+[2023-09-19 19:43:49,583][68201] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): MultiInputEncoder(
+    (encoders): ModuleDict(
+      (obs): MlpEncoder(
+        (mlp_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=Tanh)
+          (2): RecursiveScriptModule(original_name=Linear)
+          (3): RecursiveScriptModule(original_name=Tanh)
+        )
+      )
+    )
+  )
+  (core): ModelCoreIdentity()
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
+    (distribution_linear): Linear(in_features=64, out_features=7, bias=True)
+  )
+)
+[2023-09-19 19:43:49,602][68200] Created Actor Critic model with architecture:
+[2023-09-19 19:43:49,603][68200] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): MultiInputEncoder(
+    (encoders): ModuleDict(
+      (obs): MlpEncoder(
+        (mlp_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=Tanh)
+          (2): RecursiveScriptModule(original_name=Linear)
+          (3): RecursiveScriptModule(original_name=Tanh)
+        )
+      )
+    )
+  )
+  (core): ModelCoreIdentity()
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
+    (distribution_linear): Linear(in_features=64, out_features=7, bias=True)
+  )
+)
+[2023-09-19 19:43:49,628][68281] Num visible devices: 1
+[2023-09-19 19:43:49,683][68286] Worker 3 uses CPU cores [12, 13, 14, 15]
+[2023-09-19 19:43:49,705][68282] Worker 0 uses CPU cores [0, 1, 2, 3]
+[2023-09-19 19:43:49,846][68283] Worker 1 uses CPU cores [4, 5, 6, 7]
+[2023-09-19 19:43:50,218][68201] Using optimizer <class 'torch.optim.adam.Adam'>
+[2023-09-19 19:43:50,219][68201] No checkpoints found
+[2023-09-19 19:43:50,219][68201] Did not load from checkpoint, starting from scratch!
+[2023-09-19 19:43:50,219][68201] Initialized policy 1 weights for model version 0
+[2023-09-19 19:43:50,221][68200] Using optimizer <class 'torch.optim.adam.Adam'>
+[2023-09-19 19:43:50,221][68201] LearnerWorker_p1 finished initialization!
+[2023-09-19 19:43:50,221][68200] Loading state from checkpoint ./train_dir/Pusher/checkpoint_p0/checkpoint_000002288_1171456.pth...
+[2023-09-19 19:43:50,221][68201] Using GPUs [0] for process 1 (actually maps to GPUs [1])
+[2023-09-19 19:43:50,226][68200] Loading model from checkpoint
+[2023-09-19 19:43:50,229][68200] Loaded experiment state at self.train_step=2288, self.env_steps=1171456
+[2023-09-19 19:43:50,229][68200] Initialized policy 0 weights for model version 2288
+[2023-09-19 19:43:50,230][68200] LearnerWorker_p0 finished initialization!
+[2023-09-19 19:43:50,230][68200] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-09-19 19:43:50,797][68281] RunningMeanStd input shape: (23,)
+[2023-09-19 19:43:50,798][68281] RunningMeanStd input shape: (1,)
+[2023-09-19 19:43:50,825][68280] RunningMeanStd input shape: (23,)
+[2023-09-19 19:43:50,826][68280] RunningMeanStd input shape: (1,)
+[2023-09-19 19:43:50,831][67555] Inference worker 1-0 is ready!
+[2023-09-19 19:43:50,857][67555] Inference worker 0-0 is ready!
+[2023-09-19 19:43:50,858][67555] All inference workers are ready! Signal rollout workers to start!
+[2023-09-19 19:43:50,948][68292] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,948][68292] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,953][68282] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,953][68282] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,954][68286] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,955][68286] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,955][68289] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,956][68289] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,960][68283] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,961][68283] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,961][68290] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,962][68292] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,962][68290] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,962][68291] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,963][68291] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,967][68282] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,967][68284] Decorrelating experience for 0 frames...
+[2023-09-19 19:43:50,968][68284] Decorrelating experience for 64 frames...
+[2023-09-19 19:43:50,969][68286] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,970][68289] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,975][68283] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,986][68291] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,987][68290] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,988][68292] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:50,993][68284] Decorrelating experience for 128 frames...
+[2023-09-19 19:43:50,993][68282] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:50,996][68286] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:50,997][68289] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:51,002][68283] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:51,029][68291] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:51,034][68290] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:51,036][68292] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,040][68282] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,041][68284] Decorrelating experience for 192 frames...
+[2023-09-19 19:43:51,044][68286] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,045][68289] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,050][68283] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,080][68291] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,087][68292] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,090][68282] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,096][68286] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,096][68289] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,101][68283] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,102][68290] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,112][68284] Decorrelating experience for 256 frames...
+[2023-09-19 19:43:51,130][68291] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,150][68292] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,152][68290] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,153][68282] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,160][68286] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,160][68289] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,165][68284] Decorrelating experience for 320 frames...
+[2023-09-19 19:43:51,165][68283] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,191][68291] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,214][68290] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,226][68292] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,228][68282] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,231][68284] Decorrelating experience for 384 frames...
+[2023-09-19 19:43:51,237][68289] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,237][68286] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,241][68283] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,265][68291] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,289][68290] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:51,312][68284] Decorrelating experience for 448 frames...
+[2023-09-19 19:43:53,687][67555] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1179648. Throughput: 0: nan, 1: nan. Samples: 0. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:43:53,688][67555] Avg episode reward: [(0, '-29.418'), (1, '-78.373')]
+[2023-09-19 19:43:53,688][68200] Saving new best policy, reward=-29.418!
+[2023-09-19 19:43:58,687][67555] Fps is (10 sec: 8191.5, 60 sec: 8191.5, 300 sec: 8191.5). Total num frames: 1220608. Throughput: 0: 3410.2, 1: 3372.6. Samples: 33916. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:43:58,688][67555] Avg episode reward: [(0, '-39.392'), (1, '-110.073')]
+[2023-09-19 19:43:58,737][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002344_1200128.pth...
+[2023-09-19 19:43:58,741][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002128_1089536.pth
+[2023-09-19 19:43:58,748][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000056_28672.pth...
+[2023-09-19 19:44:00,682][68280] Updated weights for policy 0, policy_version 2368 (0.0017)
+[2023-09-19 19:44:00,684][68281] Updated weights for policy 1, policy_version 80 (0.0017)
+[2023-09-19 19:44:03,687][67555] Fps is (10 sec: 10649.5, 60 sec: 10649.5, 300 sec: 10649.5). Total num frames: 1286144. Throughput: 0: 5464.1, 1: 5439.3. Samples: 109036. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:44:03,688][67555] Avg episode reward: [(0, '-37.888'), (1, '-99.629')]
+[2023-09-19 19:44:07,289][68281] Updated weights for policy 1, policy_version 160 (0.0015)
+[2023-09-19 19:44:07,290][68280] Updated weights for policy 0, policy_version 2448 (0.0015)
+[2023-09-19 19:44:07,455][67555] Heartbeat connected on Batcher_0
+[2023-09-19 19:44:07,458][67555] Heartbeat connected on LearnerWorker_p0
+[2023-09-19 19:44:07,462][67555] Heartbeat connected on Batcher_1
+[2023-09-19 19:44:07,464][67555] Heartbeat connected on LearnerWorker_p1
+[2023-09-19 19:44:07,470][67555] Heartbeat connected on InferenceWorker_p0-w0
+[2023-09-19 19:44:07,475][67555] Heartbeat connected on InferenceWorker_p1-w0
+[2023-09-19 19:44:07,476][67555] Heartbeat connected on RolloutWorker_w0
+[2023-09-19 19:44:07,480][67555] Heartbeat connected on RolloutWorker_w1
+[2023-09-19 19:44:07,482][67555] Heartbeat connected on RolloutWorker_w2
+[2023-09-19 19:44:07,488][67555] Heartbeat connected on RolloutWorker_w3
+[2023-09-19 19:44:07,489][67555] Heartbeat connected on RolloutWorker_w4
+[2023-09-19 19:44:07,490][67555] Heartbeat connected on RolloutWorker_w5
+[2023-09-19 19:44:07,493][67555] Heartbeat connected on RolloutWorker_w6
+[2023-09-19 19:44:07,496][67555] Heartbeat connected on RolloutWorker_w7
+[2023-09-19 19:44:08,687][67555] Fps is (10 sec: 13107.2, 60 sec: 11468.6, 300 sec: 11468.6). Total num frames: 1351680. Throughput: 0: 4914.9, 1: 4911.8. Samples: 147402. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:08,688][67555] Avg episode reward: [(0, '-36.555'), (1, '-83.895')]
+[2023-09-19 19:44:13,687][67555] Fps is (10 sec: 12288.0, 60 sec: 11468.8, 300 sec: 11468.8). Total num frames: 1409024. Throughput: 0: 5457.1, 1: 5446.4. Samples: 218070. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:44:13,688][67555] Avg episode reward: [(0, '-37.150'), (1, '-78.583')]
+[2023-09-19 19:44:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002520_1290240.pth...
+[2023-09-19 19:44:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000232_118784.pth...
+[2023-09-19 19:44:13,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002288_1171456.pth
+[2023-09-19 19:44:13,703][68201] Saving new best policy, reward=-78.583!
+[2023-09-19 19:44:14,110][68281] Updated weights for policy 1, policy_version 240 (0.0012)
+[2023-09-19 19:44:14,110][68280] Updated weights for policy 0, policy_version 2528 (0.0009)
+[2023-09-19 19:44:18,687][67555] Fps is (10 sec: 12288.2, 60 sec: 11796.4, 300 sec: 11796.4). Total num frames: 1474560. Throughput: 0: 5900.9, 1: 5899.5. Samples: 295012. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:18,688][67555] Avg episode reward: [(0, '-38.075'), (1, '-69.495')]
+[2023-09-19 19:44:18,690][68201] Saving new best policy, reward=-69.495!
+[2023-09-19 19:44:20,364][68281] Updated weights for policy 1, policy_version 320 (0.0013)
+[2023-09-19 19:44:20,365][68280] Updated weights for policy 0, policy_version 2608 (0.0016)
+[2023-09-19 19:44:23,687][67555] Fps is (10 sec: 12287.9, 60 sec: 11741.8, 300 sec: 11741.8). Total num frames: 1531904. Throughput: 0: 5593.4, 1: 5592.0. Samples: 335564. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:44:23,688][67555] Avg episode reward: [(0, '-38.295'), (1, '-65.406')]
+[2023-09-19 19:44:23,690][68201] Saving new best policy, reward=-65.406!
+[2023-09-19 19:44:27,212][68280] Updated weights for policy 0, policy_version 2688 (0.0016)
+[2023-09-19 19:44:27,212][68281] Updated weights for policy 1, policy_version 400 (0.0012)
+[2023-09-19 19:44:28,687][67555] Fps is (10 sec: 12288.2, 60 sec: 11936.9, 300 sec: 11936.9). Total num frames: 1597440. Throughput: 0: 5797.3, 1: 5792.7. Samples: 405652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:28,687][67555] Avg episode reward: [(0, '-38.760'), (1, '-62.303')]
+[2023-09-19 19:44:28,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002704_1384448.pth...
+[2023-09-19 19:44:28,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000416_212992.pth...
+[2023-09-19 19:44:28,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002344_1200128.pth
+[2023-09-19 19:44:28,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000056_28672.pth
+[2023-09-19 19:44:28,703][68201] Saving new best policy, reward=-62.303!
+[2023-09-19 19:44:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 11878.4, 300 sec: 11878.4). Total num frames: 1654784. Throughput: 0: 5946.5, 1: 5940.9. Samples: 475498. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:33,688][67555] Avg episode reward: [(0, '-36.678'), (1, '-58.731')]
+[2023-09-19 19:44:33,689][68201] Saving new best policy, reward=-58.731!
+[2023-09-19 19:44:34,052][68280] Updated weights for policy 0, policy_version 2768 (0.0014)
+[2023-09-19 19:44:34,053][68281] Updated weights for policy 1, policy_version 480 (0.0015)
+[2023-09-19 19:44:38,687][67555] Fps is (10 sec: 11468.7, 60 sec: 11832.9, 300 sec: 11832.9). Total num frames: 1712128. Throughput: 0: 5708.8, 1: 5704.5. Samples: 513602. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:44:38,688][67555] Avg episode reward: [(0, '-38.288'), (1, '-55.388')]
+[2023-09-19 19:44:38,688][68201] Saving new best policy, reward=-55.388!
+[2023-09-19 19:44:40,767][68281] Updated weights for policy 1, policy_version 560 (0.0015)
+[2023-09-19 19:44:40,767][68280] Updated weights for policy 0, policy_version 2848 (0.0013)
+[2023-09-19 19:44:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 11960.3, 300 sec: 11960.3). Total num frames: 1777664. Throughput: 0: 6157.4, 1: 6157.1. Samples: 588068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:43,688][67555] Avg episode reward: [(0, '-37.910'), (1, '-50.738')]
+[2023-09-19 19:44:43,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000592_303104.pth...
+[2023-09-19 19:44:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000002880_1474560.pth...
+[2023-09-19 19:44:43,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000232_118784.pth
+[2023-09-19 19:44:43,701][68201] Saving new best policy, reward=-50.738!
+[2023-09-19 19:44:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002520_1290240.pth
+[2023-09-19 19:44:47,305][68280] Updated weights for policy 0, policy_version 2928 (0.0014)
+[2023-09-19 19:44:47,306][68281] Updated weights for policy 1, policy_version 640 (0.0016)
+[2023-09-19 19:44:48,687][67555] Fps is (10 sec: 13106.9, 60 sec: 12064.5, 300 sec: 12064.5). Total num frames: 1843200. Throughput: 0: 6153.5, 1: 6153.9. Samples: 662872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:48,688][67555] Avg episode reward: [(0, '-37.560'), (1, '-49.877')]
+[2023-09-19 19:44:48,690][68201] Saving new best policy, reward=-49.877!
+[2023-09-19 19:44:53,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12014.9). Total num frames: 1900544. Throughput: 0: 6107.8, 1: 6105.5. Samples: 697000. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:44:53,689][67555] Avg episode reward: [(0, '-36.344'), (1, '-50.457')]
+[2023-09-19 19:44:54,308][68280] Updated weights for policy 0, policy_version 3008 (0.0012)
+[2023-09-19 19:44:54,308][68281] Updated weights for policy 1, policy_version 720 (0.0015)
+[2023-09-19 19:44:58,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12288.0, 300 sec: 11972.9). Total num frames: 1957888. Throughput: 0: 6127.7, 1: 6130.3. Samples: 769684. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:44:58,688][67555] Avg episode reward: [(0, '-36.512'), (1, '-48.503')]
+[2023-09-19 19:44:58,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003056_1564672.pth...
+[2023-09-19 19:44:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000768_393216.pth...
+[2023-09-19 19:44:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002704_1384448.pth
+[2023-09-19 19:44:58,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000416_212992.pth
+[2023-09-19 19:44:58,705][68201] Saving new best policy, reward=-48.503!
+[2023-09-19 19:45:01,076][68280] Updated weights for policy 0, policy_version 3088 (0.0013)
+[2023-09-19 19:45:01,077][68281] Updated weights for policy 1, policy_version 800 (0.0014)
+[2023-09-19 19:45:03,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12151.4, 300 sec: 11936.9). Total num frames: 2015232. Throughput: 0: 6080.9, 1: 6077.3. Samples: 842132. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:45:03,688][67555] Avg episode reward: [(0, '-38.216'), (1, '-47.198')]
+[2023-09-19 19:45:03,689][68201] Saving new best policy, reward=-47.198!
+[2023-09-19 19:45:07,805][68280] Updated weights for policy 0, policy_version 3168 (0.0015)
+[2023-09-19 19:45:07,806][68281] Updated weights for policy 1, policy_version 880 (0.0012)
+[2023-09-19 19:45:08,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12014.9). Total num frames: 2080768. Throughput: 0: 6022.8, 1: 6019.2. Samples: 877450. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:45:08,688][67555] Avg episode reward: [(0, '-33.993'), (1, '-45.326')]
+[2023-09-19 19:45:08,689][68201] Saving new best policy, reward=-45.326!
+[2023-09-19 19:45:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.4, 300 sec: 11980.7). Total num frames: 2138112. Throughput: 0: 6049.5, 1: 6052.7. Samples: 950256. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:45:13,688][67555] Avg episode reward: [(0, '-36.698'), (1, '-43.613')]
+[2023-09-19 19:45:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000000944_483328.pth...
+[2023-09-19 19:45:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003232_1654784.pth...
+[2023-09-19 19:45:13,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000592_303104.pth
+[2023-09-19 19:45:13,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000002880_1474560.pth
+[2023-09-19 19:45:13,707][68201] Saving new best policy, reward=-43.613!
+[2023-09-19 19:45:14,694][68281] Updated weights for policy 1, policy_version 960 (0.0014)
+[2023-09-19 19:45:14,694][68280] Updated weights for policy 0, policy_version 3248 (0.0013)
+[2023-09-19 19:45:18,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12014.9, 300 sec: 11950.6). Total num frames: 2195456. Throughput: 0: 6057.0, 1: 6058.1. Samples: 1020678. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:45:18,689][67555] Avg episode reward: [(0, '-36.102'), (1, '-42.213')]
+[2023-09-19 19:45:18,745][68201] Saving new best policy, reward=-42.213!
+[2023-09-19 19:45:21,441][68280] Updated weights for policy 0, policy_version 3328 (0.0012)
+[2023-09-19 19:45:21,441][68281] Updated weights for policy 1, policy_version 1040 (0.0012)
+[2023-09-19 19:45:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12014.9). Total num frames: 2260992. Throughput: 0: 6043.4, 1: 6043.0. Samples: 1057494. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:45:23,688][67555] Avg episode reward: [(0, '-36.233'), (1, '-42.467')]
+[2023-09-19 19:45:28,029][68280] Updated weights for policy 0, policy_version 3408 (0.0012)
+[2023-09-19 19:45:28,029][68281] Updated weights for policy 1, policy_version 1120 (0.0014)
+[2023-09-19 19:45:28,687][67555] Fps is (10 sec: 13107.2, 60 sec: 12151.4, 300 sec: 12072.4). Total num frames: 2326528. Throughput: 0: 6054.0, 1: 6053.6. Samples: 1132912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:45:28,688][67555] Avg episode reward: [(0, '-34.312'), (1, '-42.326')]
+[2023-09-19 19:45:28,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000001128_577536.pth...
+[2023-09-19 19:45:28,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003416_1748992.pth...
+[2023-09-19 19:45:28,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000768_393216.pth
+[2023-09-19 19:45:28,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003056_1564672.pth
+[2023-09-19 19:45:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12042.2). Total num frames: 2383872. Throughput: 0: 6028.0, 1: 6029.2. Samples: 1205444. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:45:33,688][67555] Avg episode reward: [(0, '-34.301'), (1, '-41.809')]
+[2023-09-19 19:45:33,690][68201] Saving new best policy, reward=-41.809!
+[2023-09-19 19:45:34,723][68281] Updated weights for policy 1, policy_version 1200 (0.0014)
+[2023-09-19 19:45:34,723][68280] Updated weights for policy 0, policy_version 3488 (0.0015)
+[2023-09-19 19:45:38,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12287.9, 300 sec: 12092.9). Total num frames: 2449408. Throughput: 0: 6082.2, 1: 6081.9. Samples: 1244384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:45:38,688][67555] Avg episode reward: [(0, '-35.417'), (1, '-40.501')]
+[2023-09-19 19:45:38,690][68201] Saving new best policy, reward=-40.501!
+[2023-09-19 19:45:41,168][68280] Updated weights for policy 0, policy_version 3568 (0.0012)
+[2023-09-19 19:45:41,168][68281] Updated weights for policy 1, policy_version 1280 (0.0014)
+[2023-09-19 19:45:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12064.6). Total num frames: 2506752. Throughput: 0: 6102.2, 1: 6103.7. Samples: 1318950. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:45:43,688][67555] Avg episode reward: [(0, '-33.777'), (1, '-41.354')]
+[2023-09-19 19:45:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003592_1839104.pth...
+[2023-09-19 19:45:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000001304_667648.pth...
+[2023-09-19 19:45:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003232_1654784.pth
+[2023-09-19 19:45:43,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000000944_483328.pth
+[2023-09-19 19:45:48,220][68281] Updated weights for policy 1, policy_version 1360 (0.0012)
+[2023-09-19 19:45:48,221][68280] Updated weights for policy 0, policy_version 3648 (0.0014)
+[2023-09-19 19:45:48,687][67555] Fps is (10 sec: 11469.1, 60 sec: 12015.0, 300 sec: 12038.7). Total num frames: 2564096. Throughput: 0: 6074.5, 1: 6075.5. Samples: 1388880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:45:48,688][67555] Avg episode reward: [(0, '-36.662'), (1, '-40.910')]
+[2023-09-19 19:45:53,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12015.0, 300 sec: 12014.9). Total num frames: 2621440. Throughput: 0: 6069.7, 1: 6070.3. Samples: 1423752. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:45:53,688][67555] Avg episode reward: [(0, '-36.083'), (1, '-41.393')]
+[2023-09-19 19:45:55,243][68280] Updated weights for policy 0, policy_version 3728 (0.0013)
+[2023-09-19 19:45:55,243][68281] Updated weights for policy 1, policy_version 1440 (0.0012)
+[2023-09-19 19:45:58,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12058.6). Total num frames: 2686976. Throughput: 0: 6052.8, 1: 6049.0. Samples: 1494836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:45:58,688][67555] Avg episode reward: [(0, '-35.742'), (1, '-39.221')]
+[2023-09-19 19:45:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000001480_757760.pth...
+[2023-09-19 19:45:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003768_1929216.pth...
+[2023-09-19 19:45:58,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000001128_577536.pth
+[2023-09-19 19:45:58,701][68201] Saving new best policy, reward=-39.221!
+[2023-09-19 19:45:58,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003416_1748992.pth
+[2023-09-19 19:46:01,803][68281] Updated weights for policy 1, policy_version 1520 (0.0013)
+[2023-09-19 19:46:01,803][68280] Updated weights for policy 0, policy_version 3808 (0.0015)
+[2023-09-19 19:46:03,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12035.9). Total num frames: 2744320. Throughput: 0: 6108.1, 1: 6106.7. Samples: 1570338. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:03,687][67555] Avg episode reward: [(0, '-35.793'), (1, '-39.555')]
+[2023-09-19 19:46:08,358][68280] Updated weights for policy 0, policy_version 3888 (0.0014)
+[2023-09-19 19:46:08,358][68281] Updated weights for policy 1, policy_version 1600 (0.0012)
+[2023-09-19 19:46:08,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12075.6). Total num frames: 2809856. Throughput: 0: 6104.6, 1: 6104.3. Samples: 1606894. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:08,688][67555] Avg episode reward: [(0, '-33.763'), (1, '-40.074')]
+[2023-09-19 19:46:13,687][67555] Fps is (10 sec: 13107.0, 60 sec: 12288.1, 300 sec: 12112.4). Total num frames: 2875392. Throughput: 0: 6119.7, 1: 6120.5. Samples: 1683718. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:46:13,688][67555] Avg episode reward: [(0, '-37.464'), (1, '-40.363')]
+[2023-09-19 19:46:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000001664_851968.pth...
+[2023-09-19 19:46:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000003952_2023424.pth...
+[2023-09-19 19:46:13,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000001304_667648.pth
+[2023-09-19 19:46:13,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003592_1839104.pth
+[2023-09-19 19:46:14,878][68281] Updated weights for policy 1, policy_version 1680 (0.0012)
+[2023-09-19 19:46:14,879][68280] Updated weights for policy 0, policy_version 3968 (0.0016)
+[2023-09-19 19:46:18,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.0, 300 sec: 12090.2). Total num frames: 2932736. Throughput: 0: 6121.1, 1: 6120.7. Samples: 1756324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:18,688][67555] Avg episode reward: [(0, '-37.896'), (1, '-39.345')]
+[2023-09-19 19:46:21,650][68280] Updated weights for policy 0, policy_version 4048 (0.0012)
+[2023-09-19 19:46:21,650][68281] Updated weights for policy 1, policy_version 1760 (0.0014)
+[2023-09-19 19:46:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.1, 300 sec: 12124.2). Total num frames: 2998272. Throughput: 0: 6102.4, 1: 6102.8. Samples: 1793612. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:46:23,687][67555] Avg episode reward: [(0, '-37.847'), (1, '-38.903')]
+[2023-09-19 19:46:23,688][68201] Saving new best policy, reward=-38.903!
+[2023-09-19 19:46:28,206][68281] Updated weights for policy 1, policy_version 1840 (0.0014)
+[2023-09-19 19:46:28,206][68280] Updated weights for policy 0, policy_version 4128 (0.0016)
+[2023-09-19 19:46:28,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12103.0). Total num frames: 3055616. Throughput: 0: 6097.7, 1: 6096.1. Samples: 1867674. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:28,688][67555] Avg episode reward: [(0, '-36.851'), (1, '-38.784')]
+[2023-09-19 19:46:28,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000004128_2113536.pth...
+[2023-09-19 19:46:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000001840_942080.pth...
+[2023-09-19 19:46:28,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000001480_757760.pth
+[2023-09-19 19:46:28,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003768_1929216.pth
+[2023-09-19 19:46:28,707][68201] Saving new best policy, reward=-38.784!
+[2023-09-19 19:46:33,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12134.4). Total num frames: 3121152. Throughput: 0: 6139.5, 1: 6142.6. Samples: 1941576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:33,688][67555] Avg episode reward: [(0, '-36.852'), (1, '-38.858')]
+[2023-09-19 19:46:34,775][68281] Updated weights for policy 1, policy_version 1920 (0.0015)
+[2023-09-19 19:46:34,775][68280] Updated weights for policy 0, policy_version 4208 (0.0014)
+[2023-09-19 19:46:38,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12114.2). Total num frames: 3178496. Throughput: 0: 6173.8, 1: 6174.3. Samples: 1979416. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:46:38,688][67555] Avg episode reward: [(0, '-37.220'), (1, '-38.514')]
+[2023-09-19 19:46:38,690][68201] Saving new best policy, reward=-38.514!
+[2023-09-19 19:46:41,416][68281] Updated weights for policy 1, policy_version 2000 (0.0014)
+[2023-09-19 19:46:41,416][68280] Updated weights for policy 0, policy_version 4288 (0.0014)
+[2023-09-19 19:46:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12143.4). Total num frames: 3244032. Throughput: 0: 6217.7, 1: 6217.6. Samples: 2054428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:43,688][67555] Avg episode reward: [(0, '-37.122'), (1, '-38.697')]
+[2023-09-19 19:46:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000004312_2207744.pth...
+[2023-09-19 19:46:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002024_1036288.pth...
+[2023-09-19 19:46:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000003952_2023424.pth
+[2023-09-19 19:46:43,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000001664_851968.pth
+[2023-09-19 19:46:47,971][68281] Updated weights for policy 1, policy_version 2080 (0.0011)
+[2023-09-19 19:46:47,971][68280] Updated weights for policy 0, policy_version 4368 (0.0013)
+[2023-09-19 19:46:48,687][67555] Fps is (10 sec: 13107.4, 60 sec: 12424.5, 300 sec: 12171.0). Total num frames: 3309568. Throughput: 0: 6210.7, 1: 6211.1. Samples: 2129320. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:46:48,688][67555] Avg episode reward: [(0, '-37.156'), (1, '-37.923')]
+[2023-09-19 19:46:48,688][68201] Saving new best policy, reward=-37.923!
+[2023-09-19 19:46:53,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12424.5, 300 sec: 12151.5). Total num frames: 3366912. Throughput: 0: 6232.2, 1: 6233.3. Samples: 2167844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:53,688][67555] Avg episode reward: [(0, '-35.879'), (1, '-37.444')]
+[2023-09-19 19:46:53,689][68201] Saving new best policy, reward=-37.444!
+[2023-09-19 19:46:54,705][68281] Updated weights for policy 1, policy_version 2160 (0.0013)
+[2023-09-19 19:46:54,705][68280] Updated weights for policy 0, policy_version 4448 (0.0015)
+[2023-09-19 19:46:58,687][67555] Fps is (10 sec: 11468.4, 60 sec: 12287.9, 300 sec: 12133.0). Total num frames: 3424256. Throughput: 0: 6156.7, 1: 6156.7. Samples: 2237824. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:46:58,688][67555] Avg episode reward: [(0, '-36.737'), (1, '-38.154')]
+[2023-09-19 19:46:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002208_1130496.pth...
+[2023-09-19 19:46:58,700][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000004496_2301952.pth...
+[2023-09-19 19:46:58,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000001840_942080.pth
+[2023-09-19 19:46:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000004128_2113536.pth
+[2023-09-19 19:47:01,286][68280] Updated weights for policy 0, policy_version 4528 (0.0014)
+[2023-09-19 19:47:01,286][68281] Updated weights for policy 1, policy_version 2240 (0.0013)
+[2023-09-19 19:47:03,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12424.5, 300 sec: 12158.7). Total num frames: 3489792. Throughput: 0: 6193.1, 1: 6192.3. Samples: 2313668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:03,688][67555] Avg episode reward: [(0, '-36.721'), (1, '-37.375')]
+[2023-09-19 19:47:03,689][68201] Saving new best policy, reward=-37.375!
+[2023-09-19 19:47:08,095][68281] Updated weights for policy 1, policy_version 2320 (0.0014)
+[2023-09-19 19:47:08,095][68280] Updated weights for policy 0, policy_version 4608 (0.0013)
+[2023-09-19 19:47:08,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12287.9, 300 sec: 12140.9). Total num frames: 3547136. Throughput: 0: 6184.7, 1: 6183.8. Samples: 2350198. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:47:08,688][67555] Avg episode reward: [(0, '-37.640'), (1, '-37.586')]
+[2023-09-19 19:47:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12165.1). Total num frames: 3612672. Throughput: 0: 6156.2, 1: 6154.0. Samples: 2421628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:13,688][67555] Avg episode reward: [(0, '-37.766'), (1, '-36.365')]
+[2023-09-19 19:47:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000004672_2392064.pth...
+[2023-09-19 19:47:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002384_1220608.pth...
+[2023-09-19 19:47:13,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002024_1036288.pth
+[2023-09-19 19:47:13,705][68201] Saving new best policy, reward=-36.365!
+[2023-09-19 19:47:13,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000004312_2207744.pth
+[2023-09-19 19:47:14,957][68280] Updated weights for policy 0, policy_version 4688 (0.0014)
+[2023-09-19 19:47:14,957][68281] Updated weights for policy 1, policy_version 2400 (0.0012)
+[2023-09-19 19:47:18,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12288.0, 300 sec: 12148.1). Total num frames: 3670016. Throughput: 0: 6120.4, 1: 6117.0. Samples: 2492258. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:47:18,687][67555] Avg episode reward: [(0, '-36.654'), (1, '-35.514')]
+[2023-09-19 19:47:18,688][68201] Saving new best policy, reward=-35.514!
+[2023-09-19 19:47:21,843][68280] Updated weights for policy 0, policy_version 4768 (0.0013)
+[2023-09-19 19:47:21,843][68281] Updated weights for policy 1, policy_version 2480 (0.0014)
+[2023-09-19 19:47:23,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12151.5, 300 sec: 12132.0). Total num frames: 3727360. Throughput: 0: 6100.5, 1: 6099.6. Samples: 2528420. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:47:23,688][67555] Avg episode reward: [(0, '-37.327'), (1, '-35.147')]
+[2023-09-19 19:47:23,689][68201] Saving new best policy, reward=-35.147!
+[2023-09-19 19:47:28,585][68280] Updated weights for policy 0, policy_version 4848 (0.0011)
+[2023-09-19 19:47:28,586][68281] Updated weights for policy 1, policy_version 2560 (0.0013)
+[2023-09-19 19:47:28,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12288.0, 300 sec: 12154.6). Total num frames: 3792896. Throughput: 0: 6056.5, 1: 6056.4. Samples: 2599510. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:28,688][67555] Avg episode reward: [(0, '-38.094'), (1, '-35.482')]
+[2023-09-19 19:47:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002560_1310720.pth...
+[2023-09-19 19:47:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000004848_2482176.pth...
+[2023-09-19 19:47:28,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002208_1130496.pth
+[2023-09-19 19:47:28,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000004496_2301952.pth
+[2023-09-19 19:47:33,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12139.0). Total num frames: 3850240. Throughput: 0: 6062.3, 1: 6061.2. Samples: 2674878. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:33,688][67555] Avg episode reward: [(0, '-37.131'), (1, '-35.727')]
+[2023-09-19 19:47:35,246][68280] Updated weights for policy 0, policy_version 4928 (0.0014)
+[2023-09-19 19:47:35,247][68281] Updated weights for policy 1, policy_version 2640 (0.0013)
+[2023-09-19 19:47:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12160.6). Total num frames: 3915776. Throughput: 0: 6038.9, 1: 6041.6. Samples: 2711468. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:38,688][67555] Avg episode reward: [(0, '-37.206'), (1, '-35.617')]
+[2023-09-19 19:47:41,983][68281] Updated weights for policy 1, policy_version 2720 (0.0014)
+[2023-09-19 19:47:41,983][68280] Updated weights for policy 0, policy_version 5008 (0.0011)
+[2023-09-19 19:47:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12145.5). Total num frames: 3973120. Throughput: 0: 6078.9, 1: 6081.2. Samples: 2785032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:43,688][67555] Avg episode reward: [(0, '-38.774'), (1, '-35.344')]
+[2023-09-19 19:47:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005024_2572288.pth...
+[2023-09-19 19:47:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002736_1400832.pth...
+[2023-09-19 19:47:43,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000004672_2392064.pth
+[2023-09-19 19:47:43,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002384_1220608.pth
+[2023-09-19 19:47:48,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12014.9, 300 sec: 12131.1). Total num frames: 4030464. Throughput: 0: 6012.1, 1: 6012.3. Samples: 2854766. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:47:48,688][67555] Avg episode reward: [(0, '-38.294'), (1, '-36.190')]
+[2023-09-19 19:47:48,966][68280] Updated weights for policy 0, policy_version 5088 (0.0013)
+[2023-09-19 19:47:48,967][68281] Updated weights for policy 1, policy_version 2800 (0.0012)
+[2023-09-19 19:47:53,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12014.9, 300 sec: 12117.3). Total num frames: 4087808. Throughput: 0: 6015.5, 1: 6019.1. Samples: 2891750. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:47:53,688][67555] Avg episode reward: [(0, '-39.474'), (1, '-35.152')]
+[2023-09-19 19:47:55,766][68281] Updated weights for policy 1, policy_version 2880 (0.0014)
+[2023-09-19 19:47:55,766][68280] Updated weights for policy 0, policy_version 5168 (0.0012)
+[2023-09-19 19:47:58,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12137.5). Total num frames: 4153344. Throughput: 0: 6004.0, 1: 6004.3. Samples: 2962006. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:47:58,688][67555] Avg episode reward: [(0, '-38.480'), (1, '-35.145')]
+[2023-09-19 19:47:58,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005200_2662400.pth...
+[2023-09-19 19:47:58,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000002912_1490944.pth...
+[2023-09-19 19:47:58,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000004848_2482176.pth
+[2023-09-19 19:47:58,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002560_1310720.pth
+[2023-09-19 19:47:58,706][68201] Saving new best policy, reward=-35.145!
+[2023-09-19 19:48:02,652][68280] Updated weights for policy 0, policy_version 5248 (0.0012)
+[2023-09-19 19:48:02,653][68281] Updated weights for policy 1, policy_version 2960 (0.0012)
+[2023-09-19 19:48:03,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12124.2). Total num frames: 4210688. Throughput: 0: 6012.8, 1: 6013.1. Samples: 3033422. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:03,688][67555] Avg episode reward: [(0, '-38.036'), (1, '-35.560')]
+[2023-09-19 19:48:08,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12015.0, 300 sec: 12111.3). Total num frames: 4268032. Throughput: 0: 6025.8, 1: 6027.0. Samples: 3070798. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:08,688][67555] Avg episode reward: [(0, '-37.632'), (1, '-34.924')]
+[2023-09-19 19:48:08,711][68201] Saving new best policy, reward=-34.924!
+[2023-09-19 19:48:09,384][68280] Updated weights for policy 0, policy_version 5328 (0.0011)
+[2023-09-19 19:48:09,385][68281] Updated weights for policy 1, policy_version 3040 (0.0015)
+[2023-09-19 19:48:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12130.5). Total num frames: 4333568. Throughput: 0: 6041.3, 1: 6040.6. Samples: 3143196. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:48:13,688][67555] Avg episode reward: [(0, '-37.472'), (1, '-34.476')]
+[2023-09-19 19:48:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005376_2752512.pth...
+[2023-09-19 19:48:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003088_1581056.pth...
+[2023-09-19 19:48:13,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005024_2572288.pth
+[2023-09-19 19:48:13,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002736_1400832.pth
+[2023-09-19 19:48:13,706][68201] Saving new best policy, reward=-34.476!
+[2023-09-19 19:48:16,036][68280] Updated weights for policy 0, policy_version 5408 (0.0015)
+[2023-09-19 19:48:16,036][68281] Updated weights for policy 1, policy_version 3120 (0.0016)
+[2023-09-19 19:48:18,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12014.9, 300 sec: 12118.0). Total num frames: 4390912. Throughput: 0: 6021.7, 1: 6022.0. Samples: 3216842. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:48:18,688][67555] Avg episode reward: [(0, '-38.237'), (1, '-33.533')]
+[2023-09-19 19:48:18,688][68201] Saving new best policy, reward=-33.533!
+[2023-09-19 19:48:22,744][68281] Updated weights for policy 1, policy_version 3200 (0.0011)
+[2023-09-19 19:48:22,744][68280] Updated weights for policy 0, policy_version 5488 (0.0012)
+[2023-09-19 19:48:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12136.3). Total num frames: 4456448. Throughput: 0: 6029.0, 1: 6025.4. Samples: 3253918. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:23,688][67555] Avg episode reward: [(0, '-38.436'), (1, '-33.889')]
+[2023-09-19 19:48:28,687][67555] Fps is (10 sec: 13107.1, 60 sec: 12151.5, 300 sec: 12153.9). Total num frames: 4521984. Throughput: 0: 6026.4, 1: 6023.4. Samples: 3327272. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:48:28,688][67555] Avg episode reward: [(0, '-36.515'), (1, '-33.751')]
+[2023-09-19 19:48:28,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003272_1675264.pth...
+[2023-09-19 19:48:28,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005560_2846720.pth...
+[2023-09-19 19:48:28,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000002912_1490944.pth
+[2023-09-19 19:48:28,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005200_2662400.pth
+[2023-09-19 19:48:29,170][68280] Updated weights for policy 0, policy_version 5568 (0.0011)
+[2023-09-19 19:48:29,171][68281] Updated weights for policy 1, policy_version 3280 (0.0014)
+[2023-09-19 19:48:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12141.7). Total num frames: 4579328. Throughput: 0: 6090.4, 1: 6090.7. Samples: 3402914. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:48:33,688][67555] Avg episode reward: [(0, '-37.061'), (1, '-34.048')]
+[2023-09-19 19:48:35,894][68280] Updated weights for policy 0, policy_version 5648 (0.0014)
+[2023-09-19 19:48:35,894][68281] Updated weights for policy 1, policy_version 3360 (0.0015)
+[2023-09-19 19:48:38,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12158.6). Total num frames: 4644864. Throughput: 0: 6099.1, 1: 6099.1. Samples: 3440674. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:38,688][67555] Avg episode reward: [(0, '-37.325'), (1, '-34.764')]
+[2023-09-19 19:48:42,564][68280] Updated weights for policy 0, policy_version 5728 (0.0012)
+[2023-09-19 19:48:42,564][68281] Updated weights for policy 1, policy_version 3440 (0.0014)
+[2023-09-19 19:48:43,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.5, 300 sec: 12146.7). Total num frames: 4702208. Throughput: 0: 6136.1, 1: 6140.1. Samples: 3514434. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:43,688][67555] Avg episode reward: [(0, '-37.036'), (1, '-35.818')]
+[2023-09-19 19:48:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003448_1765376.pth...
+[2023-09-19 19:48:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005736_2936832.pth...
+[2023-09-19 19:48:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003088_1581056.pth
+[2023-09-19 19:48:43,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005376_2752512.pth
+[2023-09-19 19:48:48,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 4767744. Throughput: 0: 6166.4, 1: 6165.8. Samples: 3588374. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:48,688][67555] Avg episode reward: [(0, '-35.359'), (1, '-36.017')]
+[2023-09-19 19:48:49,175][68281] Updated weights for policy 1, policy_version 3520 (0.0011)
+[2023-09-19 19:48:49,175][68280] Updated weights for policy 0, policy_version 5808 (0.0014)
+[2023-09-19 19:48:53,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 4825088. Throughput: 0: 6155.6, 1: 6155.2. Samples: 3624784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:53,688][67555] Avg episode reward: [(0, '-36.565'), (1, '-36.082')]
+[2023-09-19 19:48:55,908][68281] Updated weights for policy 1, policy_version 3600 (0.0011)
+[2023-09-19 19:48:55,908][68280] Updated weights for policy 0, policy_version 5888 (0.0012)
+[2023-09-19 19:48:58,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.1, 300 sec: 12218.6). Total num frames: 4890624. Throughput: 0: 6168.4, 1: 6168.5. Samples: 3698354. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:48:58,688][67555] Avg episode reward: [(0, '-39.417'), (1, '-35.635')]
+[2023-09-19 19:48:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003632_1859584.pth...
+[2023-09-19 19:48:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000005920_3031040.pth...
+[2023-09-19 19:48:58,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003272_1675264.pth
+[2023-09-19 19:48:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005560_2846720.pth
+[2023-09-19 19:49:02,474][68281] Updated weights for policy 1, policy_version 3680 (0.0013)
+[2023-09-19 19:49:02,475][68280] Updated weights for policy 0, policy_version 5968 (0.0012)
+[2023-09-19 19:49:03,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12190.8). Total num frames: 4947968. Throughput: 0: 6180.6, 1: 6182.1. Samples: 3773166. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:03,688][67555] Avg episode reward: [(0, '-38.110'), (1, '-34.512')]
+[2023-09-19 19:49:08,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12218.6). Total num frames: 5013504. Throughput: 0: 6173.9, 1: 6174.5. Samples: 3809598. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:08,688][67555] Avg episode reward: [(0, '-36.970'), (1, '-34.670')]
+[2023-09-19 19:49:09,349][68281] Updated weights for policy 1, policy_version 3760 (0.0011)
+[2023-09-19 19:49:09,350][68280] Updated weights for policy 0, policy_version 6048 (0.0012)
+[2023-09-19 19:49:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12190.8). Total num frames: 5070848. Throughput: 0: 6162.9, 1: 6162.5. Samples: 3881916. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:49:13,688][67555] Avg episode reward: [(0, '-36.670'), (1, '-35.140')]
+[2023-09-19 19:49:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006096_3121152.pth...
+[2023-09-19 19:49:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003808_1949696.pth...
+[2023-09-19 19:49:13,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005736_2936832.pth
+[2023-09-19 19:49:13,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003448_1765376.pth
+[2023-09-19 19:49:15,931][68280] Updated weights for policy 0, policy_version 6128 (0.0014)
+[2023-09-19 19:49:15,932][68281] Updated weights for policy 1, policy_version 3840 (0.0014)
+[2023-09-19 19:49:18,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12218.6). Total num frames: 5136384. Throughput: 0: 6152.0, 1: 6155.8. Samples: 3956770. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:49:18,688][67555] Avg episode reward: [(0, '-35.922'), (1, '-34.418')]
+[2023-09-19 19:49:22,601][68281] Updated weights for policy 1, policy_version 3920 (0.0015)
+[2023-09-19 19:49:22,601][68280] Updated weights for policy 0, policy_version 6208 (0.0013)
+[2023-09-19 19:49:23,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12288.0, 300 sec: 12190.8). Total num frames: 5193728. Throughput: 0: 6144.3, 1: 6140.4. Samples: 3993484. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:49:23,687][67555] Avg episode reward: [(0, '-36.112'), (1, '-34.312')]
+[2023-09-19 19:49:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 5259264. Throughput: 0: 6116.7, 1: 6112.1. Samples: 4064730. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:28,688][67555] Avg episode reward: [(0, '-37.653'), (1, '-33.914')]
+[2023-09-19 19:49:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006280_3215360.pth...
+[2023-09-19 19:49:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000003992_2043904.pth...
+[2023-09-19 19:49:28,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003632_1859584.pth
+[2023-09-19 19:49:28,709][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000005920_3031040.pth
+[2023-09-19 19:49:29,266][68281] Updated weights for policy 1, policy_version 4000 (0.0012)
+[2023-09-19 19:49:29,268][68280] Updated weights for policy 0, policy_version 6288 (0.0015)
+[2023-09-19 19:49:33,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 5316608. Throughput: 0: 6137.6, 1: 6136.8. Samples: 4140724. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:49:33,688][67555] Avg episode reward: [(0, '-37.849'), (1, '-35.267')]
+[2023-09-19 19:49:35,702][68281] Updated weights for policy 1, policy_version 4080 (0.0012)
+[2023-09-19 19:49:35,703][68280] Updated weights for policy 0, policy_version 6368 (0.0013)
+[2023-09-19 19:49:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 5382144. Throughput: 0: 6172.0, 1: 6170.3. Samples: 4180188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:38,688][67555] Avg episode reward: [(0, '-38.625'), (1, '-34.601')]
+[2023-09-19 19:49:42,667][68281] Updated weights for policy 1, policy_version 4160 (0.0013)
+[2023-09-19 19:49:42,668][68280] Updated weights for policy 0, policy_version 6448 (0.0013)
+[2023-09-19 19:49:43,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12288.1, 300 sec: 12190.8). Total num frames: 5439488. Throughput: 0: 6144.1, 1: 6147.9. Samples: 4251492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:43,687][67555] Avg episode reward: [(0, '-36.141'), (1, '-34.736')]
+[2023-09-19 19:49:43,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006456_3305472.pth...
+[2023-09-19 19:49:43,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000004168_2134016.pth...
+[2023-09-19 19:49:43,700][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006096_3121152.pth
+[2023-09-19 19:49:43,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003808_1949696.pth
+[2023-09-19 19:49:48,687][67555] Fps is (10 sec: 11469.2, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 5496832. Throughput: 0: 6089.9, 1: 6089.4. Samples: 4321230. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:48,687][67555] Avg episode reward: [(0, '-34.951'), (1, '-35.583')]
+[2023-09-19 19:49:49,618][68280] Updated weights for policy 0, policy_version 6528 (0.0014)
+[2023-09-19 19:49:49,619][68281] Updated weights for policy 1, policy_version 4240 (0.0014)
+[2023-09-19 19:49:53,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 5562368. Throughput: 0: 6092.7, 1: 6095.8. Samples: 4358080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:49:53,688][67555] Avg episode reward: [(0, '-37.300'), (1, '-35.275')]
+[2023-09-19 19:49:56,433][68280] Updated weights for policy 0, policy_version 6608 (0.0013)
+[2023-09-19 19:49:56,433][68281] Updated weights for policy 1, policy_version 4320 (0.0014)
+[2023-09-19 19:49:58,687][67555] Fps is (10 sec: 12287.4, 60 sec: 12151.4, 300 sec: 12218.6). Total num frames: 5619712. Throughput: 0: 6088.2, 1: 6089.0. Samples: 4429890. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:49:58,688][67555] Avg episode reward: [(0, '-35.316'), (1, '-35.074')]
+[2023-09-19 19:49:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000004344_2224128.pth...
+[2023-09-19 19:49:58,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006632_3395584.pth...
+[2023-09-19 19:49:58,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000003992_2043904.pth
+[2023-09-19 19:49:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006280_3215360.pth
+[2023-09-19 19:50:02,993][68280] Updated weights for policy 0, policy_version 6688 (0.0014)
+[2023-09-19 19:50:02,993][68281] Updated weights for policy 1, policy_version 4400 (0.0014)
+[2023-09-19 19:50:03,687][67555] Fps is (10 sec: 11878.3, 60 sec: 12219.7, 300 sec: 12204.7). Total num frames: 5681152. Throughput: 0: 6094.9, 1: 6093.5. Samples: 4505244. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:50:03,688][67555] Avg episode reward: [(0, '-36.488'), (1, '-35.441')]
+[2023-09-19 19:50:08,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 5742592. Throughput: 0: 6083.9, 1: 6084.1. Samples: 4541042. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:50:08,688][67555] Avg episode reward: [(0, '-37.134'), (1, '-34.226')]
+[2023-09-19 19:50:09,859][68280] Updated weights for policy 0, policy_version 6768 (0.0014)
+[2023-09-19 19:50:09,860][68281] Updated weights for policy 1, policy_version 4480 (0.0015)
+[2023-09-19 19:50:13,687][67555] Fps is (10 sec: 11878.2, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 5799936. Throughput: 0: 6076.2, 1: 6075.9. Samples: 4611576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:50:13,688][67555] Avg episode reward: [(0, '-37.017'), (1, '-34.516')]
+[2023-09-19 19:50:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006808_3485696.pth...
+[2023-09-19 19:50:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000004520_2314240.pth...
+[2023-09-19 19:50:13,701][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006456_3305472.pth
+[2023-09-19 19:50:13,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000004168_2134016.pth
+[2023-09-19 19:50:16,733][68281] Updated weights for policy 1, policy_version 4560 (0.0012)
+[2023-09-19 19:50:16,735][68280] Updated weights for policy 0, policy_version 6848 (0.0014)
+[2023-09-19 19:50:18,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 5857280. Throughput: 0: 6036.8, 1: 6037.7. Samples: 4684078. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:50:18,688][67555] Avg episode reward: [(0, '-37.908'), (1, '-35.130')]
+[2023-09-19 19:50:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 5922816. Throughput: 0: 5979.7, 1: 5983.4. Samples: 4718524. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:50:23,688][67555] Avg episode reward: [(0, '-36.770'), (1, '-35.026')]
+[2023-09-19 19:50:23,689][68281] Updated weights for policy 1, policy_version 4640 (0.0013)
+[2023-09-19 19:50:23,690][68280] Updated weights for policy 0, policy_version 6928 (0.0014)
+[2023-09-19 19:50:28,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12015.0, 300 sec: 12190.8). Total num frames: 5980160. Throughput: 0: 5989.1, 1: 5986.8. Samples: 4790412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:50:28,688][67555] Avg episode reward: [(0, '-37.593'), (1, '-35.265')]
+[2023-09-19 19:50:28,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000006984_3575808.pth...
+[2023-09-19 19:50:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000004696_2404352.pth...
+[2023-09-19 19:50:28,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006632_3395584.pth
+[2023-09-19 19:50:28,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000004344_2224128.pth
+[2023-09-19 19:50:30,369][68281] Updated weights for policy 1, policy_version 4720 (0.0014)
+[2023-09-19 19:50:30,369][68280] Updated weights for policy 0, policy_version 7008 (0.0014)
+[2023-09-19 19:50:33,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 6037504. Throughput: 0: 6043.9, 1: 6045.4. Samples: 4865252. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:50:33,688][67555] Avg episode reward: [(0, '-35.771'), (1, '-35.894')]
+[2023-09-19 19:50:37,057][68280] Updated weights for policy 0, policy_version 7088 (0.0014)
+[2023-09-19 19:50:37,057][68281] Updated weights for policy 1, policy_version 4800 (0.0014)
+[2023-09-19 19:50:38,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 6103040. Throughput: 0: 6035.3, 1: 6031.2. Samples: 4901072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:50:38,688][67555] Avg episode reward: [(0, '-36.738'), (1, '-33.815')]
+[2023-09-19 19:50:43,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 6160384. Throughput: 0: 6024.3, 1: 6025.5. Samples: 4972128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:50:43,688][67555] Avg episode reward: [(0, '-38.003'), (1, '-33.963')]
+[2023-09-19 19:50:43,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000007160_3665920.pth...
+[2023-09-19 19:50:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000004872_2494464.pth...
+[2023-09-19 19:50:43,701][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006808_3485696.pth
+[2023-09-19 19:50:43,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000004520_2314240.pth
+[2023-09-19 19:50:44,040][68281] Updated weights for policy 1, policy_version 4880 (0.0017)
+[2023-09-19 19:50:44,040][68280] Updated weights for policy 0, policy_version 7168 (0.0011)
+[2023-09-19 19:50:48,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 6217728. Throughput: 0: 5993.4, 1: 5990.9. Samples: 5044534. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:50:48,688][67555] Avg episode reward: [(0, '-35.409'), (1, '-34.356')]
+[2023-09-19 19:50:50,718][68280] Updated weights for policy 0, policy_version 7248 (0.0012)
+[2023-09-19 19:50:50,719][68281] Updated weights for policy 1, policy_version 4960 (0.0011)
+[2023-09-19 19:50:53,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 6283264. Throughput: 0: 5994.6, 1: 5995.4. Samples: 5080592. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:50:53,688][67555] Avg episode reward: [(0, '-36.593'), (1, '-35.705')]
+[2023-09-19 19:50:57,280][68280] Updated weights for policy 0, policy_version 7328 (0.0015)
+[2023-09-19 19:50:57,280][68281] Updated weights for policy 1, policy_version 5040 (0.0016)
+[2023-09-19 19:50:58,687][67555] Fps is (10 sec: 13107.0, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 6348800. Throughput: 0: 6043.3, 1: 6044.2. Samples: 5155512. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:50:58,688][67555] Avg episode reward: [(0, '-36.561'), (1, '-35.105')]
+[2023-09-19 19:50:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000007344_3760128.pth...
+[2023-09-19 19:50:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005056_2588672.pth...
+[2023-09-19 19:50:58,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000004696_2404352.pth
+[2023-09-19 19:50:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000006984_3575808.pth
+[2023-09-19 19:51:03,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12083.2, 300 sec: 12190.8). Total num frames: 6406144. Throughput: 0: 6107.8, 1: 6107.2. Samples: 5233752. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:51:03,687][67555] Avg episode reward: [(0, '-35.558'), (1, '-35.313')]
+[2023-09-19 19:51:03,726][68280] Updated weights for policy 0, policy_version 7408 (0.0013)
+[2023-09-19 19:51:03,726][68281] Updated weights for policy 1, policy_version 5120 (0.0016)
+[2023-09-19 19:51:08,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12190.8). Total num frames: 6471680. Throughput: 0: 6104.1, 1: 6101.5. Samples: 5267774. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:08,688][67555] Avg episode reward: [(0, '-35.747'), (1, '-36.395')]
+[2023-09-19 19:51:10,517][68280] Updated weights for policy 0, policy_version 7488 (0.0013)
+[2023-09-19 19:51:10,517][68281] Updated weights for policy 1, policy_version 5200 (0.0014)
+[2023-09-19 19:51:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 6529024. Throughput: 0: 6117.9, 1: 6119.6. Samples: 5341098. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:13,688][67555] Avg episode reward: [(0, '-34.442'), (1, '-34.594')]
+[2023-09-19 19:51:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000007520_3850240.pth...
+[2023-09-19 19:51:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005232_2678784.pth...
+[2023-09-19 19:51:13,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000004872_2494464.pth
+[2023-09-19 19:51:13,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000007160_3665920.pth
+[2023-09-19 19:51:17,305][68281] Updated weights for policy 1, policy_version 5280 (0.0011)
+[2023-09-19 19:51:17,305][68280] Updated weights for policy 0, policy_version 7568 (0.0015)
+[2023-09-19 19:51:18,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12190.8). Total num frames: 6594560. Throughput: 0: 6105.7, 1: 6108.1. Samples: 5414874. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:18,688][67555] Avg episode reward: [(0, '-35.942'), (1, '-35.780')]
+[2023-09-19 19:51:23,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 6651904. Throughput: 0: 6104.4, 1: 6105.6. Samples: 5450520. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:51:23,688][67555] Avg episode reward: [(0, '-36.007'), (1, '-35.704')]
+[2023-09-19 19:51:24,022][68281] Updated weights for policy 1, policy_version 5360 (0.0014)
+[2023-09-19 19:51:24,023][68280] Updated weights for policy 0, policy_version 7648 (0.0015)
+[2023-09-19 19:51:28,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 6709248. Throughput: 0: 6102.1, 1: 6104.8. Samples: 5521442. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:51:28,688][67555] Avg episode reward: [(0, '-36.654'), (1, '-35.907')]
+[2023-09-19 19:51:28,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000007696_3940352.pth...
+[2023-09-19 19:51:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005408_2768896.pth...
+[2023-09-19 19:51:28,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000007344_3760128.pth
+[2023-09-19 19:51:28,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005056_2588672.pth
+[2023-09-19 19:51:31,063][68281] Updated weights for policy 1, policy_version 5440 (0.0016)
+[2023-09-19 19:51:31,064][68280] Updated weights for policy 0, policy_version 7728 (0.0011)
+[2023-09-19 19:51:33,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 6766592. Throughput: 0: 6102.4, 1: 6102.2. Samples: 5593742. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:51:33,688][67555] Avg episode reward: [(0, '-35.791'), (1, '-35.423')]
+[2023-09-19 19:51:37,970][68280] Updated weights for policy 0, policy_version 7808 (0.0012)
+[2023-09-19 19:51:37,971][68281] Updated weights for policy 1, policy_version 5520 (0.0014)
+[2023-09-19 19:51:38,687][67555] Fps is (10 sec: 11469.1, 60 sec: 12015.0, 300 sec: 12135.3). Total num frames: 6823936. Throughput: 0: 6080.2, 1: 6083.1. Samples: 5627938. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:38,688][67555] Avg episode reward: [(0, '-35.685'), (1, '-35.905')]
+[2023-09-19 19:51:43,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 6889472. Throughput: 0: 6020.9, 1: 6020.9. Samples: 5697396. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:51:43,688][67555] Avg episode reward: [(0, '-35.386'), (1, '-34.197')]
+[2023-09-19 19:51:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000007872_4030464.pth...
+[2023-09-19 19:51:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005584_2859008.pth...
+[2023-09-19 19:51:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000007520_3850240.pth
+[2023-09-19 19:51:43,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005232_2678784.pth
+[2023-09-19 19:51:45,009][68281] Updated weights for policy 1, policy_version 5600 (0.0012)
+[2023-09-19 19:51:45,010][68280] Updated weights for policy 0, policy_version 7888 (0.0011)
+[2023-09-19 19:51:48,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 6946816. Throughput: 0: 5940.3, 1: 5941.2. Samples: 5768422. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:51:48,688][67555] Avg episode reward: [(0, '-34.569'), (1, '-35.202')]
+[2023-09-19 19:51:51,996][68280] Updated weights for policy 0, policy_version 7968 (0.0013)
+[2023-09-19 19:51:51,997][68281] Updated weights for policy 1, policy_version 5680 (0.0016)
+[2023-09-19 19:51:53,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 7004160. Throughput: 0: 5955.4, 1: 5955.4. Samples: 5803758. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:53,688][67555] Avg episode reward: [(0, '-33.331'), (1, '-34.793')]
+[2023-09-19 19:51:58,668][68280] Updated weights for policy 0, policy_version 8048 (0.0013)
+[2023-09-19 19:51:58,669][68281] Updated weights for policy 1, policy_version 5760 (0.0012)
+[2023-09-19 19:51:58,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 7069696. Throughput: 0: 5917.2, 1: 5919.1. Samples: 5873730. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:51:58,688][67555] Avg episode reward: [(0, '-34.901'), (1, '-34.602')]
+[2023-09-19 19:51:58,694][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008048_4120576.pth...
+[2023-09-19 19:51:58,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005760_2949120.pth...
+[2023-09-19 19:51:58,698][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000007696_3940352.pth
+[2023-09-19 19:51:58,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005408_2768896.pth
+[2023-09-19 19:52:03,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 7127040. Throughput: 0: 5926.8, 1: 5923.8. Samples: 5948150. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:03,688][67555] Avg episode reward: [(0, '-35.471'), (1, '-34.399')]
+[2023-09-19 19:52:05,562][68281] Updated weights for policy 1, policy_version 5840 (0.0011)
+[2023-09-19 19:52:05,563][68280] Updated weights for policy 0, policy_version 8128 (0.0014)
+[2023-09-19 19:52:08,687][67555] Fps is (10 sec: 11468.8, 60 sec: 11878.4, 300 sec: 12107.5). Total num frames: 7184384. Throughput: 0: 5925.9, 1: 5924.8. Samples: 5983800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:08,688][67555] Avg episode reward: [(0, '-33.999'), (1, '-34.225')]
+[2023-09-19 19:52:12,091][68280] Updated weights for policy 0, policy_version 8208 (0.0015)
+[2023-09-19 19:52:12,092][68281] Updated weights for policy 1, policy_version 5920 (0.0011)
+[2023-09-19 19:52:13,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 7249920. Throughput: 0: 5988.9, 1: 5984.7. Samples: 6060254. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:13,688][67555] Avg episode reward: [(0, '-34.678'), (1, '-35.260')]
+[2023-09-19 19:52:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008224_4210688.pth...
+[2023-09-19 19:52:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000005936_3039232.pth...
+[2023-09-19 19:52:13,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005584_2859008.pth
+[2023-09-19 19:52:13,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000007872_4030464.pth
+[2023-09-19 19:52:18,603][68280] Updated weights for policy 0, policy_version 8288 (0.0014)
+[2023-09-19 19:52:18,604][68281] Updated weights for policy 1, policy_version 6000 (0.0013)
+[2023-09-19 19:52:18,687][67555] Fps is (10 sec: 13106.9, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 7315456. Throughput: 0: 6011.8, 1: 6011.4. Samples: 6134790. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:18,688][67555] Avg episode reward: [(0, '-35.901'), (1, '-34.726')]
+[2023-09-19 19:52:23,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 7372800. Throughput: 0: 6071.8, 1: 6068.4. Samples: 6174252. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:23,688][67555] Avg episode reward: [(0, '-33.994'), (1, '-34.701')]
+[2023-09-19 19:52:25,272][68281] Updated weights for policy 1, policy_version 6080 (0.0014)
+[2023-09-19 19:52:25,273][68280] Updated weights for policy 0, policy_version 8368 (0.0015)
+[2023-09-19 19:52:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 7438336. Throughput: 0: 6102.5, 1: 6101.0. Samples: 6246552. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:52:28,688][67555] Avg episode reward: [(0, '-33.747'), (1, '-34.972')]
+[2023-09-19 19:52:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000006120_3133440.pth...
+[2023-09-19 19:52:28,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008408_4304896.pth...
+[2023-09-19 19:52:28,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008048_4120576.pth
+[2023-09-19 19:52:28,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005760_2949120.pth
+[2023-09-19 19:52:31,878][68280] Updated weights for policy 0, policy_version 8448 (0.0012)
+[2023-09-19 19:52:31,878][68281] Updated weights for policy 1, policy_version 6160 (0.0015)
+[2023-09-19 19:52:33,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 7495680. Throughput: 0: 6127.4, 1: 6126.2. Samples: 6319834. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:52:33,688][67555] Avg episode reward: [(0, '-33.221'), (1, '-35.081')]
+[2023-09-19 19:52:38,492][68281] Updated weights for policy 1, policy_version 6240 (0.0013)
+[2023-09-19 19:52:38,492][68280] Updated weights for policy 0, policy_version 8528 (0.0015)
+[2023-09-19 19:52:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 7561216. Throughput: 0: 6135.4, 1: 6134.7. Samples: 6355912. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:52:38,688][67555] Avg episode reward: [(0, '-33.303'), (1, '-34.569')]
+[2023-09-19 19:52:43,687][67555] Fps is (10 sec: 12287.5, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 7618560. Throughput: 0: 6188.8, 1: 6188.5. Samples: 6430712. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:52:43,688][67555] Avg episode reward: [(0, '-35.232'), (1, '-34.998')]
+[2023-09-19 19:52:43,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000006296_3223552.pth...
+[2023-09-19 19:52:43,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008584_4395008.pth...
+[2023-09-19 19:52:43,701][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008224_4210688.pth
+[2023-09-19 19:52:43,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000005936_3039232.pth
+[2023-09-19 19:52:45,385][68281] Updated weights for policy 1, policy_version 6320 (0.0015)
+[2023-09-19 19:52:45,385][68280] Updated weights for policy 0, policy_version 8608 (0.0014)
+[2023-09-19 19:52:48,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 7675904. Throughput: 0: 6161.7, 1: 6160.8. Samples: 6502666. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:52:48,688][67555] Avg episode reward: [(0, '-34.941'), (1, '-35.622')]
+[2023-09-19 19:52:52,213][68281] Updated weights for policy 1, policy_version 6400 (0.0013)
+[2023-09-19 19:52:52,214][68280] Updated weights for policy 0, policy_version 8688 (0.0014)
+[2023-09-19 19:52:53,687][67555] Fps is (10 sec: 12288.5, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 7741440. Throughput: 0: 6151.3, 1: 6152.4. Samples: 6537468. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:52:53,688][67555] Avg episode reward: [(0, '-34.803'), (1, '-35.358')]
+[2023-09-19 19:52:58,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 7798784. Throughput: 0: 6116.5, 1: 6121.3. Samples: 6610946. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:52:58,688][67555] Avg episode reward: [(0, '-35.106'), (1, '-34.437')]
+[2023-09-19 19:52:58,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008760_4485120.pth...
+[2023-09-19 19:52:58,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000006472_3313664.pth...
+[2023-09-19 19:52:58,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000006120_3133440.pth
+[2023-09-19 19:52:58,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008408_4304896.pth
+[2023-09-19 19:52:58,922][68280] Updated weights for policy 0, policy_version 8768 (0.0011)
+[2023-09-19 19:52:58,922][68281] Updated weights for policy 1, policy_version 6480 (0.0014)
+[2023-09-19 19:53:03,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 7856128. Throughput: 0: 6068.1, 1: 6070.0. Samples: 6681000. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:53:03,688][67555] Avg episode reward: [(0, '-34.753'), (1, '-34.395')]
+[2023-09-19 19:53:05,902][68281] Updated weights for policy 1, policy_version 6560 (0.0012)
+[2023-09-19 19:53:05,903][68280] Updated weights for policy 0, policy_version 8848 (0.0012)
+[2023-09-19 19:53:08,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 7921664. Throughput: 0: 6034.1, 1: 6037.5. Samples: 6717474. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:53:08,688][67555] Avg episode reward: [(0, '-35.282'), (1, '-34.356')]
+[2023-09-19 19:53:12,520][68280] Updated weights for policy 0, policy_version 8928 (0.0017)
+[2023-09-19 19:53:12,521][68281] Updated weights for policy 1, policy_version 6640 (0.0012)
+[2023-09-19 19:53:13,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 7979008. Throughput: 0: 6049.3, 1: 6054.1. Samples: 6791202. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:53:13,688][67555] Avg episode reward: [(0, '-34.342'), (1, '-34.615')]
+[2023-09-19 19:53:13,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000006648_3403776.pth...
+[2023-09-19 19:53:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000008936_4575232.pth...
+[2023-09-19 19:53:13,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000006296_3223552.pth
+[2023-09-19 19:53:13,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008584_4395008.pth
+[2023-09-19 19:53:18,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 8036352. Throughput: 0: 6017.3, 1: 6018.1. Samples: 6861426. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:18,688][67555] Avg episode reward: [(0, '-35.659'), (1, '-34.241')]
+[2023-09-19 19:53:19,615][68281] Updated weights for policy 1, policy_version 6720 (0.0013)
+[2023-09-19 19:53:19,616][68280] Updated weights for policy 0, policy_version 9008 (0.0016)
+[2023-09-19 19:53:23,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 8101888. Throughput: 0: 6010.1, 1: 6010.6. Samples: 6896844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:23,688][67555] Avg episode reward: [(0, '-35.183'), (1, '-33.926')]
+[2023-09-19 19:53:26,255][68281] Updated weights for policy 1, policy_version 6800 (0.0012)
+[2023-09-19 19:53:26,255][68280] Updated weights for policy 0, policy_version 9088 (0.0014)
+[2023-09-19 19:53:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 8159232. Throughput: 0: 6008.5, 1: 6008.5. Samples: 6971478. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:28,688][67555] Avg episode reward: [(0, '-34.954'), (1, '-34.239')]
+[2023-09-19 19:53:28,754][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000009120_4669440.pth...
+[2023-09-19 19:53:28,760][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008760_4485120.pth
+[2023-09-19 19:53:28,761][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000006832_3497984.pth...
+[2023-09-19 19:53:28,769][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000006472_3313664.pth
+[2023-09-19 19:53:32,619][68280] Updated weights for policy 0, policy_version 9168 (0.0014)
+[2023-09-19 19:53:32,619][68281] Updated weights for policy 1, policy_version 6880 (0.0015)
+[2023-09-19 19:53:33,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 8224768. Throughput: 0: 6068.0, 1: 6067.5. Samples: 7048764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:33,688][67555] Avg episode reward: [(0, '-35.228'), (1, '-33.536')]
+[2023-09-19 19:53:38,687][67555] Fps is (10 sec: 13107.5, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 8290304. Throughput: 0: 6076.1, 1: 6076.2. Samples: 7084324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:38,688][67555] Avg episode reward: [(0, '-36.047'), (1, '-33.069')]
+[2023-09-19 19:53:38,688][68201] Saving new best policy, reward=-33.069!
+[2023-09-19 19:53:39,201][68280] Updated weights for policy 0, policy_version 9248 (0.0014)
+[2023-09-19 19:53:39,201][68281] Updated weights for policy 1, policy_version 6960 (0.0013)
+[2023-09-19 19:53:43,687][67555] Fps is (10 sec: 12287.4, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 8347648. Throughput: 0: 6099.7, 1: 6099.1. Samples: 7159898. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:43,689][67555] Avg episode reward: [(0, '-34.896'), (1, '-34.011')]
+[2023-09-19 19:53:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007008_3588096.pth...
+[2023-09-19 19:53:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000009296_4759552.pth...
+[2023-09-19 19:53:43,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000006648_3403776.pth
+[2023-09-19 19:53:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000008936_4575232.pth
+[2023-09-19 19:53:45,794][68281] Updated weights for policy 1, policy_version 7040 (0.0013)
+[2023-09-19 19:53:45,795][68280] Updated weights for policy 0, policy_version 9328 (0.0016)
+[2023-09-19 19:53:48,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 8413184. Throughput: 0: 6137.6, 1: 6141.0. Samples: 7233540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:48,688][67555] Avg episode reward: [(0, '-33.923'), (1, '-34.620')]
+[2023-09-19 19:53:52,615][68280] Updated weights for policy 0, policy_version 9408 (0.0013)
+[2023-09-19 19:53:52,615][68281] Updated weights for policy 1, policy_version 7120 (0.0016)
+[2023-09-19 19:53:53,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 8470528. Throughput: 0: 6139.5, 1: 6136.1. Samples: 7269876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:53:53,688][67555] Avg episode reward: [(0, '-35.477'), (1, '-33.636')]
+[2023-09-19 19:53:58,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 8536064. Throughput: 0: 6136.0, 1: 6131.8. Samples: 7343250. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:53:58,688][67555] Avg episode reward: [(0, '-35.273'), (1, '-33.901')]
+[2023-09-19 19:53:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000009480_4853760.pth...
+[2023-09-19 19:53:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007192_3682304.pth...
+[2023-09-19 19:53:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000006832_3497984.pth
+[2023-09-19 19:53:58,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000009120_4669440.pth
+[2023-09-19 19:53:59,164][68280] Updated weights for policy 0, policy_version 9488 (0.0013)
+[2023-09-19 19:53:59,164][68281] Updated weights for policy 1, policy_version 7200 (0.0013)
+[2023-09-19 19:54:03,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 8593408. Throughput: 0: 6168.5, 1: 6166.9. Samples: 7416510. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:54:03,687][67555] Avg episode reward: [(0, '-34.775'), (1, '-33.814')]
+[2023-09-19 19:54:05,976][68281] Updated weights for policy 1, policy_version 7280 (0.0014)
+[2023-09-19 19:54:05,977][68280] Updated weights for policy 0, policy_version 9568 (0.0013)
+[2023-09-19 19:54:08,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 8650752. Throughput: 0: 6195.2, 1: 6197.3. Samples: 7454506. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:54:08,688][67555] Avg episode reward: [(0, '-34.687'), (1, '-35.417')]
+[2023-09-19 19:54:12,845][68280] Updated weights for policy 0, policy_version 9648 (0.0014)
+[2023-09-19 19:54:12,845][68281] Updated weights for policy 1, policy_version 7360 (0.0014)
+[2023-09-19 19:54:13,687][67555] Fps is (10 sec: 12287.5, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 8716288. Throughput: 0: 6153.2, 1: 6149.6. Samples: 7525102. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:54:13,688][67555] Avg episode reward: [(0, '-33.253'), (1, '-34.913')]
+[2023-09-19 19:54:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000009656_4943872.pth...
+[2023-09-19 19:54:13,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007368_3772416.pth...
+[2023-09-19 19:54:13,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000009296_4759552.pth
+[2023-09-19 19:54:13,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007008_3588096.pth
+[2023-09-19 19:54:18,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 8773632. Throughput: 0: 6125.3, 1: 6125.7. Samples: 7600062. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:54:18,688][67555] Avg episode reward: [(0, '-33.859'), (1, '-34.318')]
+[2023-09-19 19:54:19,422][68281] Updated weights for policy 1, policy_version 7440 (0.0013)
+[2023-09-19 19:54:19,422][68280] Updated weights for policy 0, policy_version 9728 (0.0015)
+[2023-09-19 19:54:23,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 8839168. Throughput: 0: 6125.4, 1: 6125.8. Samples: 7635626. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:54:23,688][67555] Avg episode reward: [(0, '-34.772'), (1, '-34.165')]
+[2023-09-19 19:54:25,814][68281] Updated weights for policy 1, policy_version 7520 (0.0011)
+[2023-09-19 19:54:25,814][68280] Updated weights for policy 0, policy_version 9808 (0.0012)
+[2023-09-19 19:54:28,687][67555] Fps is (10 sec: 13107.2, 60 sec: 12424.6, 300 sec: 12163.0). Total num frames: 8904704. Throughput: 0: 6171.2, 1: 6166.3. Samples: 7715078. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:54:28,688][67555] Avg episode reward: [(0, '-33.917'), (1, '-34.499')]
+[2023-09-19 19:54:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007552_3866624.pth...
+[2023-09-19 19:54:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000009840_5038080.pth...
+[2023-09-19 19:54:28,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007192_3682304.pth
+[2023-09-19 19:54:28,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000009480_4853760.pth
+[2023-09-19 19:54:32,181][68280] Updated weights for policy 0, policy_version 9888 (0.0015)
+[2023-09-19 19:54:32,181][68281] Updated weights for policy 1, policy_version 7600 (0.0016)
+[2023-09-19 19:54:33,687][67555] Fps is (10 sec: 13107.1, 60 sec: 12424.5, 300 sec: 12163.0). Total num frames: 8970240. Throughput: 0: 6190.2, 1: 6190.2. Samples: 7790658. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:54:33,688][67555] Avg episode reward: [(0, '-35.934'), (1, '-34.346')]
+[2023-09-19 19:54:38,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 9027584. Throughput: 0: 6184.2, 1: 6184.8. Samples: 7826480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:54:38,688][67555] Avg episode reward: [(0, '-34.288'), (1, '-33.994')]
+[2023-09-19 19:54:39,113][68280] Updated weights for policy 0, policy_version 9968 (0.0016)
+[2023-09-19 19:54:39,113][68281] Updated weights for policy 1, policy_version 7680 (0.0016)
+[2023-09-19 19:54:43,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 9084928. Throughput: 0: 6157.1, 1: 6158.5. Samples: 7897454. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:54:43,688][67555] Avg episode reward: [(0, '-33.863'), (1, '-34.492')]
+[2023-09-19 19:54:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010016_5128192.pth...
+[2023-09-19 19:54:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007728_3956736.pth...
+[2023-09-19 19:54:43,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007368_3772416.pth
+[2023-09-19 19:54:43,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000009656_4943872.pth
+[2023-09-19 19:54:45,819][68280] Updated weights for policy 0, policy_version 10048 (0.0014)
+[2023-09-19 19:54:45,820][68281] Updated weights for policy 1, policy_version 7760 (0.0015)
+[2023-09-19 19:54:48,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 9150464. Throughput: 0: 6168.2, 1: 6169.7. Samples: 7971720. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:54:48,688][67555] Avg episode reward: [(0, '-35.027'), (1, '-33.714')]
+[2023-09-19 19:54:52,362][68281] Updated weights for policy 1, policy_version 7840 (0.0014)
+[2023-09-19 19:54:52,362][68280] Updated weights for policy 0, policy_version 10128 (0.0014)
+[2023-09-19 19:54:53,687][67555] Fps is (10 sec: 13107.6, 60 sec: 12424.6, 300 sec: 12190.8). Total num frames: 9216000. Throughput: 0: 6180.9, 1: 6178.7. Samples: 8010690. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:54:53,688][67555] Avg episode reward: [(0, '-34.164'), (1, '-32.718')]
+[2023-09-19 19:54:53,688][68201] Saving new best policy, reward=-32.718!
+[2023-09-19 19:54:58,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12288.0, 300 sec: 12176.9). Total num frames: 9273344. Throughput: 0: 6190.3, 1: 6188.1. Samples: 8082126. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:54:58,687][67555] Avg episode reward: [(0, '-35.059'), (1, '-33.665')]
+[2023-09-19 19:54:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000007912_4050944.pth...
+[2023-09-19 19:54:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010200_5222400.pth...
+[2023-09-19 19:54:58,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007552_3866624.pth
+[2023-09-19 19:54:58,709][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000009840_5038080.pth
+[2023-09-19 19:54:59,137][68281] Updated weights for policy 1, policy_version 7920 (0.0012)
+[2023-09-19 19:54:59,137][68280] Updated weights for policy 0, policy_version 10208 (0.0014)
+[2023-09-19 19:55:03,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12287.9, 300 sec: 12163.0). Total num frames: 9330688. Throughput: 0: 6178.2, 1: 6178.6. Samples: 8156116. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:55:03,688][67555] Avg episode reward: [(0, '-33.927'), (1, '-34.451')]
+[2023-09-19 19:55:05,788][68280] Updated weights for policy 0, policy_version 10288 (0.0013)
+[2023-09-19 19:55:05,788][68281] Updated weights for policy 1, policy_version 8000 (0.0014)
+[2023-09-19 19:55:08,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12190.8). Total num frames: 9396224. Throughput: 0: 6190.4, 1: 6189.7. Samples: 8192730. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:55:08,688][67555] Avg episode reward: [(0, '-34.409'), (1, '-34.154')]
+[2023-09-19 19:55:12,407][68280] Updated weights for policy 0, policy_version 10368 (0.0013)
+[2023-09-19 19:55:12,407][68281] Updated weights for policy 1, policy_version 8080 (0.0015)
+[2023-09-19 19:55:13,687][67555] Fps is (10 sec: 13107.2, 60 sec: 12424.6, 300 sec: 12218.6). Total num frames: 9461760. Throughput: 0: 6136.3, 1: 6138.4. Samples: 8267436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:55:13,688][67555] Avg episode reward: [(0, '-33.288'), (1, '-34.130')]
+[2023-09-19 19:55:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008096_4145152.pth...
+[2023-09-19 19:55:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010384_5316608.pth...
+[2023-09-19 19:55:13,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007728_3956736.pth
+[2023-09-19 19:55:13,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010016_5128192.pth
+[2023-09-19 19:55:18,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12424.6, 300 sec: 12190.8). Total num frames: 9519104. Throughput: 0: 6121.6, 1: 6117.5. Samples: 8341414. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:55:18,688][67555] Avg episode reward: [(0, '-34.378'), (1, '-33.800')]
+[2023-09-19 19:55:19,070][68281] Updated weights for policy 1, policy_version 8160 (0.0013)
+[2023-09-19 19:55:19,070][68280] Updated weights for policy 0, policy_version 10448 (0.0014)
+[2023-09-19 19:55:23,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12288.0, 300 sec: 12190.8). Total num frames: 9576448. Throughput: 0: 6135.3, 1: 6134.9. Samples: 8378636. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 19:55:23,688][67555] Avg episode reward: [(0, '-34.628'), (1, '-33.182')]
+[2023-09-19 19:55:25,809][68281] Updated weights for policy 1, policy_version 8240 (0.0010)
+[2023-09-19 19:55:25,810][68280] Updated weights for policy 0, policy_version 10528 (0.0014)
+[2023-09-19 19:55:28,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 9641984. Throughput: 0: 6155.6, 1: 6154.3. Samples: 8451400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:55:28,688][67555] Avg episode reward: [(0, '-34.027'), (1, '-34.462')]
+[2023-09-19 19:55:28,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008272_4235264.pth...
+[2023-09-19 19:55:28,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010560_5406720.pth...
+[2023-09-19 19:55:28,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000007912_4050944.pth
+[2023-09-19 19:55:28,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010200_5222400.pth
+[2023-09-19 19:55:32,508][68280] Updated weights for policy 0, policy_version 10608 (0.0013)
+[2023-09-19 19:55:32,509][68281] Updated weights for policy 1, policy_version 8320 (0.0010)
+[2023-09-19 19:55:33,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 9699328. Throughput: 0: 6143.6, 1: 6143.4. Samples: 8524630. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:55:33,688][67555] Avg episode reward: [(0, '-34.465'), (1, '-34.388')]
+[2023-09-19 19:55:38,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 9756672. Throughput: 0: 6094.7, 1: 6095.2. Samples: 8559240. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:55:38,688][67555] Avg episode reward: [(0, '-35.009'), (1, '-35.056')]
+[2023-09-19 19:55:39,496][68280] Updated weights for policy 0, policy_version 10688 (0.0014)
+[2023-09-19 19:55:39,496][68281] Updated weights for policy 1, policy_version 8400 (0.0014)
+[2023-09-19 19:55:43,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 9822208. Throughput: 0: 6104.6, 1: 6107.3. Samples: 8631668. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:55:43,688][67555] Avg episode reward: [(0, '-33.088'), (1, '-32.283')]
+[2023-09-19 19:55:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010736_5496832.pth...
+[2023-09-19 19:55:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008448_4325376.pth...
+[2023-09-19 19:55:43,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010384_5316608.pth
+[2023-09-19 19:55:43,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008096_4145152.pth
+[2023-09-19 19:55:43,707][68201] Saving new best policy, reward=-32.283!
+[2023-09-19 19:55:46,222][68280] Updated weights for policy 0, policy_version 10768 (0.0013)
+[2023-09-19 19:55:46,222][68281] Updated weights for policy 1, policy_version 8480 (0.0013)
+[2023-09-19 19:55:48,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 9879552. Throughput: 0: 6100.4, 1: 6101.4. Samples: 8705198. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:55:48,688][67555] Avg episode reward: [(0, '-36.678'), (1, '-34.146')]
+[2023-09-19 19:55:53,109][68281] Updated weights for policy 1, policy_version 8560 (0.0015)
+[2023-09-19 19:55:53,109][68280] Updated weights for policy 0, policy_version 10848 (0.0012)
+[2023-09-19 19:55:53,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 9936896. Throughput: 0: 6085.6, 1: 6087.0. Samples: 8740498. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 19:55:53,688][67555] Avg episode reward: [(0, '-36.206'), (1, '-33.018')]
+[2023-09-19 19:55:58,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.4, 300 sec: 12190.8). Total num frames: 10002432. Throughput: 0: 6065.8, 1: 6065.2. Samples: 8813330. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:55:58,688][67555] Avg episode reward: [(0, '-36.099'), (1, '-33.649')]
+[2023-09-19 19:55:58,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000010912_5586944.pth...
+[2023-09-19 19:55:58,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008624_4415488.pth...
+[2023-09-19 19:55:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010560_5406720.pth
+[2023-09-19 19:55:58,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008272_4235264.pth
+[2023-09-19 19:55:59,606][68281] Updated weights for policy 1, policy_version 8640 (0.0014)
+[2023-09-19 19:55:59,606][68280] Updated weights for policy 0, policy_version 10928 (0.0011)
+[2023-09-19 19:56:03,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 10059776. Throughput: 0: 6061.1, 1: 6060.8. Samples: 8886902. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:56:03,688][67555] Avg episode reward: [(0, '-34.256'), (1, '-33.956')]
+[2023-09-19 19:56:06,460][68280] Updated weights for policy 0, policy_version 11008 (0.0011)
+[2023-09-19 19:56:06,460][68281] Updated weights for policy 1, policy_version 8720 (0.0013)
+[2023-09-19 19:56:08,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 10125312. Throughput: 0: 6038.8, 1: 6037.6. Samples: 8922076. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:56:08,688][67555] Avg episode reward: [(0, '-35.092'), (1, '-33.287')]
+[2023-09-19 19:56:12,965][68280] Updated weights for policy 0, policy_version 11088 (0.0014)
+[2023-09-19 19:56:12,965][68281] Updated weights for policy 1, policy_version 8800 (0.0013)
+[2023-09-19 19:56:13,687][67555] Fps is (10 sec: 13107.5, 60 sec: 12151.5, 300 sec: 12190.8). Total num frames: 10190848. Throughput: 0: 6072.4, 1: 6073.1. Samples: 8997946. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:13,687][67555] Avg episode reward: [(0, '-36.582'), (1, '-32.654')]
+[2023-09-19 19:56:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011096_5681152.pth...
+[2023-09-19 19:56:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008808_4509696.pth...
+[2023-09-19 19:56:13,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010736_5496832.pth
+[2023-09-19 19:56:13,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008448_4325376.pth
+[2023-09-19 19:56:18,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.4, 300 sec: 12190.8). Total num frames: 10248192. Throughput: 0: 6053.3, 1: 6052.1. Samples: 9069374. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:18,688][67555] Avg episode reward: [(0, '-34.788'), (1, '-33.279')]
+[2023-09-19 19:56:19,744][68280] Updated weights for policy 0, policy_version 11168 (0.0008)
+[2023-09-19 19:56:19,744][68281] Updated weights for policy 1, policy_version 8880 (0.0014)
+[2023-09-19 19:56:23,687][67555] Fps is (10 sec: 11468.5, 60 sec: 12151.4, 300 sec: 12190.8). Total num frames: 10305536. Throughput: 0: 6094.0, 1: 6094.4. Samples: 9107716. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:23,688][67555] Avg episode reward: [(0, '-35.712'), (1, '-32.261')]
+[2023-09-19 19:56:23,717][68201] Saving new best policy, reward=-32.261!
+[2023-09-19 19:56:26,366][68281] Updated weights for policy 1, policy_version 8960 (0.0013)
+[2023-09-19 19:56:26,367][68280] Updated weights for policy 0, policy_version 11248 (0.0014)
+[2023-09-19 19:56:28,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 10371072. Throughput: 0: 6124.8, 1: 6125.7. Samples: 9182940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:28,688][67555] Avg episode reward: [(0, '-35.808'), (1, '-32.588')]
+[2023-09-19 19:56:28,729][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011280_5775360.pth...
+[2023-09-19 19:56:28,733][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000010912_5586944.pth
+[2023-09-19 19:56:28,734][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000008992_4603904.pth...
+[2023-09-19 19:56:28,737][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008624_4415488.pth
+[2023-09-19 19:56:32,564][68281] Updated weights for policy 1, policy_version 9040 (0.0008)
+[2023-09-19 19:56:32,565][68280] Updated weights for policy 0, policy_version 11328 (0.0015)
+[2023-09-19 19:56:33,687][67555] Fps is (10 sec: 13107.4, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 10436608. Throughput: 0: 6185.3, 1: 6182.9. Samples: 9261768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:33,688][67555] Avg episode reward: [(0, '-35.932'), (1, '-33.007')]
+[2023-09-19 19:56:38,689][67555] Fps is (10 sec: 13104.0, 60 sec: 12424.1, 300 sec: 12246.3). Total num frames: 10502144. Throughput: 0: 6191.6, 1: 6191.5. Samples: 9297768. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:56:38,692][67555] Avg episode reward: [(0, '-35.716'), (1, '-32.809')]
+[2023-09-19 19:56:39,355][68280] Updated weights for policy 0, policy_version 11408 (0.0013)
+[2023-09-19 19:56:39,355][68281] Updated weights for policy 1, policy_version 9120 (0.0012)
+[2023-09-19 19:56:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 10559488. Throughput: 0: 6174.7, 1: 6174.3. Samples: 9369036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:56:43,688][67555] Avg episode reward: [(0, '-36.556'), (1, '-33.253')]
+[2023-09-19 19:56:43,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000009168_4694016.pth...
+[2023-09-19 19:56:43,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011456_5865472.pth...
+[2023-09-19 19:56:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008808_4509696.pth
+[2023-09-19 19:56:43,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011096_5681152.pth
+[2023-09-19 19:56:46,011][68280] Updated weights for policy 0, policy_version 11488 (0.0014)
+[2023-09-19 19:56:46,011][68281] Updated weights for policy 1, policy_version 9200 (0.0014)
+[2023-09-19 19:56:48,687][67555] Fps is (10 sec: 11471.6, 60 sec: 12288.0, 300 sec: 12246.4). Total num frames: 10616832. Throughput: 0: 6183.0, 1: 6183.6. Samples: 9443394. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:48,688][67555] Avg episode reward: [(0, '-35.987'), (1, '-34.076')]
+[2023-09-19 19:56:52,762][68280] Updated weights for policy 0, policy_version 11568 (0.0014)
+[2023-09-19 19:56:52,762][68281] Updated weights for policy 1, policy_version 9280 (0.0012)
+[2023-09-19 19:56:53,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12424.6, 300 sec: 12246.3). Total num frames: 10682368. Throughput: 0: 6198.5, 1: 6199.1. Samples: 9479966. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:53,688][67555] Avg episode reward: [(0, '-35.794'), (1, '-32.705')]
+[2023-09-19 19:56:58,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 10739712. Throughput: 0: 6158.6, 1: 6158.0. Samples: 9552194. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:56:58,688][67555] Avg episode reward: [(0, '-36.550'), (1, '-31.533')]
+[2023-09-19 19:56:58,750][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000009352_4788224.pth...
+[2023-09-19 19:56:58,753][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000008992_4603904.pth
+[2023-09-19 19:56:58,753][68201] Saving new best policy, reward=-31.533!
+[2023-09-19 19:56:58,754][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011640_5959680.pth...
+[2023-09-19 19:56:58,760][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011280_5775360.pth
+[2023-09-19 19:56:59,423][68280] Updated weights for policy 0, policy_version 11648 (0.0011)
+[2023-09-19 19:56:59,424][68281] Updated weights for policy 1, policy_version 9360 (0.0014)
+[2023-09-19 19:57:03,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12424.5, 300 sec: 12274.1). Total num frames: 10805248. Throughput: 0: 6177.6, 1: 6182.3. Samples: 9625572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:03,689][67555] Avg episode reward: [(0, '-37.523'), (1, '-32.967')]
+[2023-09-19 19:57:06,189][68281] Updated weights for policy 1, policy_version 9440 (0.0013)
+[2023-09-19 19:57:06,190][68280] Updated weights for policy 0, policy_version 11728 (0.0012)
+[2023-09-19 19:57:08,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12287.9, 300 sec: 12246.4). Total num frames: 10862592. Throughput: 0: 6168.3, 1: 6167.0. Samples: 9662802. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:08,688][67555] Avg episode reward: [(0, '-36.212'), (1, '-33.156')]
+[2023-09-19 19:57:13,141][68280] Updated weights for policy 0, policy_version 11808 (0.0014)
+[2023-09-19 19:57:13,141][68281] Updated weights for policy 1, policy_version 9520 (0.0014)
+[2023-09-19 19:57:13,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.4, 300 sec: 12218.6). Total num frames: 10919936. Throughput: 0: 6108.9, 1: 6106.9. Samples: 9732652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:13,688][67555] Avg episode reward: [(0, '-36.428'), (1, '-32.866')]
+[2023-09-19 19:57:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011808_6045696.pth...
+[2023-09-19 19:57:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000009520_4874240.pth...
+[2023-09-19 19:57:13,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011456_5865472.pth
+[2023-09-19 19:57:13,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000009168_4694016.pth
+[2023-09-19 19:57:18,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 10985472. Throughput: 0: 6042.2, 1: 6047.9. Samples: 9805826. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:57:18,688][67555] Avg episode reward: [(0, '-34.842'), (1, '-32.092')]
+[2023-09-19 19:57:19,924][68281] Updated weights for policy 1, policy_version 9600 (0.0013)
+[2023-09-19 19:57:19,924][68280] Updated weights for policy 0, policy_version 11888 (0.0013)
+[2023-09-19 19:57:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 11042816. Throughput: 0: 6041.3, 1: 6040.6. Samples: 9841426. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:57:23,688][67555] Avg episode reward: [(0, '-36.095'), (1, '-32.546')]
+[2023-09-19 19:57:26,676][68280] Updated weights for policy 0, policy_version 11968 (0.0012)
+[2023-09-19 19:57:26,677][68281] Updated weights for policy 1, policy_version 9680 (0.0010)
+[2023-09-19 19:57:28,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 11108352. Throughput: 0: 6062.8, 1: 6062.5. Samples: 9914674. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:28,688][67555] Avg episode reward: [(0, '-35.634'), (1, '-33.209')]
+[2023-09-19 19:57:28,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000009704_4968448.pth...
+[2023-09-19 19:57:28,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000011992_6139904.pth...
+[2023-09-19 19:57:28,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000009352_4788224.pth
+[2023-09-19 19:57:28,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011640_5959680.pth
+[2023-09-19 19:57:33,166][68280] Updated weights for policy 0, policy_version 12048 (0.0013)
+[2023-09-19 19:57:33,166][68281] Updated weights for policy 1, policy_version 9760 (0.0012)
+[2023-09-19 19:57:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 11165696. Throughput: 0: 6090.4, 1: 6089.4. Samples: 9991488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:33,688][67555] Avg episode reward: [(0, '-35.096'), (1, '-33.668')]
+[2023-09-19 19:57:38,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.9, 300 sec: 12246.4). Total num frames: 11231232. Throughput: 0: 6076.7, 1: 6080.5. Samples: 10027042. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:57:38,688][67555] Avg episode reward: [(0, '-33.648'), (1, '-33.411')]
+[2023-09-19 19:57:39,921][68281] Updated weights for policy 1, policy_version 9840 (0.0014)
+[2023-09-19 19:57:39,921][68280] Updated weights for policy 0, policy_version 12128 (0.0014)
+[2023-09-19 19:57:43,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12246.3). Total num frames: 11288576. Throughput: 0: 6089.1, 1: 6090.7. Samples: 10100284. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:57:43,688][67555] Avg episode reward: [(0, '-35.779'), (1, '-33.242')]
+[2023-09-19 19:57:43,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000009880_5058560.pth...
+[2023-09-19 19:57:43,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000012168_6230016.pth...
+[2023-09-19 19:57:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000009520_4874240.pth
+[2023-09-19 19:57:43,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011808_6045696.pth
+[2023-09-19 19:57:46,709][68281] Updated weights for policy 1, policy_version 9920 (0.0014)
+[2023-09-19 19:57:46,709][68280] Updated weights for policy 0, policy_version 12208 (0.0013)
+[2023-09-19 19:57:48,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.4, 300 sec: 12218.6). Total num frames: 11345920. Throughput: 0: 6095.8, 1: 6094.7. Samples: 10174142. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:48,688][67555] Avg episode reward: [(0, '-35.151'), (1, '-33.014')]
+[2023-09-19 19:57:53,377][68281] Updated weights for policy 1, policy_version 10000 (0.0011)
+[2023-09-19 19:57:53,378][68280] Updated weights for policy 0, policy_version 12288 (0.0013)
+[2023-09-19 19:57:53,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12246.3). Total num frames: 11411456. Throughput: 0: 6077.9, 1: 6078.5. Samples: 10209840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:53,688][67555] Avg episode reward: [(0, '-35.206'), (1, '-33.135')]
+[2023-09-19 19:57:58,687][67555] Fps is (10 sec: 13107.5, 60 sec: 12288.1, 300 sec: 12274.1). Total num frames: 11476992. Throughput: 0: 6129.5, 1: 6128.2. Samples: 10284246. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:57:58,688][67555] Avg episode reward: [(0, '-35.488'), (1, '-31.984')]
+[2023-09-19 19:57:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000012352_6324224.pth...
+[2023-09-19 19:57:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010064_5152768.pth...
+[2023-09-19 19:57:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000009704_4968448.pth
+[2023-09-19 19:57:58,710][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000011992_6139904.pth
+[2023-09-19 19:57:59,956][68280] Updated weights for policy 0, policy_version 12368 (0.0012)
+[2023-09-19 19:57:59,957][68281] Updated weights for policy 1, policy_version 10080 (0.0013)
+[2023-09-19 19:58:03,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12151.5, 300 sec: 12246.4). Total num frames: 11534336. Throughput: 0: 6134.6, 1: 6129.2. Samples: 10357692. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:58:03,688][67555] Avg episode reward: [(0, '-36.391'), (1, '-32.632')]
+[2023-09-19 19:58:06,586][68280] Updated weights for policy 0, policy_version 12448 (0.0010)
+[2023-09-19 19:58:06,586][68281] Updated weights for policy 1, policy_version 10160 (0.0015)
+[2023-09-19 19:58:08,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 11599872. Throughput: 0: 6156.1, 1: 6159.8. Samples: 10395642. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 19:58:08,688][67555] Avg episode reward: [(0, '-35.267'), (1, '-31.925')]
+[2023-09-19 19:58:13,288][68280] Updated weights for policy 0, policy_version 12528 (0.0014)
+[2023-09-19 19:58:13,288][68281] Updated weights for policy 1, policy_version 10240 (0.0014)
+[2023-09-19 19:58:13,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 11657216. Throughput: 0: 6161.3, 1: 6165.7. Samples: 10469390. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:13,688][67555] Avg episode reward: [(0, '-35.307'), (1, '-33.287')]
+[2023-09-19 19:58:13,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000012528_6414336.pth...
+[2023-09-19 19:58:13,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010240_5242880.pth...
+[2023-09-19 19:58:13,700][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000012168_6230016.pth
+[2023-09-19 19:58:13,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000009880_5058560.pth
+[2023-09-19 19:58:18,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 11722752. Throughput: 0: 6123.0, 1: 6125.9. Samples: 10542688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:18,688][67555] Avg episode reward: [(0, '-35.811'), (1, '-33.526')]
+[2023-09-19 19:58:20,099][68280] Updated weights for policy 0, policy_version 12608 (0.0013)
+[2023-09-19 19:58:20,099][68281] Updated weights for policy 1, policy_version 10320 (0.0013)
+[2023-09-19 19:58:23,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 11780096. Throughput: 0: 6117.2, 1: 6113.8. Samples: 10577434. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:58:23,688][67555] Avg episode reward: [(0, '-35.459'), (1, '-32.848')]
+[2023-09-19 19:58:26,608][68280] Updated weights for policy 0, policy_version 12688 (0.0013)
+[2023-09-19 19:58:26,608][68281] Updated weights for policy 1, policy_version 10400 (0.0014)
+[2023-09-19 19:58:28,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12151.5, 300 sec: 12246.3). Total num frames: 11837440. Throughput: 0: 6145.9, 1: 6144.6. Samples: 10653358. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 19:58:28,687][67555] Avg episode reward: [(0, '-35.917'), (1, '-32.238')]
+[2023-09-19 19:58:28,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000012712_6508544.pth...
+[2023-09-19 19:58:28,700][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000012352_6324224.pth
+[2023-09-19 19:58:28,703][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010424_5337088.pth...
+[2023-09-19 19:58:28,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010064_5152768.pth
+[2023-09-19 19:58:33,589][68280] Updated weights for policy 0, policy_version 12768 (0.0014)
+[2023-09-19 19:58:33,589][68281] Updated weights for policy 1, policy_version 10480 (0.0013)
+[2023-09-19 19:58:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 11902976. Throughput: 0: 6101.2, 1: 6103.0. Samples: 10723328. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:58:33,688][67555] Avg episode reward: [(0, '-35.193'), (1, '-33.148')]
+[2023-09-19 19:58:38,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12246.4). Total num frames: 11960320. Throughput: 0: 6106.0, 1: 6106.7. Samples: 10759408. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 19:58:38,688][67555] Avg episode reward: [(0, '-35.111'), (1, '-33.293')]
+[2023-09-19 19:58:40,291][68280] Updated weights for policy 0, policy_version 12848 (0.0009)
+[2023-09-19 19:58:40,292][68281] Updated weights for policy 1, policy_version 10560 (0.0013)
+[2023-09-19 19:58:43,688][67555] Fps is (10 sec: 12287.3, 60 sec: 12287.9, 300 sec: 12246.3). Total num frames: 12025856. Throughput: 0: 6108.4, 1: 6110.7. Samples: 10834114. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:43,689][67555] Avg episode reward: [(0, '-36.770'), (1, '-32.631')]
+[2023-09-19 19:58:43,701][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000012888_6598656.pth...
+[2023-09-19 19:58:43,701][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010600_5427200.pth...
+[2023-09-19 19:58:43,709][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010240_5242880.pth
+[2023-09-19 19:58:43,709][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000012528_6414336.pth
+[2023-09-19 19:58:47,076][68280] Updated weights for policy 0, policy_version 12928 (0.0011)
+[2023-09-19 19:58:47,077][68281] Updated weights for policy 1, policy_version 10640 (0.0013)
+[2023-09-19 19:58:48,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 12083200. Throughput: 0: 6076.5, 1: 6078.5. Samples: 10904672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:48,688][67555] Avg episode reward: [(0, '-35.108'), (1, '-33.778')]
+[2023-09-19 19:58:53,687][67555] Fps is (10 sec: 11469.2, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 12140544. Throughput: 0: 6065.4, 1: 6061.9. Samples: 10941372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:53,688][67555] Avg episode reward: [(0, '-36.152'), (1, '-32.478')]
+[2023-09-19 19:58:53,793][68281] Updated weights for policy 1, policy_version 10720 (0.0012)
+[2023-09-19 19:58:53,793][68280] Updated weights for policy 0, policy_version 13008 (0.0014)
+[2023-09-19 19:58:58,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.4, 300 sec: 12246.3). Total num frames: 12206080. Throughput: 0: 6087.7, 1: 6083.1. Samples: 11017078. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:58:58,688][67555] Avg episode reward: [(0, '-36.467'), (1, '-33.123')]
+[2023-09-19 19:58:58,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013064_6688768.pth...
+[2023-09-19 19:58:58,700][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010776_5517312.pth...
+[2023-09-19 19:58:58,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000012712_6508544.pth
+[2023-09-19 19:58:58,708][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010424_5337088.pth
+[2023-09-19 19:59:00,260][68280] Updated weights for policy 0, policy_version 13088 (0.0013)
+[2023-09-19 19:59:00,260][68281] Updated weights for policy 1, policy_version 10800 (0.0012)
+[2023-09-19 19:59:03,687][67555] Fps is (10 sec: 13107.2, 60 sec: 12287.9, 300 sec: 12274.1). Total num frames: 12271616. Throughput: 0: 6102.8, 1: 6104.2. Samples: 11092002. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:03,688][67555] Avg episode reward: [(0, '-35.628'), (1, '-32.873')]
+[2023-09-19 19:59:06,814][68281] Updated weights for policy 1, policy_version 10880 (0.0013)
+[2023-09-19 19:59:06,814][68280] Updated weights for policy 0, policy_version 13168 (0.0013)
+[2023-09-19 19:59:08,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12246.4). Total num frames: 12328960. Throughput: 0: 6145.9, 1: 6145.3. Samples: 11130540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:08,688][67555] Avg episode reward: [(0, '-34.812'), (1, '-32.751')]
+[2023-09-19 19:59:13,581][68280] Updated weights for policy 0, policy_version 13248 (0.0014)
+[2023-09-19 19:59:13,581][68281] Updated weights for policy 1, policy_version 10960 (0.0015)
+[2023-09-19 19:59:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 12394496. Throughput: 0: 6093.0, 1: 6093.6. Samples: 11201758. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:13,688][67555] Avg episode reward: [(0, '-35.066'), (1, '-32.707')]
+[2023-09-19 19:59:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000010960_5611520.pth...
+[2023-09-19 19:59:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013248_6782976.pth...
+[2023-09-19 19:59:13,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010600_5427200.pth
+[2023-09-19 19:59:13,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000012888_6598656.pth
+[2023-09-19 19:59:18,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12246.3). Total num frames: 12451840. Throughput: 0: 6162.4, 1: 6157.1. Samples: 11277708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:18,688][67555] Avg episode reward: [(0, '-35.180'), (1, '-32.176')]
+[2023-09-19 19:59:20,183][68281] Updated weights for policy 1, policy_version 11040 (0.0011)
+[2023-09-19 19:59:20,183][68280] Updated weights for policy 0, policy_version 13328 (0.0014)
+[2023-09-19 19:59:23,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 12517376. Throughput: 0: 6151.0, 1: 6154.5. Samples: 11313154. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:59:23,688][67555] Avg episode reward: [(0, '-35.026'), (1, '-32.346')]
+[2023-09-19 19:59:27,082][68281] Updated weights for policy 1, policy_version 11120 (0.0015)
+[2023-09-19 19:59:27,082][68280] Updated weights for policy 0, policy_version 13408 (0.0013)
+[2023-09-19 19:59:28,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 12574720. Throughput: 0: 6123.7, 1: 6123.1. Samples: 11385212. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 19:59:28,688][67555] Avg episode reward: [(0, '-35.189'), (1, '-32.711')]
+[2023-09-19 19:59:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013424_6873088.pth...
+[2023-09-19 19:59:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000011136_5701632.pth...
+[2023-09-19 19:59:28,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013064_6688768.pth
+[2023-09-19 19:59:28,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010776_5517312.pth
+[2023-09-19 19:59:33,588][68280] Updated weights for policy 0, policy_version 13488 (0.0011)
+[2023-09-19 19:59:33,588][68281] Updated weights for policy 1, policy_version 11200 (0.0012)
+[2023-09-19 19:59:33,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12246.4). Total num frames: 12640256. Throughput: 0: 6175.4, 1: 6178.2. Samples: 11460582. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:33,688][67555] Avg episode reward: [(0, '-34.047'), (1, '-32.560')]
+[2023-09-19 19:59:38,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12246.4). Total num frames: 12697600. Throughput: 0: 6173.4, 1: 6173.6. Samples: 11496984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:38,688][67555] Avg episode reward: [(0, '-35.500'), (1, '-31.910')]
+[2023-09-19 19:59:40,344][68281] Updated weights for policy 1, policy_version 11280 (0.0016)
+[2023-09-19 19:59:40,344][68280] Updated weights for policy 0, policy_version 13568 (0.0017)
+[2023-09-19 19:59:43,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.1, 300 sec: 12246.4). Total num frames: 12763136. Throughput: 0: 6133.4, 1: 6134.3. Samples: 11569122. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:59:43,687][67555] Avg episode reward: [(0, '-36.246'), (1, '-32.597')]
+[2023-09-19 19:59:43,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013608_6967296.pth...
+[2023-09-19 19:59:43,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000011320_5795840.pth...
+[2023-09-19 19:59:43,700][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013248_6782976.pth
+[2023-09-19 19:59:43,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000010960_5611520.pth
+[2023-09-19 19:59:46,921][68280] Updated weights for policy 0, policy_version 13648 (0.0014)
+[2023-09-19 19:59:46,921][68281] Updated weights for policy 1, policy_version 11360 (0.0014)
+[2023-09-19 19:59:48,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 12820480. Throughput: 0: 6145.0, 1: 6142.2. Samples: 11644928. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 19:59:48,688][67555] Avg episode reward: [(0, '-35.605'), (1, '-32.488')]
+[2023-09-19 19:59:53,580][68281] Updated weights for policy 1, policy_version 11440 (0.0011)
+[2023-09-19 19:59:53,580][68280] Updated weights for policy 0, policy_version 13728 (0.0014)
+[2023-09-19 19:59:53,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12424.5, 300 sec: 12246.3). Total num frames: 12886016. Throughput: 0: 6117.5, 1: 6118.6. Samples: 11681164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:53,688][67555] Avg episode reward: [(0, '-34.499'), (1, '-33.323')]
+[2023-09-19 19:59:58,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12288.1, 300 sec: 12246.4). Total num frames: 12943360. Throughput: 0: 6159.3, 1: 6158.3. Samples: 11756044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 19:59:58,687][67555] Avg episode reward: [(0, '-35.106'), (1, '-32.020')]
+[2023-09-19 19:59:58,693][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013784_7057408.pth...
+[2023-09-19 19:59:58,693][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000011496_5885952.pth...
+[2023-09-19 19:59:58,698][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013424_6873088.pth
+[2023-09-19 19:59:58,698][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000011136_5701632.pth
+[2023-09-19 20:00:00,250][68280] Updated weights for policy 0, policy_version 13808 (0.0011)
+[2023-09-19 20:00:00,251][68281] Updated weights for policy 1, policy_version 11520 (0.0015)
+[2023-09-19 20:00:03,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13008896. Throughput: 0: 6145.8, 1: 6146.6. Samples: 11830866. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:03,688][67555] Avg episode reward: [(0, '-36.146'), (1, '-32.932')]
+[2023-09-19 20:00:06,691][68281] Updated weights for policy 1, policy_version 11600 (0.0012)
+[2023-09-19 20:00:06,691][68280] Updated weights for policy 0, policy_version 13888 (0.0011)
+[2023-09-19 20:00:08,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 13066240. Throughput: 0: 6186.2, 1: 6182.3. Samples: 11869738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:08,688][67555] Avg episode reward: [(0, '-34.949'), (1, '-33.061')]
+[2023-09-19 20:00:13,413][68280] Updated weights for policy 0, policy_version 13968 (0.0014)
+[2023-09-19 20:00:13,413][68281] Updated weights for policy 1, policy_version 11680 (0.0014)
+[2023-09-19 20:00:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13131776. Throughput: 0: 6188.1, 1: 6187.6. Samples: 11942122. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:13,688][67555] Avg episode reward: [(0, '-33.444'), (1, '-32.497')]
+[2023-09-19 20:00:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000013968_7151616.pth...
+[2023-09-19 20:00:13,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000011680_5980160.pth...
+[2023-09-19 20:00:13,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013608_6967296.pth
+[2023-09-19 20:00:13,708][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000011320_5795840.pth
+[2023-09-19 20:00:18,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13189120. Throughput: 0: 6158.8, 1: 6155.1. Samples: 12014712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:18,688][67555] Avg episode reward: [(0, '-33.436'), (1, '-33.041')]
+[2023-09-19 20:00:20,183][68280] Updated weights for policy 0, policy_version 14048 (0.0013)
+[2023-09-19 20:00:20,183][68281] Updated weights for policy 1, policy_version 11760 (0.0013)
+[2023-09-19 20:00:23,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13254656. Throughput: 0: 6153.2, 1: 6152.5. Samples: 12050740. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:23,688][67555] Avg episode reward: [(0, '-34.993'), (1, '-32.602')]
+[2023-09-19 20:00:26,900][68281] Updated weights for policy 1, policy_version 11840 (0.0017)
+[2023-09-19 20:00:26,901][68280] Updated weights for policy 0, policy_version 14128 (0.0016)
+[2023-09-19 20:00:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12287.9, 300 sec: 12246.3). Total num frames: 13312000. Throughput: 0: 6165.8, 1: 6169.1. Samples: 12124194. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:00:28,688][67555] Avg episode reward: [(0, '-34.765'), (1, '-32.233')]
+[2023-09-19 20:00:28,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000014144_7241728.pth...
+[2023-09-19 20:00:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000011856_6070272.pth...
+[2023-09-19 20:00:28,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000011496_5885952.pth
+[2023-09-19 20:00:28,710][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013784_7057408.pth
+[2023-09-19 20:00:33,499][68281] Updated weights for policy 1, policy_version 11920 (0.0015)
+[2023-09-19 20:00:33,499][68280] Updated weights for policy 0, policy_version 14208 (0.0013)
+[2023-09-19 20:00:33,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 13377536. Throughput: 0: 6151.6, 1: 6150.0. Samples: 12198500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:00:33,688][67555] Avg episode reward: [(0, '-34.244'), (1, '-32.350')]
+[2023-09-19 20:00:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13434880. Throughput: 0: 6142.1, 1: 6140.9. Samples: 12233898. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:38,688][67555] Avg episode reward: [(0, '-35.149'), (1, '-32.586')]
+[2023-09-19 20:00:40,107][68281] Updated weights for policy 1, policy_version 12000 (0.0014)
+[2023-09-19 20:00:40,107][68280] Updated weights for policy 0, policy_version 14288 (0.0014)
+[2023-09-19 20:00:43,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12287.9, 300 sec: 12274.1). Total num frames: 13500416. Throughput: 0: 6141.5, 1: 6143.2. Samples: 12308858. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:43,688][67555] Avg episode reward: [(0, '-33.327'), (1, '-31.723')]
+[2023-09-19 20:00:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000014328_7335936.pth...
+[2023-09-19 20:00:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012040_6164480.pth...
+[2023-09-19 20:00:43,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000011680_5980160.pth
+[2023-09-19 20:00:43,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000013968_7151616.pth
+[2023-09-19 20:00:46,902][68280] Updated weights for policy 0, policy_version 14368 (0.0013)
+[2023-09-19 20:00:46,902][68281] Updated weights for policy 1, policy_version 12080 (0.0014)
+[2023-09-19 20:00:48,687][67555] Fps is (10 sec: 12288.5, 60 sec: 12288.1, 300 sec: 12274.1). Total num frames: 13557760. Throughput: 0: 6110.6, 1: 6110.7. Samples: 12380820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:48,687][67555] Avg episode reward: [(0, '-33.584'), (1, '-32.643')]
+[2023-09-19 20:00:53,687][67555] Fps is (10 sec: 11469.1, 60 sec: 12151.5, 300 sec: 12246.4). Total num frames: 13615104. Throughput: 0: 6080.2, 1: 6079.5. Samples: 12416926. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:53,688][67555] Avg episode reward: [(0, '-33.801'), (1, '-32.845')]
+[2023-09-19 20:00:53,790][68281] Updated weights for policy 1, policy_version 12160 (0.0012)
+[2023-09-19 20:00:53,790][68280] Updated weights for policy 0, policy_version 14448 (0.0012)
+[2023-09-19 20:00:58,687][67555] Fps is (10 sec: 12287.4, 60 sec: 12287.9, 300 sec: 12274.1). Total num frames: 13680640. Throughput: 0: 6068.1, 1: 6067.7. Samples: 12488234. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:00:58,688][67555] Avg episode reward: [(0, '-33.434'), (1, '-32.398')]
+[2023-09-19 20:00:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012216_6254592.pth...
+[2023-09-19 20:00:58,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000014504_7426048.pth...
+[2023-09-19 20:00:58,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000014144_7241728.pth
+[2023-09-19 20:00:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000011856_6070272.pth
+[2023-09-19 20:01:00,505][68280] Updated weights for policy 0, policy_version 14528 (0.0014)
+[2023-09-19 20:01:00,505][68281] Updated weights for policy 1, policy_version 12240 (0.0014)
+[2023-09-19 20:01:03,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12246.3). Total num frames: 13737984. Throughput: 0: 6080.9, 1: 6081.1. Samples: 12562004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:01:03,688][67555] Avg episode reward: [(0, '-32.794'), (1, '-31.999')]
+[2023-09-19 20:01:07,220][68280] Updated weights for policy 0, policy_version 14608 (0.0015)
+[2023-09-19 20:01:07,220][68281] Updated weights for policy 1, policy_version 12320 (0.0014)
+[2023-09-19 20:01:08,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12246.3). Total num frames: 13803520. Throughput: 0: 6093.8, 1: 6097.8. Samples: 12599362. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:01:08,688][67555] Avg episode reward: [(0, '-32.294'), (1, '-32.111')]
+[2023-09-19 20:01:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12246.3). Total num frames: 13860864. Throughput: 0: 6081.8, 1: 6079.0. Samples: 12671428. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:01:13,688][67555] Avg episode reward: [(0, '-33.360'), (1, '-33.023')]
+[2023-09-19 20:01:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000014680_7516160.pth...
+[2023-09-19 20:01:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012392_6344704.pth...
+[2023-09-19 20:01:13,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012040_6164480.pth
+[2023-09-19 20:01:13,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000014328_7335936.pth
+[2023-09-19 20:01:14,144][68280] Updated weights for policy 0, policy_version 14688 (0.0014)
+[2023-09-19 20:01:14,144][68281] Updated weights for policy 1, policy_version 12400 (0.0014)
+[2023-09-19 20:01:18,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12151.4, 300 sec: 12246.3). Total num frames: 13918208. Throughput: 0: 6051.1, 1: 6051.4. Samples: 12743114. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:01:18,689][67555] Avg episode reward: [(0, '-33.192'), (1, '-32.166')]
+[2023-09-19 20:01:20,880][68281] Updated weights for policy 1, policy_version 12480 (0.0013)
+[2023-09-19 20:01:20,880][68280] Updated weights for policy 0, policy_version 14768 (0.0012)
+[2023-09-19 20:01:23,687][67555] Fps is (10 sec: 12288.5, 60 sec: 12151.6, 300 sec: 12246.4). Total num frames: 13983744. Throughput: 0: 6060.9, 1: 6064.9. Samples: 12779554. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:01:23,687][67555] Avg episode reward: [(0, '-33.290'), (1, '-32.321')]
+[2023-09-19 20:01:27,680][68281] Updated weights for policy 1, policy_version 12560 (0.0011)
+[2023-09-19 20:01:27,680][68280] Updated weights for policy 0, policy_version 14848 (0.0014)
+[2023-09-19 20:01:28,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 14041088. Throughput: 0: 6034.2, 1: 6032.6. Samples: 12851866. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:01:28,688][67555] Avg episode reward: [(0, '-33.946'), (1, '-32.833')]
+[2023-09-19 20:01:28,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000014856_7606272.pth...
+[2023-09-19 20:01:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012568_6434816.pth...
+[2023-09-19 20:01:28,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000014504_7426048.pth
+[2023-09-19 20:01:28,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012216_6254592.pth
+[2023-09-19 20:01:33,687][67555] Fps is (10 sec: 11468.4, 60 sec: 12014.9, 300 sec: 12190.9). Total num frames: 14098432. Throughput: 0: 6047.6, 1: 6047.4. Samples: 12925096. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:01:33,688][67555] Avg episode reward: [(0, '-33.079'), (1, '-32.274')]
+[2023-09-19 20:01:34,467][68281] Updated weights for policy 1, policy_version 12640 (0.0014)
+[2023-09-19 20:01:34,469][68280] Updated weights for policy 0, policy_version 14928 (0.0016)
+[2023-09-19 20:01:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 14163968. Throughput: 0: 6032.6, 1: 6034.4. Samples: 12959940. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:01:38,688][67555] Avg episode reward: [(0, '-31.943'), (1, '-31.238')]
+[2023-09-19 20:01:38,690][68201] Saving new best policy, reward=-31.308!
+[2023-09-19 20:01:41,246][68280] Updated weights for policy 0, policy_version 15008 (0.0015)
+[2023-09-19 20:01:41,246][68281] Updated weights for policy 1, policy_version 12720 (0.0016)
+[2023-09-19 20:01:43,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12015.0, 300 sec: 12218.6). Total num frames: 14221312. Throughput: 0: 6053.2, 1: 6055.0. Samples: 13033104. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:01:43,688][67555] Avg episode reward: [(0, '-33.790'), (1, '-31.732')]
+[2023-09-19 20:01:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012744_6524928.pth...
+[2023-09-19 20:01:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015032_7696384.pth...
+[2023-09-19 20:01:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012392_6344704.pth
+[2023-09-19 20:01:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000014680_7516160.pth
+[2023-09-19 20:01:48,015][68281] Updated weights for policy 1, policy_version 12800 (0.0011)
+[2023-09-19 20:01:48,015][68280] Updated weights for policy 0, policy_version 15088 (0.0014)
+[2023-09-19 20:01:48,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14278656. Throughput: 0: 6047.1, 1: 6048.1. Samples: 13106286. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:01:48,688][67555] Avg episode reward: [(0, '-33.320'), (1, '-31.388')]
+[2023-09-19 20:01:53,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12218.6). Total num frames: 14344192. Throughput: 0: 6003.3, 1: 5999.5. Samples: 13139486. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:01:53,688][67555] Avg episode reward: [(0, '-33.269'), (1, '-31.177')]
+[2023-09-19 20:01:53,690][68201] Saving new best policy, reward=-31.177!
+[2023-09-19 20:01:55,049][68280] Updated weights for policy 0, policy_version 15168 (0.0014)
+[2023-09-19 20:01:55,049][68281] Updated weights for policy 1, policy_version 12880 (0.0015)
+[2023-09-19 20:01:58,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14401536. Throughput: 0: 6015.6, 1: 6014.1. Samples: 13212764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:01:58,688][67555] Avg episode reward: [(0, '-32.465'), (1, '-31.712')]
+[2023-09-19 20:01:58,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015208_7786496.pth...
+[2023-09-19 20:01:58,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000012920_6615040.pth...
+[2023-09-19 20:01:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000014856_7606272.pth
+[2023-09-19 20:01:58,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012568_6434816.pth
+[2023-09-19 20:02:01,974][68280] Updated weights for policy 0, policy_version 15248 (0.0013)
+[2023-09-19 20:02:01,975][68281] Updated weights for policy 1, policy_version 12960 (0.0015)
+[2023-09-19 20:02:03,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14458880. Throughput: 0: 5989.8, 1: 5989.6. Samples: 13282182. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:02:03,688][67555] Avg episode reward: [(0, '-33.772'), (1, '-31.511')]
+[2023-09-19 20:02:08,656][68280] Updated weights for policy 0, policy_version 15328 (0.0014)
+[2023-09-19 20:02:08,656][68281] Updated weights for policy 1, policy_version 13040 (0.0014)
+[2023-09-19 20:02:08,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12014.9, 300 sec: 12218.6). Total num frames: 14524416. Throughput: 0: 5992.3, 1: 5988.3. Samples: 13318684. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:02:08,688][67555] Avg episode reward: [(0, '-34.013'), (1, '-31.452')]
+[2023-09-19 20:02:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14581760. Throughput: 0: 5991.0, 1: 5991.4. Samples: 13391074. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:02:13,688][67555] Avg episode reward: [(0, '-32.708'), (1, '-31.546')]
+[2023-09-19 20:02:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015384_7876608.pth...
+[2023-09-19 20:02:13,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013096_6705152.pth...
+[2023-09-19 20:02:13,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015032_7696384.pth
+[2023-09-19 20:02:13,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012744_6524928.pth
+[2023-09-19 20:02:15,474][68280] Updated weights for policy 0, policy_version 15408 (0.0013)
+[2023-09-19 20:02:15,474][68281] Updated weights for policy 1, policy_version 13120 (0.0015)
+[2023-09-19 20:02:18,687][67555] Fps is (10 sec: 11468.6, 60 sec: 12015.0, 300 sec: 12190.8). Total num frames: 14639104. Throughput: 0: 5993.9, 1: 5993.0. Samples: 13464504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:02:18,688][67555] Avg episode reward: [(0, '-32.923'), (1, '-32.286')]
+[2023-09-19 20:02:22,082][68280] Updated weights for policy 0, policy_version 15488 (0.0013)
+[2023-09-19 20:02:22,082][68281] Updated weights for policy 1, policy_version 13200 (0.0014)
+[2023-09-19 20:02:23,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14704640. Throughput: 0: 6020.6, 1: 6019.1. Samples: 13501726. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:02:23,688][67555] Avg episode reward: [(0, '-32.836'), (1, '-31.692')]
+[2023-09-19 20:02:28,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12015.0, 300 sec: 12190.8). Total num frames: 14761984. Throughput: 0: 6017.7, 1: 6015.7. Samples: 13574606. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:02:28,687][67555] Avg episode reward: [(0, '-33.073'), (1, '-32.266')]
+[2023-09-19 20:02:28,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015560_7966720.pth...
+[2023-09-19 20:02:28,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013272_6795264.pth...
+[2023-09-19 20:02:28,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000012920_6615040.pth
+[2023-09-19 20:02:28,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015208_7786496.pth
+[2023-09-19 20:02:28,932][68280] Updated weights for policy 0, policy_version 15568 (0.0011)
+[2023-09-19 20:02:28,933][68281] Updated weights for policy 1, policy_version 13280 (0.0012)
+[2023-09-19 20:02:33,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 14819328. Throughput: 0: 5982.2, 1: 5981.5. Samples: 13644652. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:02:33,688][67555] Avg episode reward: [(0, '-31.524'), (1, '-31.340')]
+[2023-09-19 20:02:35,852][68280] Updated weights for policy 0, policy_version 15648 (0.0016)
+[2023-09-19 20:02:35,852][68281] Updated weights for policy 1, policy_version 13360 (0.0013)
+[2023-09-19 20:02:38,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12014.9, 300 sec: 12190.8). Total num frames: 14884864. Throughput: 0: 6012.1, 1: 6015.5. Samples: 13680726. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:02:38,688][67555] Avg episode reward: [(0, '-32.410'), (1, '-31.522')]
+[2023-09-19 20:02:42,623][68281] Updated weights for policy 1, policy_version 13440 (0.0014)
+[2023-09-19 20:02:42,623][68280] Updated weights for policy 0, policy_version 15728 (0.0011)
+[2023-09-19 20:02:43,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12015.0, 300 sec: 12190.8). Total num frames: 14942208. Throughput: 0: 6015.0, 1: 6019.1. Samples: 13754294. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:02:43,688][67555] Avg episode reward: [(0, '-31.086'), (1, '-32.213')]
+[2023-09-19 20:02:43,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015736_8056832.pth...
+[2023-09-19 20:02:43,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013448_6885376.pth...
+[2023-09-19 20:02:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013096_6705152.pth
+[2023-09-19 20:02:43,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015384_7876608.pth
+[2023-09-19 20:02:48,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12015.0, 300 sec: 12163.0). Total num frames: 14999552. Throughput: 0: 6058.1, 1: 6059.5. Samples: 13827470. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:02:48,688][67555] Avg episode reward: [(0, '-31.354'), (1, '-32.209')]
+[2023-09-19 20:02:49,455][68280] Updated weights for policy 0, policy_version 15808 (0.0011)
+[2023-09-19 20:02:49,456][68281] Updated weights for policy 1, policy_version 13520 (0.0015)
+[2023-09-19 20:02:53,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12015.0, 300 sec: 12163.0). Total num frames: 15065088. Throughput: 0: 6027.7, 1: 6029.2. Samples: 13861248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:02:53,688][67555] Avg episode reward: [(0, '-33.160'), (1, '-31.560')]
+[2023-09-19 20:02:56,303][68280] Updated weights for policy 0, policy_version 15888 (0.0011)
+[2023-09-19 20:02:56,303][68281] Updated weights for policy 1, policy_version 13600 (0.0014)
+[2023-09-19 20:02:58,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 15122432. Throughput: 0: 6037.5, 1: 6041.4. Samples: 13934626. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:02:58,688][67555] Avg episode reward: [(0, '-32.459'), (1, '-31.337')]
+[2023-09-19 20:02:58,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000015912_8146944.pth...
+[2023-09-19 20:02:58,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013624_6975488.pth...
+[2023-09-19 20:02:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013272_6795264.pth
+[2023-09-19 20:02:58,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015560_7966720.pth
+[2023-09-19 20:03:03,036][68280] Updated weights for policy 0, policy_version 15968 (0.0015)
+[2023-09-19 20:03:03,036][68281] Updated weights for policy 1, policy_version 13680 (0.0013)
+[2023-09-19 20:03:03,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 15187968. Throughput: 0: 6024.6, 1: 6026.5. Samples: 14006800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:03,688][67555] Avg episode reward: [(0, '-29.999'), (1, '-32.206')]
+[2023-09-19 20:03:08,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12015.0, 300 sec: 12163.0). Total num frames: 15245312. Throughput: 0: 6030.7, 1: 6030.5. Samples: 14044478. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:08,688][67555] Avg episode reward: [(0, '-30.401'), (1, '-31.914')]
+[2023-09-19 20:03:09,698][68280] Updated weights for policy 0, policy_version 16048 (0.0012)
+[2023-09-19 20:03:09,698][68281] Updated weights for policy 1, policy_version 13760 (0.0012)
+[2023-09-19 20:03:13,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 15310848. Throughput: 0: 6028.3, 1: 6030.4. Samples: 14117250. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:03:13,688][67555] Avg episode reward: [(0, '-30.151'), (1, '-31.522')]
+[2023-09-19 20:03:13,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016096_8241152.pth...
+[2023-09-19 20:03:13,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013808_7069696.pth...
+[2023-09-19 20:03:13,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015736_8056832.pth
+[2023-09-19 20:03:13,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013448_6885376.pth
+[2023-09-19 20:03:16,443][68280] Updated weights for policy 0, policy_version 16128 (0.0012)
+[2023-09-19 20:03:16,444][68281] Updated weights for policy 1, policy_version 13840 (0.0010)
+[2023-09-19 20:03:18,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 15368192. Throughput: 0: 6073.3, 1: 6073.0. Samples: 14191238. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:03:18,689][67555] Avg episode reward: [(0, '-31.391'), (1, '-31.871')]
+[2023-09-19 20:03:23,063][68281] Updated weights for policy 1, policy_version 13920 (0.0012)
+[2023-09-19 20:03:23,063][68280] Updated weights for policy 0, policy_version 16208 (0.0014)
+[2023-09-19 20:03:23,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12015.0, 300 sec: 12163.0). Total num frames: 15425536. Throughput: 0: 6092.3, 1: 6089.8. Samples: 14228920. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:03:23,688][67555] Avg episode reward: [(0, '-29.912'), (1, '-32.188')]
+[2023-09-19 20:03:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 15491072. Throughput: 0: 6093.2, 1: 6093.1. Samples: 14302682. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:28,688][67555] Avg episode reward: [(0, '-30.494'), (1, '-32.449')]
+[2023-09-19 20:03:28,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000013984_7159808.pth...
+[2023-09-19 20:03:28,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016272_8331264.pth...
+[2023-09-19 20:03:28,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013624_6975488.pth
+[2023-09-19 20:03:28,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000015912_8146944.pth
+[2023-09-19 20:03:29,820][68281] Updated weights for policy 1, policy_version 14000 (0.0014)
+[2023-09-19 20:03:29,820][68280] Updated weights for policy 0, policy_version 16288 (0.0011)
+[2023-09-19 20:03:33,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 15548416. Throughput: 0: 6057.2, 1: 6057.3. Samples: 14372626. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:33,688][67555] Avg episode reward: [(0, '-31.411'), (1, '-31.311')]
+[2023-09-19 20:03:36,420][68281] Updated weights for policy 1, policy_version 14080 (0.0015)
+[2023-09-19 20:03:36,420][68280] Updated weights for policy 0, policy_version 16368 (0.0014)
+[2023-09-19 20:03:38,687][67555] Fps is (10 sec: 12288.5, 60 sec: 12151.5, 300 sec: 12163.1). Total num frames: 15613952. Throughput: 0: 6113.1, 1: 6111.7. Samples: 14411362. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:38,688][67555] Avg episode reward: [(0, '-31.133'), (1, '-32.460')]
+[2023-09-19 20:03:43,216][68280] Updated weights for policy 0, policy_version 16448 (0.0013)
+[2023-09-19 20:03:43,216][68281] Updated weights for policy 1, policy_version 14160 (0.0013)
+[2023-09-19 20:03:43,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 15671296. Throughput: 0: 6102.8, 1: 6099.3. Samples: 14483718. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:03:43,688][67555] Avg episode reward: [(0, '-31.922'), (1, '-32.143')]
+[2023-09-19 20:03:43,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000014160_7249920.pth...
+[2023-09-19 20:03:43,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016448_8421376.pth...
+[2023-09-19 20:03:43,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013808_7069696.pth
+[2023-09-19 20:03:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016096_8241152.pth
+[2023-09-19 20:03:48,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 15728640. Throughput: 0: 6084.4, 1: 6083.5. Samples: 14554350. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:03:48,688][67555] Avg episode reward: [(0, '-30.836'), (1, '-32.181')]
+[2023-09-19 20:03:50,208][68280] Updated weights for policy 0, policy_version 16528 (0.0009)
+[2023-09-19 20:03:50,209][68281] Updated weights for policy 1, policy_version 14240 (0.0012)
+[2023-09-19 20:03:53,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 15794176. Throughput: 0: 6058.5, 1: 6063.1. Samples: 14589954. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:03:53,688][67555] Avg episode reward: [(0, '-31.179'), (1, '-32.298')]
+[2023-09-19 20:03:56,956][68280] Updated weights for policy 0, policy_version 16608 (0.0015)
+[2023-09-19 20:03:56,956][68281] Updated weights for policy 1, policy_version 14320 (0.0014)
+[2023-09-19 20:03:58,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 15851520. Throughput: 0: 6069.1, 1: 6071.4. Samples: 14663572. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:03:58,688][67555] Avg episode reward: [(0, '-31.162'), (1, '-30.173')]
+[2023-09-19 20:03:58,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016624_8511488.pth...
+[2023-09-19 20:03:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000014336_7340032.pth...
+[2023-09-19 20:03:58,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016272_8331264.pth
+[2023-09-19 20:03:58,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000013984_7159808.pth
+[2023-09-19 20:03:58,706][68201] Saving new best policy, reward=-30.173!
+[2023-09-19 20:04:03,687][67555] Fps is (10 sec: 11469.1, 60 sec: 12015.0, 300 sec: 12135.3). Total num frames: 15908864. Throughput: 0: 6026.1, 1: 6024.7. Samples: 14733522. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:03,688][67555] Avg episode reward: [(0, '-31.067'), (1, '-31.333')]
+[2023-09-19 20:04:03,881][68280] Updated weights for policy 0, policy_version 16688 (0.0014)
+[2023-09-19 20:04:03,881][68281] Updated weights for policy 1, policy_version 14400 (0.0013)
+[2023-09-19 20:04:08,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 15974400. Throughput: 0: 6013.2, 1: 6016.2. Samples: 14770242. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:08,687][67555] Avg episode reward: [(0, '-30.179'), (1, '-31.402')]
+[2023-09-19 20:04:10,529][68280] Updated weights for policy 0, policy_version 16768 (0.0014)
+[2023-09-19 20:04:10,529][68281] Updated weights for policy 1, policy_version 14480 (0.0015)
+[2023-09-19 20:04:13,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12015.0, 300 sec: 12135.3). Total num frames: 16031744. Throughput: 0: 6013.2, 1: 6014.2. Samples: 14843908. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:13,687][67555] Avg episode reward: [(0, '-30.546'), (1, '-31.778')]
+[2023-09-19 20:04:13,693][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000014512_7430144.pth...
+[2023-09-19 20:04:13,693][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016800_8601600.pth...
+[2023-09-19 20:04:13,696][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000014160_7249920.pth
+[2023-09-19 20:04:13,700][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016448_8421376.pth
+[2023-09-19 20:04:17,448][68281] Updated weights for policy 1, policy_version 14560 (0.0013)
+[2023-09-19 20:04:17,448][68280] Updated weights for policy 0, policy_version 16848 (0.0013)
+[2023-09-19 20:04:18,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12015.0, 300 sec: 12107.5). Total num frames: 16089088. Throughput: 0: 6031.0, 1: 6030.2. Samples: 14915378. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:18,687][67555] Avg episode reward: [(0, '-32.164'), (1, '-31.477')]
+[2023-09-19 20:04:23,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 16154624. Throughput: 0: 6001.9, 1: 6003.1. Samples: 14951590. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:23,688][67555] Avg episode reward: [(0, '-31.305'), (1, '-31.183')]
+[2023-09-19 20:04:24,077][68280] Updated weights for policy 0, policy_version 16928 (0.0014)
+[2023-09-19 20:04:24,077][68281] Updated weights for policy 1, policy_version 14640 (0.0013)
+[2023-09-19 20:04:28,687][67555] Fps is (10 sec: 13107.0, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 16220160. Throughput: 0: 6043.9, 1: 6044.6. Samples: 15027700. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:28,687][67555] Avg episode reward: [(0, '-30.025'), (1, '-31.606')]
+[2023-09-19 20:04:28,693][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000016984_8695808.pth...
+[2023-09-19 20:04:28,694][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000014696_7524352.pth...
+[2023-09-19 20:04:28,697][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016624_8511488.pth
+[2023-09-19 20:04:28,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000014336_7340032.pth
+[2023-09-19 20:04:30,526][68281] Updated weights for policy 1, policy_version 14720 (0.0010)
+[2023-09-19 20:04:30,527][68280] Updated weights for policy 0, policy_version 17008 (0.0013)
+[2023-09-19 20:04:33,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 16277504. Throughput: 0: 6076.9, 1: 6078.5. Samples: 15101346. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:33,687][67555] Avg episode reward: [(0, '-31.420'), (1, '-31.754')]
+[2023-09-19 20:04:37,353][68280] Updated weights for policy 0, policy_version 17088 (0.0012)
+[2023-09-19 20:04:37,354][68281] Updated weights for policy 1, policy_version 14800 (0.0013)
+[2023-09-19 20:04:38,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 16334848. Throughput: 0: 6094.0, 1: 6090.9. Samples: 15138272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:38,688][67555] Avg episode reward: [(0, '-30.243'), (1, '-30.243')]
+[2023-09-19 20:04:43,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 16400384. Throughput: 0: 6053.9, 1: 6051.6. Samples: 15208320. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:43,688][67555] Avg episode reward: [(0, '-31.687'), (1, '-31.345')]
+[2023-09-19 20:04:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000014872_7614464.pth...
+[2023-09-19 20:04:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000017160_8785920.pth...
+[2023-09-19 20:04:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016800_8601600.pth
+[2023-09-19 20:04:43,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000014512_7430144.pth
+[2023-09-19 20:04:44,307][68280] Updated weights for policy 0, policy_version 17168 (0.0012)
+[2023-09-19 20:04:44,307][68281] Updated weights for policy 1, policy_version 14880 (0.0012)
+[2023-09-19 20:04:48,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 16457728. Throughput: 0: 6067.3, 1: 6068.3. Samples: 15279628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:04:48,688][67555] Avg episode reward: [(0, '-31.947'), (1, '-31.554')]
+[2023-09-19 20:04:51,108][68280] Updated weights for policy 0, policy_version 17248 (0.0016)
+[2023-09-19 20:04:51,108][68281] Updated weights for policy 1, policy_version 14960 (0.0014)
+[2023-09-19 20:04:53,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 16523264. Throughput: 0: 6076.6, 1: 6073.4. Samples: 15316992. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:04:53,688][67555] Avg episode reward: [(0, '-31.621'), (1, '-32.207')]
+[2023-09-19 20:04:57,413][68280] Updated weights for policy 0, policy_version 17328 (0.0014)
+[2023-09-19 20:04:57,413][68281] Updated weights for policy 1, policy_version 15040 (0.0014)
+[2023-09-19 20:04:58,687][67555] Fps is (10 sec: 12288.3, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 16580608. Throughput: 0: 6099.2, 1: 6099.2. Samples: 15392834. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:04:58,687][67555] Avg episode reward: [(0, '-31.215'), (1, '-30.626')]
+[2023-09-19 20:04:58,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000017336_8876032.pth...
+[2023-09-19 20:04:58,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015048_7704576.pth...
+[2023-09-19 20:04:58,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000016984_8695808.pth
+[2023-09-19 20:04:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000014696_7524352.pth
+[2023-09-19 20:05:03,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 16646144. Throughput: 0: 6120.4, 1: 6122.8. Samples: 15466318. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:05:03,688][67555] Avg episode reward: [(0, '-29.837'), (1, '-31.749')]
+[2023-09-19 20:05:04,243][68281] Updated weights for policy 1, policy_version 15120 (0.0012)
+[2023-09-19 20:05:04,243][68280] Updated weights for policy 0, policy_version 17408 (0.0012)
+[2023-09-19 20:05:08,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 16703488. Throughput: 0: 6135.6, 1: 6134.1. Samples: 15503722. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:05:08,688][67555] Avg episode reward: [(0, '-30.551'), (1, '-31.373')]
+[2023-09-19 20:05:10,962][68280] Updated weights for policy 0, policy_version 17488 (0.0014)
+[2023-09-19 20:05:10,962][68281] Updated weights for policy 1, policy_version 15200 (0.0014)
+[2023-09-19 20:05:13,687][67555] Fps is (10 sec: 12287.5, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 16769024. Throughput: 0: 6090.2, 1: 6089.4. Samples: 15575786. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:05:13,688][67555] Avg episode reward: [(0, '-30.425'), (1, '-31.060')]
+[2023-09-19 20:05:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015232_7798784.pth...
+[2023-09-19 20:05:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000017520_8970240.pth...
+[2023-09-19 20:05:13,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000017160_8785920.pth
+[2023-09-19 20:05:13,707][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000014872_7614464.pth
+[2023-09-19 20:05:17,679][68281] Updated weights for policy 1, policy_version 15280 (0.0015)
+[2023-09-19 20:05:17,679][68280] Updated weights for policy 0, policy_version 17568 (0.0011)
+[2023-09-19 20:05:18,689][67555] Fps is (10 sec: 12285.9, 60 sec: 12287.6, 300 sec: 12107.4). Total num frames: 16826368. Throughput: 0: 6089.5, 1: 6087.4. Samples: 15649326. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:05:18,691][67555] Avg episode reward: [(0, '-31.302'), (1, '-31.544')]
+[2023-09-19 20:05:23,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 16883712. Throughput: 0: 6080.3, 1: 6079.4. Samples: 15685458. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:05:23,688][67555] Avg episode reward: [(0, '-29.982'), (1, '-31.123')]
+[2023-09-19 20:05:24,592][68281] Updated weights for policy 1, policy_version 15360 (0.0014)
+[2023-09-19 20:05:24,592][68280] Updated weights for policy 0, policy_version 17648 (0.0016)
+[2023-09-19 20:05:28,687][67555] Fps is (10 sec: 12289.8, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 16949248. Throughput: 0: 6082.3, 1: 6082.3. Samples: 15755726. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:05:28,688][67555] Avg episode reward: [(0, '-30.454'), (1, '-31.649')]
+[2023-09-19 20:05:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015408_7888896.pth...
+[2023-09-19 20:05:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000017696_9060352.pth...
+[2023-09-19 20:05:28,700][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015048_7704576.pth
+[2023-09-19 20:05:28,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000017336_8876032.pth
+[2023-09-19 20:05:31,279][68280] Updated weights for policy 0, policy_version 17728 (0.0016)
+[2023-09-19 20:05:31,279][68281] Updated weights for policy 1, policy_version 15440 (0.0014)
+[2023-09-19 20:05:33,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 17006592. Throughput: 0: 6149.4, 1: 6149.5. Samples: 15833080. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:05:33,688][67555] Avg episode reward: [(0, '-30.018'), (1, '-31.433')]
+[2023-09-19 20:05:37,778][68280] Updated weights for policy 0, policy_version 17808 (0.0012)
+[2023-09-19 20:05:37,778][68281] Updated weights for policy 1, policy_version 15520 (0.0013)
+[2023-09-19 20:05:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12107.5). Total num frames: 17072128. Throughput: 0: 6141.2, 1: 6140.3. Samples: 15869658. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:05:38,688][67555] Avg episode reward: [(0, '-30.432'), (1, '-31.313')]
+[2023-09-19 20:05:43,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 17129472. Throughput: 0: 6109.4, 1: 6105.3. Samples: 15942494. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:05:43,688][67555] Avg episode reward: [(0, '-29.215'), (1, '-31.025')]
+[2023-09-19 20:05:43,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000017872_9150464.pth...
+[2023-09-19 20:05:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015584_7979008.pth...
+[2023-09-19 20:05:43,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015232_7798784.pth
+[2023-09-19 20:05:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000017520_8970240.pth
+[2023-09-19 20:05:43,707][68200] Saving new best policy, reward=-29.215!
+[2023-09-19 20:05:44,409][68280] Updated weights for policy 0, policy_version 17888 (0.0013)
+[2023-09-19 20:05:44,409][68281] Updated weights for policy 1, policy_version 15600 (0.0014)
+[2023-09-19 20:05:48,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 17195008. Throughput: 0: 6100.2, 1: 6101.4. Samples: 16015394. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:05:48,688][67555] Avg episode reward: [(0, '-29.342'), (1, '-31.327')]
+[2023-09-19 20:05:51,324][68280] Updated weights for policy 0, policy_version 17968 (0.0014)
+[2023-09-19 20:05:51,325][68281] Updated weights for policy 1, policy_version 15680 (0.0014)
+[2023-09-19 20:05:53,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 17252352. Throughput: 0: 6082.8, 1: 6084.1. Samples: 16051234. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:05:53,688][67555] Avg episode reward: [(0, '-29.016'), (1, '-31.195')]
+[2023-09-19 20:05:53,689][68200] Saving new best policy, reward=-29.016!
+[2023-09-19 20:05:58,144][68280] Updated weights for policy 0, policy_version 18048 (0.0015)
+[2023-09-19 20:05:58,144][68281] Updated weights for policy 1, policy_version 15760 (0.0015)
+[2023-09-19 20:05:58,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 17309696. Throughput: 0: 6078.3, 1: 6078.3. Samples: 16122830. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:05:58,688][67555] Avg episode reward: [(0, '-29.201'), (1, '-32.526')]
+[2023-09-19 20:05:58,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015760_8069120.pth...
+[2023-09-19 20:05:58,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018048_9240576.pth...
+[2023-09-19 20:05:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015408_7888896.pth
+[2023-09-19 20:05:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000017696_9060352.pth
+[2023-09-19 20:06:03,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 17375232. Throughput: 0: 6082.6, 1: 6083.9. Samples: 16196800. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:03,688][67555] Avg episode reward: [(0, '-28.526'), (1, '-31.490')]
+[2023-09-19 20:06:03,688][68200] Saving new best policy, reward=-28.526!
+[2023-09-19 20:06:04,766][68280] Updated weights for policy 0, policy_version 18128 (0.0015)
+[2023-09-19 20:06:04,766][68281] Updated weights for policy 1, policy_version 15840 (0.0013)
+[2023-09-19 20:06:08,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 17432576. Throughput: 0: 6085.5, 1: 6085.6. Samples: 16233160. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:06:08,688][67555] Avg episode reward: [(0, '-28.279'), (1, '-31.378')]
+[2023-09-19 20:06:08,689][68200] Saving new best policy, reward=-28.279!
+[2023-09-19 20:06:11,620][68280] Updated weights for policy 0, policy_version 18208 (0.0013)
+[2023-09-19 20:06:11,620][68281] Updated weights for policy 1, policy_version 15920 (0.0012)
+[2023-09-19 20:06:13,687][67555] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 17489920. Throughput: 0: 6102.6, 1: 6102.5. Samples: 16304954. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
+[2023-09-19 20:06:13,688][67555] Avg episode reward: [(0, '-28.139'), (1, '-31.360')]
+[2023-09-19 20:06:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000015936_8159232.pth...
+[2023-09-19 20:06:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018224_9330688.pth...
+[2023-09-19 20:06:13,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015584_7979008.pth
+[2023-09-19 20:06:13,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000017872_9150464.pth
+[2023-09-19 20:06:13,707][68200] Saving new best policy, reward=-28.139!
+[2023-09-19 20:06:18,600][68281] Updated weights for policy 1, policy_version 16000 (0.0011)
+[2023-09-19 20:06:18,601][68280] Updated weights for policy 0, policy_version 18288 (0.0014)
+[2023-09-19 20:06:18,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.8, 300 sec: 12107.5). Total num frames: 17555456. Throughput: 0: 6027.9, 1: 6031.4. Samples: 16375746. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:06:18,688][67555] Avg episode reward: [(0, '-29.856'), (1, '-32.272')]
+[2023-09-19 20:06:23,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12107.5). Total num frames: 17612800. Throughput: 0: 6008.9, 1: 6008.5. Samples: 16410442. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:06:23,688][67555] Avg episode reward: [(0, '-27.665'), (1, '-31.794')]
+[2023-09-19 20:06:23,690][68200] Saving new best policy, reward=-27.665!
+[2023-09-19 20:06:25,331][68280] Updated weights for policy 0, policy_version 18368 (0.0009)
+[2023-09-19 20:06:25,332][68281] Updated weights for policy 1, policy_version 16080 (0.0012)
+[2023-09-19 20:06:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 17678336. Throughput: 0: 6022.2, 1: 6022.8. Samples: 16484520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:06:28,688][67555] Avg episode reward: [(0, '-28.725'), (1, '-31.791')]
+[2023-09-19 20:06:28,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018408_9424896.pth...
+[2023-09-19 20:06:28,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000016120_8253440.pth...
+[2023-09-19 20:06:28,701][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015760_8069120.pth
+[2023-09-19 20:06:28,701][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018048_9240576.pth
+[2023-09-19 20:06:31,946][68281] Updated weights for policy 1, policy_version 16160 (0.0012)
+[2023-09-19 20:06:31,947][68280] Updated weights for policy 0, policy_version 18448 (0.0014)
+[2023-09-19 20:06:33,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12151.4, 300 sec: 12107.5). Total num frames: 17735680. Throughput: 0: 6050.0, 1: 6045.4. Samples: 16559688. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:33,688][67555] Avg episode reward: [(0, '-28.054'), (1, '-31.550')]
+[2023-09-19 20:06:38,642][68280] Updated weights for policy 0, policy_version 18528 (0.0013)
+[2023-09-19 20:06:38,642][68281] Updated weights for policy 1, policy_version 16240 (0.0012)
+[2023-09-19 20:06:38,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 17801216. Throughput: 0: 6062.9, 1: 6065.8. Samples: 16597026. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:38,688][67555] Avg episode reward: [(0, '-28.283'), (1, '-32.183')]
+[2023-09-19 20:06:43,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 17858560. Throughput: 0: 6069.3, 1: 6069.2. Samples: 16669064. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:43,688][67555] Avg episode reward: [(0, '-30.012'), (1, '-32.374')]
+[2023-09-19 20:06:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000016296_8343552.pth...
+[2023-09-19 20:06:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018584_9515008.pth...
+[2023-09-19 20:06:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000015936_8159232.pth
+[2023-09-19 20:06:43,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018224_9330688.pth
+[2023-09-19 20:06:45,419][68280] Updated weights for policy 0, policy_version 18608 (0.0013)
+[2023-09-19 20:06:45,420][68281] Updated weights for policy 1, policy_version 16320 (0.0015)
+[2023-09-19 20:06:48,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12015.0, 300 sec: 12107.5). Total num frames: 17915904. Throughput: 0: 6046.7, 1: 6045.8. Samples: 16740962. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:48,687][67555] Avg episode reward: [(0, '-28.072'), (1, '-32.308')]
+[2023-09-19 20:06:52,498][68280] Updated weights for policy 0, policy_version 18688 (0.0011)
+[2023-09-19 20:06:52,498][68281] Updated weights for policy 1, policy_version 16400 (0.0012)
+[2023-09-19 20:06:53,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 17973248. Throughput: 0: 6022.5, 1: 6022.0. Samples: 16775164. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:06:53,688][67555] Avg episode reward: [(0, '-27.979'), (1, '-31.368')]
+[2023-09-19 20:06:58,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 18038784. Throughput: 0: 6009.4, 1: 6009.4. Samples: 16845802. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:06:58,688][67555] Avg episode reward: [(0, '-29.147'), (1, '-32.539')]
+[2023-09-19 20:06:58,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018760_9605120.pth...
+[2023-09-19 20:06:58,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000016472_8433664.pth...
+[2023-09-19 20:06:58,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000016120_8253440.pth
+[2023-09-19 20:06:58,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018408_9424896.pth
+[2023-09-19 20:06:59,314][68281] Updated weights for policy 1, policy_version 16480 (0.0009)
+[2023-09-19 20:06:59,315][68280] Updated weights for policy 0, policy_version 18768 (0.0012)
+[2023-09-19 20:07:03,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 18096128. Throughput: 0: 6032.9, 1: 6027.8. Samples: 16918478. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:07:03,688][67555] Avg episode reward: [(0, '-29.176'), (1, '-31.879')]
+[2023-09-19 20:07:05,927][68280] Updated weights for policy 0, policy_version 18848 (0.0011)
+[2023-09-19 20:07:05,928][68281] Updated weights for policy 1, policy_version 16560 (0.0011)
+[2023-09-19 20:07:08,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18161664. Throughput: 0: 6075.6, 1: 6079.4. Samples: 16957414. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:07:08,688][67555] Avg episode reward: [(0, '-29.337'), (1, '-31.424')]
+[2023-09-19 20:07:12,564][68281] Updated weights for policy 1, policy_version 16640 (0.0012)
+[2023-09-19 20:07:12,565][68280] Updated weights for policy 0, policy_version 18928 (0.0011)
+[2023-09-19 20:07:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 18219008. Throughput: 0: 6072.8, 1: 6075.7. Samples: 17031202. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:07:13,688][67555] Avg episode reward: [(0, '-28.356'), (1, '-32.093')]
+[2023-09-19 20:07:13,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000018936_9695232.pth...
+[2023-09-19 20:07:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000016648_8523776.pth...
+[2023-09-19 20:07:13,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000016296_8343552.pth
+[2023-09-19 20:07:13,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018584_9515008.pth
+[2023-09-19 20:07:18,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 18284544. Throughput: 0: 6055.9, 1: 6060.7. Samples: 17104936. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:07:18,688][67555] Avg episode reward: [(0, '-27.589'), (1, '-31.990')]
+[2023-09-19 20:07:18,689][68200] Saving new best policy, reward=-27.589!
+[2023-09-19 20:07:19,303][68280] Updated weights for policy 0, policy_version 19008 (0.0014)
+[2023-09-19 20:07:19,304][68281] Updated weights for policy 1, policy_version 16720 (0.0014)
+[2023-09-19 20:07:23,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18341888. Throughput: 0: 6041.1, 1: 6038.2. Samples: 17140596. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:07:23,688][67555] Avg episode reward: [(0, '-28.931'), (1, '-32.294')]
+[2023-09-19 20:07:26,082][68281] Updated weights for policy 1, policy_version 16800 (0.0015)
+[2023-09-19 20:07:26,082][68280] Updated weights for policy 0, policy_version 19088 (0.0014)
+[2023-09-19 20:07:28,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 18399232. Throughput: 0: 6039.7, 1: 6039.5. Samples: 17212626. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:07:28,688][67555] Avg episode reward: [(0, '-28.031'), (1, '-31.560')]
+[2023-09-19 20:07:28,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000019112_9785344.pth...
+[2023-09-19 20:07:28,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000016824_8613888.pth...
+[2023-09-19 20:07:28,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018760_9605120.pth
+[2023-09-19 20:07:28,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000016472_8433664.pth
+[2023-09-19 20:07:32,858][68280] Updated weights for policy 0, policy_version 19168 (0.0013)
+[2023-09-19 20:07:32,858][68281] Updated weights for policy 1, policy_version 16880 (0.0015)
+[2023-09-19 20:07:33,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18464768. Throughput: 0: 6044.6, 1: 6049.2. Samples: 17285186. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:07:33,688][67555] Avg episode reward: [(0, '-27.629'), (1, '-32.226')]
+[2023-09-19 20:07:38,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 18522112. Throughput: 0: 6086.6, 1: 6086.1. Samples: 17322940. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:07:38,688][67555] Avg episode reward: [(0, '-28.189'), (1, '-31.961')]
+[2023-09-19 20:07:39,642][68281] Updated weights for policy 1, policy_version 16960 (0.0014)
+[2023-09-19 20:07:39,643][68280] Updated weights for policy 0, policy_version 19248 (0.0013)
+[2023-09-19 20:07:43,688][67555] Fps is (10 sec: 11468.4, 60 sec: 12014.9, 300 sec: 12135.2). Total num frames: 18579456. Throughput: 0: 6090.7, 1: 6089.9. Samples: 17393934. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:07:43,689][67555] Avg episode reward: [(0, '-27.408'), (1, '-31.473')]
+[2023-09-19 20:07:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000019296_9879552.pth...
+[2023-09-19 20:07:43,700][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017008_8708096.pth...
+[2023-09-19 20:07:43,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000018936_9695232.pth
+[2023-09-19 20:07:43,703][68200] Saving new best policy, reward=-27.408!
+[2023-09-19 20:07:43,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000016648_8523776.pth
+[2023-09-19 20:07:46,426][68280] Updated weights for policy 0, policy_version 19328 (0.0015)
+[2023-09-19 20:07:46,426][68281] Updated weights for policy 1, policy_version 17040 (0.0015)
+[2023-09-19 20:07:48,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 18644992. Throughput: 0: 6080.0, 1: 6082.1. Samples: 17465772. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-09-19 20:07:48,688][67555] Avg episode reward: [(0, '-27.177'), (1, '-31.669')]
+[2023-09-19 20:07:48,689][68200] Saving new best policy, reward=-27.177!
+[2023-09-19 20:07:53,292][68281] Updated weights for policy 1, policy_version 17120 (0.0014)
+[2023-09-19 20:07:53,292][68280] Updated weights for policy 0, policy_version 19408 (0.0015)
+[2023-09-19 20:07:53,687][67555] Fps is (10 sec: 12288.5, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18702336. Throughput: 0: 6052.1, 1: 6048.5. Samples: 17501944. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:07:53,688][67555] Avg episode reward: [(0, '-28.247'), (1, '-32.281')]
+[2023-09-19 20:07:58,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18767872. Throughput: 0: 6039.5, 1: 6036.2. Samples: 17574610. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:07:58,688][67555] Avg episode reward: [(0, '-29.522'), (1, '-31.920')]
+[2023-09-19 20:07:58,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017184_8798208.pth...
+[2023-09-19 20:07:58,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000019472_9969664.pth...
+[2023-09-19 20:07:58,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000016824_8613888.pth
+[2023-09-19 20:07:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000019112_9785344.pth
+[2023-09-19 20:07:59,972][68281] Updated weights for policy 1, policy_version 17200 (0.0015)
+[2023-09-19 20:07:59,972][68280] Updated weights for policy 0, policy_version 19488 (0.0012)
+[2023-09-19 20:08:03,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18825216. Throughput: 0: 6041.9, 1: 6038.0. Samples: 17648528. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
+[2023-09-19 20:08:03,688][67555] Avg episode reward: [(0, '-27.902'), (1, '-31.389')]
+[2023-09-19 20:08:06,542][68280] Updated weights for policy 0, policy_version 19568 (0.0013)
+[2023-09-19 20:08:06,543][68281] Updated weights for policy 1, policy_version 17280 (0.0013)
+[2023-09-19 20:08:08,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 18890752. Throughput: 0: 6064.9, 1: 6068.4. Samples: 17686594. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:08,688][67555] Avg episode reward: [(0, '-27.932'), (1, '-31.606')]
+[2023-09-19 20:08:13,288][68281] Updated weights for policy 1, policy_version 17360 (0.0014)
+[2023-09-19 20:08:13,288][68280] Updated weights for policy 0, policy_version 19648 (0.0015)
+[2023-09-19 20:08:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 18948096. Throughput: 0: 6063.6, 1: 6064.7. Samples: 17758396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:13,688][67555] Avg episode reward: [(0, '-26.902'), (1, '-32.187')]
+[2023-09-19 20:08:13,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000019648_10059776.pth...
+[2023-09-19 20:08:13,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017360_8888320.pth...
+[2023-09-19 20:08:13,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000019296_9879552.pth
+[2023-09-19 20:08:13,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017008_8708096.pth
+[2023-09-19 20:08:13,703][68200] Saving new best policy, reward=-26.902!
+[2023-09-19 20:08:18,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12163.0). Total num frames: 19013632. Throughput: 0: 6098.5, 1: 6097.8. Samples: 17834018. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:18,688][67555] Avg episode reward: [(0, '-28.707'), (1, '-31.546')]
+[2023-09-19 20:08:19,899][68281] Updated weights for policy 1, policy_version 17440 (0.0014)
+[2023-09-19 20:08:19,899][68280] Updated weights for policy 0, policy_version 19728 (0.0014)
+[2023-09-19 20:08:23,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19070976. Throughput: 0: 6073.1, 1: 6074.7. Samples: 17869592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:23,688][67555] Avg episode reward: [(0, '-27.828'), (1, '-31.489')]
+[2023-09-19 20:08:26,748][68281] Updated weights for policy 1, policy_version 17520 (0.0013)
+[2023-09-19 20:08:26,749][68280] Updated weights for policy 0, policy_version 19808 (0.0013)
+[2023-09-19 20:08:28,687][67555] Fps is (10 sec: 11468.5, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19128320. Throughput: 0: 6085.6, 1: 6085.9. Samples: 17941650. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:28,688][67555] Avg episode reward: [(0, '-25.786'), (1, '-30.828')]
+[2023-09-19 20:08:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000019824_10149888.pth...
+[2023-09-19 20:08:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017536_8978432.pth...
+[2023-09-19 20:08:28,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017184_8798208.pth
+[2023-09-19 20:08:28,705][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000019472_9969664.pth
+[2023-09-19 20:08:28,705][68200] Saving new best policy, reward=-25.786!
+[2023-09-19 20:08:33,326][68281] Updated weights for policy 1, policy_version 17600 (0.0013)
+[2023-09-19 20:08:33,326][68280] Updated weights for policy 0, policy_version 19888 (0.0012)
+[2023-09-19 20:08:33,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19193856. Throughput: 0: 6125.4, 1: 6125.4. Samples: 18017052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:33,688][67555] Avg episode reward: [(0, '-27.439'), (1, '-31.673')]
+[2023-09-19 20:08:38,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19251200. Throughput: 0: 6144.2, 1: 6146.0. Samples: 18055002. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:38,688][67555] Avg episode reward: [(0, '-26.866'), (1, '-31.555')]
+[2023-09-19 20:08:40,150][68281] Updated weights for policy 1, policy_version 17680 (0.0012)
+[2023-09-19 20:08:40,150][68280] Updated weights for policy 0, policy_version 19968 (0.0015)
+[2023-09-19 20:08:43,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12288.1, 300 sec: 12163.0). Total num frames: 19316736. Throughput: 0: 6125.9, 1: 6126.0. Samples: 18125946. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:08:43,688][67555] Avg episode reward: [(0, '-27.797'), (1, '-32.354')]
+[2023-09-19 20:08:43,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017720_9072640.pth...
+[2023-09-19 20:08:43,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020008_10244096.pth...
+[2023-09-19 20:08:43,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017360_8888320.pth
+[2023-09-19 20:08:43,706][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000019648_10059776.pth
+[2023-09-19 20:08:46,751][68280] Updated weights for policy 0, policy_version 20048 (0.0013)
+[2023-09-19 20:08:46,751][68281] Updated weights for policy 1, policy_version 17760 (0.0013)
+[2023-09-19 20:08:48,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 19374080. Throughput: 0: 6126.3, 1: 6126.5. Samples: 18199908. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:08:48,688][67555] Avg episode reward: [(0, '-27.600'), (1, '-31.620')]
+[2023-09-19 20:08:53,595][68281] Updated weights for policy 1, policy_version 17840 (0.0013)
+[2023-09-19 20:08:53,595][68280] Updated weights for policy 0, policy_version 20128 (0.0013)
+[2023-09-19 20:08:53,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12163.0). Total num frames: 19439616. Throughput: 0: 6106.3, 1: 6101.9. Samples: 18235966. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:08:53,688][67555] Avg episode reward: [(0, '-27.765'), (1, '-31.346')]
+[2023-09-19 20:08:58,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 19496960. Throughput: 0: 6085.0, 1: 6084.2. Samples: 18306010. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2023-09-19 20:08:58,688][67555] Avg episode reward: [(0, '-27.813'), (1, '-32.045')]
+[2023-09-19 20:08:58,702][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020184_10334208.pth...
+[2023-09-19 20:08:58,702][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000017896_9162752.pth...
+[2023-09-19 20:08:58,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000019824_10149888.pth
+[2023-09-19 20:08:58,708][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017536_8978432.pth
+[2023-09-19 20:09:00,758][68280] Updated weights for policy 0, policy_version 20208 (0.0014)
+[2023-09-19 20:09:00,758][68281] Updated weights for policy 1, policy_version 17920 (0.0015)
+[2023-09-19 20:09:03,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12151.4, 300 sec: 12135.3). Total num frames: 19554304. Throughput: 0: 6031.0, 1: 6026.3. Samples: 18376600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:09:03,688][67555] Avg episode reward: [(0, '-27.918'), (1, '-31.343')]
+[2023-09-19 20:09:07,365][68280] Updated weights for policy 0, policy_version 20288 (0.0016)
+[2023-09-19 20:09:07,365][68281] Updated weights for policy 1, policy_version 18000 (0.0017)
+[2023-09-19 20:09:08,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 19611648. Throughput: 0: 6052.3, 1: 6050.0. Samples: 18414196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:09:08,688][67555] Avg episode reward: [(0, '-27.306'), (1, '-31.191')]
+[2023-09-19 20:09:13,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12163.0). Total num frames: 19677184. Throughput: 0: 6031.3, 1: 6030.9. Samples: 18484450. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:09:13,689][67555] Avg episode reward: [(0, '-26.975'), (1, '-30.949')]
+[2023-09-19 20:09:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020360_10424320.pth...
+[2023-09-19 20:09:13,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018072_9252864.pth...
+[2023-09-19 20:09:13,704][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017720_9072640.pth
+[2023-09-19 20:09:13,709][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020008_10244096.pth
+[2023-09-19 20:09:14,305][68281] Updated weights for policy 1, policy_version 18080 (0.0014)
+[2023-09-19 20:09:14,305][68280] Updated weights for policy 0, policy_version 20368 (0.0015)
+[2023-09-19 20:09:18,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 19734528. Throughput: 0: 6011.2, 1: 6010.8. Samples: 18558040. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:18,688][67555] Avg episode reward: [(0, '-27.564'), (1, '-31.515')]
+[2023-09-19 20:09:20,922][68281] Updated weights for policy 1, policy_version 18160 (0.0014)
+[2023-09-19 20:09:20,922][68280] Updated weights for policy 0, policy_version 20448 (0.0012)
+[2023-09-19 20:09:23,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19800064. Throughput: 0: 6008.1, 1: 6009.9. Samples: 18595812. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:23,688][67555] Avg episode reward: [(0, '-27.354'), (1, '-31.413')]
+[2023-09-19 20:09:28,009][68281] Updated weights for policy 1, policy_version 18240 (0.0015)
+[2023-09-19 20:09:28,010][68280] Updated weights for policy 0, policy_version 20528 (0.0017)
+[2023-09-19 20:09:28,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12151.5, 300 sec: 12135.3). Total num frames: 19857408. Throughput: 0: 5993.4, 1: 5993.0. Samples: 18665332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:28,688][67555] Avg episode reward: [(0, '-27.784'), (1, '-31.910')]
+[2023-09-19 20:09:28,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020536_10514432.pth...
+[2023-09-19 20:09:28,697][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018248_9342976.pth...
+[2023-09-19 20:09:28,702][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020184_10334208.pth
+[2023-09-19 20:09:28,703][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000017896_9162752.pth
+[2023-09-19 20:09:33,687][67555] Fps is (10 sec: 11468.8, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 19914752. Throughput: 0: 5946.7, 1: 5949.5. Samples: 18735238. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:09:33,688][67555] Avg episode reward: [(0, '-26.484'), (1, '-31.717')]
+[2023-09-19 20:09:34,955][68280] Updated weights for policy 0, policy_version 20608 (0.0014)
+[2023-09-19 20:09:34,955][68281] Updated weights for policy 1, policy_version 18320 (0.0014)
+[2023-09-19 20:09:38,687][67555] Fps is (10 sec: 11468.9, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 19972096. Throughput: 0: 5938.7, 1: 5939.5. Samples: 18770480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:09:38,688][67555] Avg episode reward: [(0, '-26.317'), (1, '-31.593')]
+[2023-09-19 20:09:41,736][68281] Updated weights for policy 1, policy_version 18400 (0.0013)
+[2023-09-19 20:09:41,736][68280] Updated weights for policy 0, policy_version 20688 (0.0013)
+[2023-09-19 20:09:43,687][67555] Fps is (10 sec: 11468.7, 60 sec: 11878.4, 300 sec: 12107.5). Total num frames: 20029440. Throughput: 0: 5965.6, 1: 5966.2. Samples: 18842940. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
+[2023-09-19 20:09:43,688][67555] Avg episode reward: [(0, '-26.656'), (1, '-31.923')]
+[2023-09-19 20:09:43,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020704_10600448.pth...
+[2023-09-19 20:09:43,699][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018416_9428992.pth...
+[2023-09-19 20:09:43,706][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018072_9252864.pth
+[2023-09-19 20:09:43,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020360_10424320.pth
+[2023-09-19 20:09:48,594][68281] Updated weights for policy 1, policy_version 18480 (0.0012)
+[2023-09-19 20:09:48,594][68280] Updated weights for policy 0, policy_version 20768 (0.0014)
+[2023-09-19 20:09:48,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 20094976. Throughput: 0: 5983.9, 1: 5988.6. Samples: 18915362. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:48,688][67555] Avg episode reward: [(0, '-26.535'), (1, '-31.345')]
+[2023-09-19 20:09:53,687][67555] Fps is (10 sec: 12288.1, 60 sec: 11878.4, 300 sec: 12107.5). Total num frames: 20152320. Throughput: 0: 5945.4, 1: 5946.6. Samples: 18949338. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:53,688][67555] Avg episode reward: [(0, '-27.655'), (1, '-32.112')]
+[2023-09-19 20:09:55,603][68281] Updated weights for policy 1, policy_version 18560 (0.0014)
+[2023-09-19 20:09:55,604][68280] Updated weights for policy 0, policy_version 20848 (0.0012)
+[2023-09-19 20:09:58,687][67555] Fps is (10 sec: 11468.8, 60 sec: 11878.4, 300 sec: 12079.7). Total num frames: 20209664. Throughput: 0: 5969.4, 1: 5973.0. Samples: 19021858. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:09:58,688][67555] Avg episode reward: [(0, '-27.771'), (1, '-31.870')]
+[2023-09-19 20:09:58,697][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000020880_10690560.pth...
+[2023-09-19 20:09:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018592_9519104.pth...
+[2023-09-19 20:09:58,703][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020536_10514432.pth
+[2023-09-19 20:09:58,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018248_9342976.pth
+[2023-09-19 20:10:02,242][68281] Updated weights for policy 1, policy_version 18640 (0.0012)
+[2023-09-19 20:10:02,242][68280] Updated weights for policy 0, policy_version 20928 (0.0012)
+[2023-09-19 20:10:03,687][67555] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 20275200. Throughput: 0: 5970.2, 1: 5973.8. Samples: 19095522. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:03,688][67555] Avg episode reward: [(0, '-27.932'), (1, '-32.452')]
+[2023-09-19 20:10:08,687][67555] Fps is (10 sec: 12288.1, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 20332544. Throughput: 0: 5944.8, 1: 5940.8. Samples: 19130662. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:08,688][67555] Avg episode reward: [(0, '-26.901'), (1, '-32.038')]
+[2023-09-19 20:10:09,027][68281] Updated weights for policy 1, policy_version 18720 (0.0012)
+[2023-09-19 20:10:09,028][68280] Updated weights for policy 0, policy_version 21008 (0.0012)
+[2023-09-19 20:10:13,687][67555] Fps is (10 sec: 11469.0, 60 sec: 11878.5, 300 sec: 12079.8). Total num frames: 20389888. Throughput: 0: 5967.8, 1: 5968.2. Samples: 19202450. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:13,688][67555] Avg episode reward: [(0, '-25.842'), (1, '-31.543')]
+[2023-09-19 20:10:13,695][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018768_9609216.pth...
+[2023-09-19 20:10:13,695][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021056_10780672.pth...
+[2023-09-19 20:10:13,699][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018416_9428992.pth
+[2023-09-19 20:10:13,701][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020704_10600448.pth
+[2023-09-19 20:10:15,840][68281] Updated weights for policy 1, policy_version 18800 (0.0015)
+[2023-09-19 20:10:15,841][68280] Updated weights for policy 0, policy_version 21088 (0.0015)
+[2023-09-19 20:10:18,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12107.5). Total num frames: 20455424. Throughput: 0: 6007.9, 1: 6007.4. Samples: 19275930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:18,688][67555] Avg episode reward: [(0, '-29.075'), (1, '-31.944')]
+[2023-09-19 20:10:22,613][68280] Updated weights for policy 0, policy_version 21168 (0.0016)
+[2023-09-19 20:10:22,614][68281] Updated weights for policy 1, policy_version 18880 (0.0015)
+[2023-09-19 20:10:23,687][67555] Fps is (10 sec: 12288.3, 60 sec: 11878.4, 300 sec: 12079.7). Total num frames: 20512768. Throughput: 0: 6006.3, 1: 6006.1. Samples: 19311036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:23,687][67555] Avg episode reward: [(0, '-27.324'), (1, '-31.754')]
+[2023-09-19 20:10:28,689][67555] Fps is (10 sec: 12286.3, 60 sec: 12014.6, 300 sec: 12107.4). Total num frames: 20578304. Throughput: 0: 6027.6, 1: 6026.2. Samples: 19385380. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:28,691][67555] Avg episode reward: [(0, '-27.890'), (1, '-31.778')]
+[2023-09-19 20:10:28,701][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021240_10874880.pth...
+[2023-09-19 20:10:28,701][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000018952_9703424.pth...
+[2023-09-19 20:10:28,707][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000020880_10690560.pth
+[2023-09-19 20:10:28,709][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018592_9519104.pth
+[2023-09-19 20:10:29,407][68281] Updated weights for policy 1, policy_version 18960 (0.0013)
+[2023-09-19 20:10:29,408][68280] Updated weights for policy 0, policy_version 21248 (0.0014)
+[2023-09-19 20:10:33,687][67555] Fps is (10 sec: 12287.8, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 20635648. Throughput: 0: 6006.8, 1: 6006.9. Samples: 19455976. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:33,688][67555] Avg episode reward: [(0, '-27.379'), (1, '-32.118')]
+[2023-09-19 20:10:36,421][68280] Updated weights for policy 0, policy_version 21328 (0.0015)
+[2023-09-19 20:10:36,423][68281] Updated weights for policy 1, policy_version 19040 (0.0013)
+[2023-09-19 20:10:38,687][67555] Fps is (10 sec: 11470.6, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 20692992. Throughput: 0: 6013.6, 1: 6014.1. Samples: 19490580. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
+[2023-09-19 20:10:38,688][67555] Avg episode reward: [(0, '-27.157'), (1, '-31.334')]
+[2023-09-19 20:10:43,444][68281] Updated weights for policy 1, policy_version 19120 (0.0016)
+[2023-09-19 20:10:43,445][68280] Updated weights for policy 0, policy_version 21408 (0.0012)
+[2023-09-19 20:10:43,687][67555] Fps is (10 sec: 11468.5, 60 sec: 12014.9, 300 sec: 12051.9). Total num frames: 20750336. Throughput: 0: 5988.2, 1: 5984.1. Samples: 19560612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:43,688][67555] Avg episode reward: [(0, '-26.260'), (1, '-31.849')]
+[2023-09-19 20:10:43,696][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000019120_9789440.pth...
+[2023-09-19 20:10:43,696][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021408_10960896.pth...
+[2023-09-19 20:10:43,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018768_9609216.pth
+[2023-09-19 20:10:43,709][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000021056_10780672.pth
+[2023-09-19 20:10:48,687][67555] Fps is (10 sec: 12287.7, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 20815872. Throughput: 0: 6007.4, 1: 6007.2. Samples: 19636178. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:48,688][67555] Avg episode reward: [(0, '-27.723'), (1, '-30.978')]
+[2023-09-19 20:10:50,070][68280] Updated weights for policy 0, policy_version 21488 (0.0014)
+[2023-09-19 20:10:50,070][68281] Updated weights for policy 1, policy_version 19200 (0.0012)
+[2023-09-19 20:10:53,687][67555] Fps is (10 sec: 12288.0, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 20873216. Throughput: 0: 6016.2, 1: 6016.4. Samples: 19672134. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:53,688][67555] Avg episode reward: [(0, '-28.131'), (1, '-31.391')]
+[2023-09-19 20:10:56,473][68281] Updated weights for policy 1, policy_version 19280 (0.0014)
+[2023-09-19 20:10:56,473][68280] Updated weights for policy 0, policy_version 21568 (0.0014)
+[2023-09-19 20:10:58,687][67555] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12079.7). Total num frames: 20938752. Throughput: 0: 6070.3, 1: 6069.4. Samples: 19748736. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:10:58,688][67555] Avg episode reward: [(0, '-27.749'), (1, '-31.492')]
+[2023-09-19 20:10:58,698][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021592_11055104.pth...
+[2023-09-19 20:10:58,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000019304_9883648.pth...
+[2023-09-19 20:10:58,704][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000021240_10874880.pth
+[2023-09-19 20:10:58,705][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000018952_9703424.pth
+[2023-09-19 20:11:03,464][68280] Updated weights for policy 0, policy_version 21648 (0.0015)
+[2023-09-19 20:11:03,464][68281] Updated weights for policy 1, policy_version 19360 (0.0015)
+[2023-09-19 20:11:03,687][67555] Fps is (10 sec: 12288.4, 60 sec: 12015.0, 300 sec: 12079.7). Total num frames: 20996096. Throughput: 0: 6005.5, 1: 6007.5. Samples: 19816514. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:11:03,687][67555] Avg episode reward: [(0, '-27.850'), (1, '-32.556')]
+[2023-09-19 20:11:08,687][67555] Fps is (10 sec: 11469.0, 60 sec: 12015.0, 300 sec: 12079.7). Total num frames: 21053440. Throughput: 0: 6036.0, 1: 6036.3. Samples: 19854290. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
+[2023-09-19 20:11:08,687][67555] Avg episode reward: [(0, '-26.807'), (1, '-31.546')]
+[2023-09-19 20:11:10,148][68281] Updated weights for policy 1, policy_version 19440 (0.0016)
+[2023-09-19 20:11:10,148][68280] Updated weights for policy 0, policy_version 21728 (0.0012)
+[2023-09-19 20:11:13,687][67555] Fps is (10 sec: 12287.6, 60 sec: 12151.4, 300 sec: 12079.7). Total num frames: 21118976. Throughput: 0: 6039.2, 1: 6040.2. Samples: 19928936. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
+[2023-09-19 20:11:13,688][67555] Avg episode reward: [(0, '-27.116'), (1, '-31.012')]
+[2023-09-19 20:11:13,698][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000019480_9973760.pth...
+[2023-09-19 20:11:13,699][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021768_11145216.pth...
+[2023-09-19 20:11:13,702][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000019120_9789440.pth
+[2023-09-19 20:11:13,708][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000021408_10960896.pth
+[2023-09-19 20:11:16,526][68280] Updated weights for policy 0, policy_version 21808 (0.0009)
+[2023-09-19 20:11:16,527][68281] Updated weights for policy 1, policy_version 19520 (0.0014)
+[2023-09-19 20:11:17,843][68290] Stopping RolloutWorker_w6...
+[2023-09-19 20:11:17,844][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000019536_10002432.pth...
+[2023-09-19 20:11:17,844][68200] Stopping Batcher_0...
+[2023-09-19 20:11:17,844][68290] Loop rollout_proc6_evt_loop terminating...
+[2023-09-19 20:11:17,844][68200] Loop batcher_evt_loop terminating...
+[2023-09-19 20:11:17,844][68286] Stopping RolloutWorker_w3...
+[2023-09-19 20:11:17,844][68283] Stopping RolloutWorker_w1...
+[2023-09-19 20:11:17,844][68284] Stopping RolloutWorker_w2...
+[2023-09-19 20:11:17,844][68291] Stopping RolloutWorker_w5...
+[2023-09-19 20:11:17,844][68292] Stopping RolloutWorker_w7...
+[2023-09-19 20:11:17,844][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021824_11173888.pth...
+[2023-09-19 20:11:17,844][68282] Stopping RolloutWorker_w0...
+[2023-09-19 20:11:17,844][68286] Loop rollout_proc3_evt_loop terminating...
+[2023-09-19 20:11:17,845][68292] Loop rollout_proc7_evt_loop terminating...
+[2023-09-19 20:11:17,844][68283] Loop rollout_proc1_evt_loop terminating...
+[2023-09-19 20:11:17,845][68284] Loop rollout_proc2_evt_loop terminating...
+[2023-09-19 20:11:17,844][67555] Component RolloutWorker_w6 stopped!
+[2023-09-19 20:11:17,845][68291] Loop rollout_proc5_evt_loop terminating...
+[2023-09-19 20:11:17,845][68282] Loop rollout_proc0_evt_loop terminating...
+[2023-09-19 20:11:17,845][67555] Component Batcher_0 stopped!
+[2023-09-19 20:11:17,846][68289] Stopping RolloutWorker_w4...
+[2023-09-19 20:11:17,846][67555] Component RolloutWorker_w3 stopped!
+[2023-09-19 20:11:17,846][67555] Component RolloutWorker_w1 stopped!
+[2023-09-19 20:11:17,846][68289] Loop rollout_proc4_evt_loop terminating...
+[2023-09-19 20:11:17,847][67555] Component RolloutWorker_w2 stopped!
+[2023-09-19 20:11:17,847][67555] Component RolloutWorker_w5 stopped!
+[2023-09-19 20:11:17,848][67555] Component RolloutWorker_w0 stopped!
+[2023-09-19 20:11:17,848][67555] Component RolloutWorker_w7 stopped!
+[2023-09-19 20:11:17,849][67555] Component Batcher_1 stopped!
+[2023-09-19 20:11:17,849][67555] Component RolloutWorker_w4 stopped!
+[2023-09-19 20:11:17,845][68201] Stopping Batcher_1...
+[2023-09-19 20:11:17,850][68200] Removing ./train_dir/Pusher/checkpoint_p0/checkpoint_000021592_11055104.pth
+[2023-09-19 20:11:17,850][68201] Loop batcher_evt_loop terminating...
+[2023-09-19 20:11:17,850][68201] Removing ./train_dir/Pusher/checkpoint_p1/checkpoint_000019304_9883648.pth
+[2023-09-19 20:11:17,851][68200] Saving ./train_dir/Pusher/checkpoint_p0/checkpoint_000021824_11173888.pth...
+[2023-09-19 20:11:17,851][68201] Saving ./train_dir/Pusher/checkpoint_p1/checkpoint_000019536_10002432.pth...
+[2023-09-19 20:11:17,855][68201] Stopping LearnerWorker_p1...
+[2023-09-19 20:11:17,856][68201] Loop learner_proc1_evt_loop terminating...
+[2023-09-19 20:11:17,856][67555] Component LearnerWorker_p1 stopped!
+[2023-09-19 20:11:17,856][68200] Stopping LearnerWorker_p0...
+[2023-09-19 20:11:17,856][68200] Loop learner_proc0_evt_loop terminating...
+[2023-09-19 20:11:17,856][67555] Component LearnerWorker_p0 stopped!
+[2023-09-19 20:11:17,867][68280] Weights refcount: 2 0
+[2023-09-19 20:11:17,868][68281] Weights refcount: 2 0
+[2023-09-19 20:11:17,868][68280] Stopping InferenceWorker_p0-w0...
+[2023-09-19 20:11:17,868][68280] Loop inference_proc0-0_evt_loop terminating...
+[2023-09-19 20:11:17,868][67555] Component InferenceWorker_p0-w0 stopped!
+[2023-09-19 20:11:17,869][68281] Stopping InferenceWorker_p1-w0...
+[2023-09-19 20:11:17,869][67555] Component InferenceWorker_p1-w0 stopped!
+[2023-09-19 20:11:17,869][68281] Loop inference_proc1-0_evt_loop terminating...
+[2023-09-19 20:11:17,869][67555] Waiting for process learner_proc0 to stop...
+[2023-09-19 20:11:18,471][67555] Waiting for process learner_proc1 to stop...
+[2023-09-19 20:11:18,501][67555] Waiting for process inference_proc0-0 to join...
+[2023-09-19 20:11:18,502][67555] Waiting for process inference_proc1-0 to join...
+[2023-09-19 20:11:18,502][67555] Waiting for process rollout_proc0 to join...
+[2023-09-19 20:11:18,503][67555] Waiting for process rollout_proc1 to join...
+[2023-09-19 20:11:18,503][67555] Waiting for process rollout_proc2 to join...
+[2023-09-19 20:11:18,504][67555] Waiting for process rollout_proc3 to join...
+[2023-09-19 20:11:18,504][67555] Waiting for process rollout_proc4 to join...
+[2023-09-19 20:11:18,504][67555] Waiting for process rollout_proc5 to join...
+[2023-09-19 20:11:18,505][67555] Waiting for process rollout_proc6 to join...
+[2023-09-19 20:11:18,505][67555] Waiting for process rollout_proc7 to join...
+[2023-09-19 20:11:18,506][67555] Batcher 0 profile tree view:
+batching: 41.1946, releasing_batches: 3.3985
+[2023-09-19 20:11:18,506][67555] Batcher 1 profile tree view:
+batching: 41.1500, releasing_batches: 3.4567
+[2023-09-19 20:11:18,506][67555] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0001
+  wait_policy_total: 211.7825
+update_model: 20.5347
+  weight_update: 0.0010
+one_step: 0.0034
+  handle_policy_step: 1317.5729
+    deserialize: 35.1546, stack: 8.2196, obs_to_device_normalize: 264.7522, forward: 661.7036, send_messages: 100.1846
+    prepare_outputs: 171.2177
+      to_cpu: 88.1806
+[2023-09-19 20:11:18,507][67555] InferenceWorker_p1-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 212.4024
+update_model: 20.1526
+  weight_update: 0.0014
+one_step: 0.0021
+  handle_policy_step: 1317.4029
+    deserialize: 36.2160, stack: 7.9962, obs_to_device_normalize: 264.5254, forward: 659.9827, send_messages: 99.9353
+    prepare_outputs: 172.1075
+      to_cpu: 87.6839
+[2023-09-19 20:11:18,507][67555] Learner 0 profile tree view:
+misc: 0.0155, prepare_batch: 21.9917
+train: 108.9296
+  epoch_init: 0.0654, minibatch_init: 1.7375, losses_postprocess: 2.8720, kl_divergence: 1.3465, after_optimizer: 1.5782
+  calculate_losses: 31.7698
+    losses_init: 0.0636, forward_head: 3.5807, bptt_initial: 0.2131, bptt: 0.2101, tail: 11.9968, advantages_returns: 1.6272, losses: 12.1421
+  update: 67.3452
+    clip: 8.3350
+[2023-09-19 20:11:18,507][67555] Learner 1 profile tree view:
+misc: 0.0150, prepare_batch: 21.7924
+train: 107.4039
+  epoch_init: 0.0622, minibatch_init: 1.7294, losses_postprocess: 2.8937, kl_divergence: 1.3610, after_optimizer: 1.6574
+  calculate_losses: 31.8622
+    losses_init: 0.0571, forward_head: 3.5667, bptt_initial: 0.2158, bptt: 0.2083, tail: 11.9873, advantages_returns: 1.6231, losses: 12.2512
+  update: 65.5710
+    clip: 8.2721
+[2023-09-19 20:11:18,508][67555] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 1.6491, enqueue_policy_requests: 77.8908, complete_rollouts: 2.6345, env_step: 596.4457, overhead: 111.2292
+save_policy_outputs: 186.4239
+  split_output_tensors: 64.8659
+[2023-09-19 20:11:18,508][67555] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 1.6196, enqueue_policy_requests: 75.3342, complete_rollouts: 2.5419, env_step: 579.1142, overhead: 108.2644
+save_policy_outputs: 177.6845
+  split_output_tensors: 61.8344
+[2023-09-19 20:11:18,508][67555] Loop Runner_EvtLoop terminating...
+[2023-09-19 20:11:18,509][67555] Runner profile tree view:
+main_loop: 1651.0130
+[2023-09-19 20:11:18,509][67555] Collected {1: 10002432, 0: 11173888}, FPS: 12116.7