[2023-09-18 22:33:21,682][30208] Saving configuration to ./train_dir/Hopper/config.json...
[2023-09-18 22:33:21,684][30208] Rollout worker 0 uses device cpu
[2023-09-18 22:33:21,684][30208] Rollout worker 1 uses device cpu
[2023-09-18 22:33:21,685][30208] Rollout worker 2 uses device cpu
[2023-09-18 22:33:21,685][30208] Rollout worker 3 uses device cpu
[2023-09-18 22:33:21,685][30208] Rollout worker 4 uses device cpu
[2023-09-18 22:33:21,685][30208] Rollout worker 5 uses device cpu
[2023-09-18 22:33:21,686][30208] Rollout worker 6 uses device cpu
[2023-09-18 22:33:21,686][30208] Rollout worker 7 uses device cpu
[2023-09-18 22:33:21,686][30208] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-18 22:33:21,732][30208] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:33:21,732][30208] InferenceWorker_p0-w0: min num requests: 2
[2023-09-18 22:33:21,755][30208] Starting all processes...
[2023-09-18 22:33:21,756][30208] Starting process learner_proc0
[2023-09-18 22:33:21,758][30208] Starting all processes...
[2023-09-18 22:33:21,762][30208] Starting process inference_proc0-0
[2023-09-18 22:33:21,762][30208] Starting process rollout_proc0
[2023-09-18 22:33:21,762][30208] Starting process rollout_proc1
[2023-09-18 22:33:21,763][30208] Starting process rollout_proc2
[2023-09-18 22:33:21,763][30208] Starting process rollout_proc3
[2023-09-18 22:33:21,763][30208] Starting process rollout_proc4
[2023-09-18 22:33:21,764][30208] Starting process rollout_proc5
[2023-09-18 22:33:21,765][30208] Starting process rollout_proc6
[2023-09-18 22:33:21,765][30208] Starting process rollout_proc7
[2023-09-18 22:33:23,493][30612] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-18 22:33:23,501][30590] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:33:23,502][30590] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-18 22:33:23,514][30615] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-18 22:33:23,520][30590] Num visible devices: 1
[2023-09-18 22:33:23,541][30610] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-18 22:33:23,542][30613] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-18 22:33:23,551][30611] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-18 22:33:23,565][30609] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:33:23,565][30609] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-18 22:33:23,579][30616] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-18 22:33:23,584][30609] Num visible devices: 1
[2023-09-18 22:33:23,602][30590] Starting seed is not provided
[2023-09-18 22:33:23,602][30590] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:33:23,602][30590] Initializing actor-critic model on device cuda:0
[2023-09-18 22:33:23,603][30590] RunningMeanStd input shape: (11,)
[2023-09-18 22:33:23,603][30590] RunningMeanStd input shape: (1,)
[2023-09-18 22:33:23,615][30619] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-18 22:33:23,660][30614] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-18 22:33:23,714][30590] Created Actor Critic model with architecture:
[2023-09-18 22:33:23,714][30590] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=3, bias=True)
  )
)
[2023-09-18 22:33:24,246][30590] Using optimizer
[2023-09-18 22:33:24,246][30590] No checkpoints found
[2023-09-18 22:33:24,247][30590] Did not load from checkpoint, starting from scratch!
[2023-09-18 22:33:24,247][30590] Initialized policy 0 weights for model version 0
[2023-09-18 22:33:24,248][30590] LearnerWorker_p0 finished initialization!
[2023-09-18 22:33:24,249][30590] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:33:24,872][30609] RunningMeanStd input shape: (11,)
[2023-09-18 22:33:24,872][30609] RunningMeanStd input shape: (1,)
[2023-09-18 22:33:24,905][30208] Inference worker 0-0 is ready!
[2023-09-18 22:33:24,905][30208] All inference workers are ready! Signal rollout workers to start!
[2023-09-18 22:33:25,002][30615] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,003][30615] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,009][30612] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,010][30612] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,010][30610] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,010][30613] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,011][30610] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,011][30613] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,013][30616] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,014][30616] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,021][30615] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,028][30612] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,029][30611] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,029][30610] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,029][30611] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,029][30613] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,033][30616] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,038][30619] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,038][30619] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,042][30614] Decorrelating experience for 0 frames...
[2023-09-18 22:33:25,043][30614] Decorrelating experience for 64 frames...
[2023-09-18 22:33:25,056][30619] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,056][30615] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,058][30611] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,060][30614] Decorrelating experience for 128 frames...
[2023-09-18 22:33:25,064][30610] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,065][30613] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,065][30612] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,071][30616] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,091][30619] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,096][30614] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,115][30611] Decorrelating experience for 192 frames...
[2023-09-18 22:33:25,123][30615] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,130][30610] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,131][30613] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,140][30616] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,158][30619] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,159][30612] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,163][30614] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,199][30610] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,202][30613] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,205][30615] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,212][30616] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,218][30611] Decorrelating experience for 256 frames...
[2023-09-18 22:33:25,228][30619] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,237][30614] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,240][30612] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,287][30610] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,289][30613] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,292][30615] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,304][30616] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,314][30619] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,329][30611] Decorrelating experience for 320 frames...
[2023-09-18 22:33:25,330][30614] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,338][30612] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,391][30610] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,393][30613] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,396][30615] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,410][30616] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,420][30619] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,438][30614] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,443][30612] Decorrelating experience for 448 frames...
[2023-09-18 22:33:25,466][30611] Decorrelating experience for 384 frames...
[2023-09-18 22:33:25,621][30611] Decorrelating experience for 448 frames...
[2023-09-18 22:33:27,924][30208] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4096. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:33:27,925][30208] Avg episode reward: [(0, '23.204')]
[2023-09-18 22:33:32,206][30609] Updated weights for policy 0, policy_version 80 (0.0015)
[2023-09-18 22:33:32,924][30208] Fps is (10 sec: 8191.8, 60 sec: 8191.8, 300 sec: 8191.8). Total num frames: 45056. Throughput: 0: 5736.7. Samples: 28684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:33:32,925][30208] Avg episode reward: [(0, '217.539')]
[2023-09-18 22:33:32,931][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000000096_49152.pth...
[2023-09-18 22:33:35,986][30609] Updated weights for policy 0, policy_version 160 (0.0012)
[2023-09-18 22:33:37,925][30208] Fps is (10 sec: 9830.2, 60 sec: 9830.2, 300 sec: 9830.2). Total num frames: 102400. Throughput: 0: 9325.0. Samples: 93252. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:33:37,925][30208] Avg episode reward: [(0, '361.711')]
[2023-09-18 22:33:37,926][30590] Saving new best policy, reward=361.711!
[2023-09-18 22:33:39,658][30609] Updated weights for policy 0, policy_version 240 (0.0013)
[2023-09-18 22:33:41,725][30208] Heartbeat connected on Batcher_0
[2023-09-18 22:33:41,728][30208] Heartbeat connected on LearnerWorker_p0
[2023-09-18 22:33:41,734][30208] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-18 22:33:41,738][30208] Heartbeat connected on RolloutWorker_w0
[2023-09-18 22:33:41,738][30208] Heartbeat connected on RolloutWorker_w1
[2023-09-18 22:33:41,743][30208] Heartbeat connected on RolloutWorker_w2
[2023-09-18 22:33:41,749][30208] Heartbeat connected on RolloutWorker_w3
[2023-09-18 22:33:41,750][30208] Heartbeat connected on RolloutWorker_w4
[2023-09-18 22:33:41,751][30208] Heartbeat connected on RolloutWorker_w5
[2023-09-18 22:33:41,754][30208] Heartbeat connected on RolloutWorker_w6
[2023-09-18 22:33:41,759][30208] Heartbeat connected on RolloutWorker_w7
[2023-09-18 22:33:42,924][30208] Fps is (10 sec: 11059.4, 60 sec: 10103.5, 300 sec: 10103.5). Total num frames: 155648. Throughput: 0: 10390.2. Samples: 155852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:33:42,925][30208] Avg episode reward: [(0, '522.050')]
[2023-09-18 22:33:42,926][30590] Saving new best policy, reward=522.050!
[2023-09-18 22:33:43,690][30609] Updated weights for policy 0, policy_version 320 (0.0015)
[2023-09-18 22:33:47,759][30609] Updated weights for policy 0, policy_version 400 (0.0016)
[2023-09-18 22:33:47,925][30208] Fps is (10 sec: 10240.0, 60 sec: 10035.1, 300 sec: 10035.1). Total num frames: 204800. Throughput: 0: 9359.3. Samples: 187188. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-18 22:33:47,925][30208] Avg episode reward: [(0, '690.036')]
[2023-09-18 22:33:47,932][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000000400_204800.pth...
[2023-09-18 22:33:47,940][30590] Saving new best policy, reward=690.036!
[2023-09-18 22:33:51,830][30609] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-09-18 22:33:52,924][30208] Fps is (10 sec: 10240.1, 60 sec: 10158.1, 300 sec: 10158.1). Total num frames: 258048. Throughput: 0: 9874.6. Samples: 246864. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-18 22:33:52,925][30208] Avg episode reward: [(0, '816.519')]
[2023-09-18 22:33:52,925][30590] Saving new best policy, reward=816.519!
[2023-09-18 22:33:55,447][30609] Updated weights for policy 0, policy_version 560 (0.0015)
[2023-09-18 22:33:57,924][30208] Fps is (10 sec: 10649.9, 60 sec: 10240.1, 300 sec: 10240.1). Total num frames: 311296. Throughput: 0: 10397.3. Samples: 311916. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-18 22:33:57,925][30208] Avg episode reward: [(0, '887.808')]
[2023-09-18 22:33:57,925][30590] Saving new best policy, reward=887.808!
[2023-09-18 22:33:59,414][30609] Updated weights for policy 0, policy_version 640 (0.0014)
[2023-09-18 22:34:02,924][30208] Fps is (10 sec: 10239.8, 60 sec: 10181.5, 300 sec: 10181.5). Total num frames: 360448. Throughput: 0: 9819.0. Samples: 343664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:02,925][30208] Avg episode reward: [(0, '975.495')]
[2023-09-18 22:34:02,931][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000000704_360448.pth...
[2023-09-18 22:34:02,939][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000000096_49152.pth
[2023-09-18 22:34:02,939][30590] Saving new best policy, reward=975.495!
[2023-09-18 22:34:03,417][30609] Updated weights for policy 0, policy_version 720 (0.0015)
[2023-09-18 22:34:07,478][30609] Updated weights for policy 0, policy_version 800 (0.0015)
[2023-09-18 22:34:07,924][30208] Fps is (10 sec: 10240.0, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 413696. Throughput: 0: 10106.2. Samples: 404248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:07,925][30208] Avg episode reward: [(0, '997.047')]
[2023-09-18 22:34:07,925][30590] Saving new best policy, reward=997.047!
[2023-09-18 22:34:11,398][30609] Updated weights for policy 0, policy_version 880 (0.0016)
[2023-09-18 22:34:12,925][30208] Fps is (10 sec: 10239.4, 60 sec: 10194.3, 300 sec: 10194.3). Total num frames: 462848. Throughput: 0: 10354.4. Samples: 465956. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:12,925][30208] Avg episode reward: [(0, '947.876')]
[2023-09-18 22:34:15,576][30609] Updated weights for policy 0, policy_version 960 (0.0013)
[2023-09-18 22:34:17,925][30208] Fps is (10 sec: 9830.1, 60 sec: 10158.1, 300 sec: 10158.1). Total num frames: 512000. Throughput: 0: 10376.2. Samples: 495612. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-18 22:34:17,925][30208] Avg episode reward: [(0, '1078.655')]
[2023-09-18 22:34:17,974][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000001008_516096.pth...
[2023-09-18 22:34:17,980][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000000400_204800.pth
[2023-09-18 22:34:17,981][30590] Saving new best policy, reward=1078.655!
[2023-09-18 22:34:19,611][30609] Updated weights for policy 0, policy_version 1040 (0.0011)
[2023-09-18 22:34:22,924][30208] Fps is (10 sec: 10240.7, 60 sec: 10202.8, 300 sec: 10202.8). Total num frames: 565248. Throughput: 0: 10311.3. Samples: 557260. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-18 22:34:22,925][30208] Avg episode reward: [(0, '1076.832')]
[2023-09-18 22:34:23,442][30609] Updated weights for policy 0, policy_version 1120 (0.0014)
[2023-09-18 22:34:27,291][30609] Updated weights for policy 0, policy_version 1200 (0.0013)
[2023-09-18 22:34:27,924][30208] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 618496. Throughput: 0: 10367.5. Samples: 622392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:27,925][30208] Avg episode reward: [(0, '1163.399')]
[2023-09-18 22:34:27,926][30590] Saving new best policy, reward=1163.399!
[2023-09-18 22:34:31,238][30609] Updated weights for policy 0, policy_version 1280 (0.0014)
[2023-09-18 22:34:32,924][30208] Fps is (10 sec: 10649.5, 60 sec: 10444.8, 300 sec: 10271.5). Total num frames: 671744. Throughput: 0: 10327.9. Samples: 651944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:32,925][30208] Avg episode reward: [(0, '1459.314')]
[2023-09-18 22:34:32,931][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000001312_671744.pth...
[2023-09-18 22:34:32,939][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000000704_360448.pth
[2023-09-18 22:34:32,940][30590] Saving new best policy, reward=1459.314!
[2023-09-18 22:34:35,165][30609] Updated weights for policy 0, policy_version 1360 (0.0014)
[2023-09-18 22:34:37,924][30208] Fps is (10 sec: 10240.0, 60 sec: 10308.3, 300 sec: 10240.0). Total num frames: 720896. Throughput: 0: 10394.3. Samples: 714608. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-18 22:34:37,925][30208] Avg episode reward: [(0, '1518.593')]
[2023-09-18 22:34:37,983][30590] Saving new best policy, reward=1518.593!
[2023-09-18 22:34:39,112][30609] Updated weights for policy 0, policy_version 1440 (0.0013)
[2023-09-18 22:34:42,924][30208] Fps is (10 sec: 10240.0, 60 sec: 10308.2, 300 sec: 10267.3). Total num frames: 774144. Throughput: 0: 10316.4. Samples: 776156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:42,925][30208] Avg episode reward: [(0, '1505.817')]
[2023-09-18 22:34:43,186][30609] Updated weights for policy 0, policy_version 1520 (0.0013)
[2023-09-18 22:34:47,300][30609] Updated weights for policy 0, policy_version 1600 (0.0012)
[2023-09-18 22:34:47,924][30208] Fps is (10 sec: 10239.9, 60 sec: 10308.3, 300 sec: 10240.0). Total num frames: 823296. Throughput: 0: 10288.4. Samples: 806644. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-18 22:34:47,925][30208] Avg episode reward: [(0, '2092.250')]
[2023-09-18 22:34:47,930][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000001608_823296.pth...
[2023-09-18 22:34:47,933][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000001008_516096.pth
[2023-09-18 22:34:47,934][30590] Saving new best policy, reward=2092.250!
[2023-09-18 22:34:51,183][30609] Updated weights for policy 0, policy_version 1680 (0.0013)
[2023-09-18 22:34:52,924][30208] Fps is (10 sec: 10240.2, 60 sec: 10308.3, 300 sec: 10264.1). Total num frames: 876544. Throughput: 0: 10312.4. Samples: 868308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:52,924][30208] Avg episode reward: [(0, '2300.036')]
[2023-09-18 22:34:52,925][30590] Saving new best policy, reward=2300.036!
[2023-09-18 22:34:55,316][30609] Updated weights for policy 0, policy_version 1760 (0.0012)
[2023-09-18 22:34:57,924][30208] Fps is (10 sec: 10240.1, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 925696. Throughput: 0: 10303.6. Samples: 929612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:34:57,925][30208] Avg episode reward: [(0, '2068.499')]
[2023-09-18 22:34:59,149][30609] Updated weights for policy 0, policy_version 1840 (0.0013)
[2023-09-18 22:35:02,924][30208] Fps is (10 sec: 10239.8, 60 sec: 10308.3, 300 sec: 10261.6). Total num frames: 978944. Throughput: 0: 10318.2. Samples: 959932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:35:02,925][30208] Avg episode reward: [(0, '2195.780')]
[2023-09-18 22:35:02,932][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000001912_978944.pth...
[2023-09-18 22:35:02,939][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000001312_671744.pth
[2023-09-18 22:35:02,999][30609] Updated weights for policy 0, policy_version 1920 (0.0015)
[2023-09-18 22:35:06,837][30609] Updated weights for policy 0, policy_version 2000 (0.0013)
[2023-09-18 22:35:07,924][30208] Fps is (10 sec: 10649.6, 60 sec: 10308.2, 300 sec: 10281.0). Total num frames: 1032192. Throughput: 0: 10397.4. Samples: 1025144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-18 22:35:07,925][30208] Avg episode reward: [(0, '1977.205')]
[2023-09-18 22:35:10,663][30609] Updated weights for policy 0, policy_version 2080 (0.0016)
[2023-09-18 22:35:12,924][30208] Fps is (10 sec: 10649.7, 60 sec: 10376.6, 300 sec: 10298.5). Total num frames: 1085440. Throughput: 0: 10377.7. Samples: 1089388. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-18 22:35:12,925][30208] Avg episode reward: [(0, '1971.200')]
[2023-09-18 22:35:14,526][30609] Updated weights for policy 0, policy_version 2160 (0.0012)
[2023-09-18 22:35:17,924][30208] Fps is (10 sec: 11059.2, 60 sec: 10513.1, 300 sec: 10351.7). Total num frames: 1142784. Throughput: 0: 10416.4. Samples: 1120684. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-18 22:35:17,925][30208] Avg episode reward: [(0, '1849.219')]
[2023-09-18 22:35:17,931][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000002232_1142784.pth...
[2023-09-18 22:35:17,934][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000001608_823296.pth
[2023-09-18 22:35:18,322][30609] Updated weights for policy 0, policy_version 2240 (0.0014)
[2023-09-18 22:35:22,463][30609] Updated weights for policy 0, policy_version 2320 (0.0014)
[2023-09-18 22:35:22,924][30208] Fps is (10 sec: 10649.6, 60 sec: 10444.8, 300 sec: 10329.0). Total num frames: 1191936. Throughput: 0: 10423.6. Samples: 1183668. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-18 22:35:22,925][30208] Avg episode reward: [(0, '2341.566')]
[2023-09-18 22:35:22,926][30590] Saving new best policy, reward=2341.566!
[2023-09-18 22:35:26,798][30609] Updated weights for policy 0, policy_version 2400 (0.0015)
[2023-09-18 22:35:27,924][30208] Fps is (10 sec: 9420.8, 60 sec: 10308.3, 300 sec: 10274.1). Total num frames: 1236992. Throughput: 0: 10327.7. Samples: 1240904. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-18 22:35:27,925][30208] Avg episode reward: [(0, '1954.160')]
[2023-09-18 22:35:30,987][30609] Updated weights for policy 0, policy_version 2480 (0.0014)
[2023-09-18 22:35:32,924][30208] Fps is (10 sec: 9420.7, 60 sec: 10240.0, 300 sec: 10256.4). Total num frames: 1286144. Throughput: 0: 10291.5. Samples: 1269760. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-18 22:35:32,925][30208] Avg episode reward: [(0, '1265.798')]
[2023-09-18 22:35:32,970][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000002520_1290240.pth...
[2023-09-18 22:35:32,974][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000001912_978944.pth
[2023-09-18 22:35:35,022][30609] Updated weights for policy 0, policy_version 2560 (0.0014)
[2023-09-18 22:35:37,924][30208] Fps is (10 sec: 10649.6, 60 sec: 10376.5, 300 sec: 10303.0). Total num frames: 1343488. Throughput: 0: 10311.7. Samples: 1332336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-18 22:35:37,925][30208] Avg episode reward: [(0, '1721.652')]
[2023-09-18 22:35:38,737][30609] Updated weights for policy 0, policy_version 2640 (0.0012)
[2023-09-18 22:35:39,355][30208] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 30208], exiting...
[2023-09-18 22:35:39,356][30208] Runner profile tree view: main_loop: 137.6005
[2023-09-18 22:35:39,356][30208] Collected {0: 1355776}, FPS: 9853.0
[2023-09-18 22:35:39,356][30614] Stopping RolloutWorker_w3...
[2023-09-18 22:35:39,356][30619] Stopping RolloutWorker_w7...
[2023-09-18 22:35:39,356][30614] Loop rollout_proc3_evt_loop terminating...
[2023-09-18 22:35:39,356][30612] Stopping RolloutWorker_w2...
[2023-09-18 22:35:39,356][30619] Loop rollout_proc7_evt_loop terminating...
[2023-09-18 22:35:39,356][30616] Stopping RolloutWorker_w6...
[2023-09-18 22:35:39,357][30612] Loop rollout_proc2_evt_loop terminating...
[2023-09-18 22:35:39,356][30590] Stopping Batcher_0...
[2023-09-18 22:35:39,356][30611] Stopping RolloutWorker_w1...
[2023-09-18 22:35:39,357][30615] Stopping RolloutWorker_w5...
[2023-09-18 22:35:39,357][30616] Loop rollout_proc6_evt_loop terminating...
[2023-09-18 22:35:39,357][30611] Loop rollout_proc1_evt_loop terminating...
[2023-09-18 22:35:39,357][30615] Loop rollout_proc5_evt_loop terminating...
[2023-09-18 22:35:39,357][30613] Stopping RolloutWorker_w4...
[2023-09-18 22:35:39,357][30590] Loop batcher_evt_loop terminating...
[2023-09-18 22:35:39,357][30610] Stopping RolloutWorker_w0...
[2023-09-18 22:35:39,358][30613] Loop rollout_proc4_evt_loop terminating...
[2023-09-18 22:35:39,358][30610] Loop rollout_proc0_evt_loop terminating...
[2023-09-18 22:35:39,358][30590] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000002648_1355776.pth...
[2023-09-18 22:35:39,362][30590] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000002232_1142784.pth
[2023-09-18 22:35:39,362][30590] Stopping LearnerWorker_p0...
[2023-09-18 22:35:39,362][30590] Loop learner_proc0_evt_loop terminating...
[2023-09-18 22:35:39,370][30609] Weights refcount: 2 0
[2023-09-18 22:35:39,371][30609] Stopping InferenceWorker_p0-w0...
[2023-09-18 22:35:39,372][30609] Loop inference_proc0-0_evt_loop terminating...
[2023-09-18 22:35:46,709][40872] Saving configuration to ./train_dir/Hopper/config.json...
[2023-09-18 22:35:46,711][40872] Rollout worker 0 uses device cpu
[2023-09-18 22:35:46,711][40872] Rollout worker 1 uses device cpu
[2023-09-18 22:35:46,711][40872] Rollout worker 2 uses device cpu
[2023-09-18 22:35:46,712][40872] Rollout worker 3 uses device cpu
[2023-09-18 22:35:46,712][40872] Rollout worker 4 uses device cpu
[2023-09-18 22:35:46,712][40872] Rollout worker 5 uses device cpu
[2023-09-18 22:35:46,712][40872] Rollout worker 6 uses device cpu
[2023-09-18 22:35:46,713][40872] Rollout worker 7 uses device cpu
[2023-09-18 22:35:46,713][40872] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-18 22:35:46,766][40872] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:35:46,767][40872] InferenceWorker_p0-w0: min num requests: 1
[2023-09-18 22:35:46,770][40872] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-18 22:35:46,770][40872] InferenceWorker_p1-w0: min num requests: 1
[2023-09-18 22:35:46,795][40872] Starting all processes...
[2023-09-18 22:35:46,795][40872] Starting process learner_proc0
[2023-09-18 22:35:46,798][40872] Starting process learner_proc1
[2023-09-18 22:35:46,845][40872] Starting all processes...
[2023-09-18 22:35:46,852][40872] Starting process inference_proc0-0
[2023-09-18 22:35:46,853][40872] Starting process inference_proc1-0
[2023-09-18 22:35:46,853][40872] Starting process rollout_proc0
[2023-09-18 22:35:46,854][40872] Starting process rollout_proc1
[2023-09-18 22:35:46,855][40872] Starting process rollout_proc2
[2023-09-18 22:35:46,857][40872] Starting process rollout_proc3
[2023-09-18 22:35:46,861][40872] Starting process rollout_proc4
[2023-09-18 22:35:46,909][40872] Starting process rollout_proc5
[2023-09-18 22:35:46,910][40872] Starting process rollout_proc6
[2023-09-18 22:35:46,910][40872] Starting process rollout_proc7
[2023-09-18 22:35:48,811][41359] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:35:48,811][41359] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-18 22:35:48,822][41486] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-18 22:35:48,823][41393] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:35:48,823][41393] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-18 22:35:48,827][41360] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-18 22:35:48,827][41360] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-18 22:35:48,832][41488] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-18 22:35:48,844][41484] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-18 22:35:48,845][41360] Num visible devices: 1
[2023-09-18 22:35:48,856][41359] Num visible devices: 1
[2023-09-18 22:35:48,867][41491] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-18 22:35:48,877][41393] Num visible devices: 1
[2023-09-18 22:35:48,891][41489] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-18 22:35:48,895][41359] Starting seed is not provided
[2023-09-18 22:35:48,896][41359] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-18 22:35:48,896][41359] Initializing actor-critic model on device cuda:0
[2023-09-18 22:35:48,898][41359] RunningMeanStd input shape: (11,)
[2023-09-18 22:35:48,899][41359] RunningMeanStd input shape: (1,)
[2023-09-18 22:35:48,922][41360] Starting seed is not provided
[2023-09-18 22:35:48,923][41360] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-18 22:35:48,923][41360] Initializing actor-critic model on device cuda:0
[2023-09-18 22:35:48,924][41360] RunningMeanStd input shape: (11,)
[2023-09-18 22:35:48,925][41360] RunningMeanStd input shape: (1,)
[2023-09-18 22:35:48,935][41482] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-18 22:35:48,939][41480] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-18 22:35:48,939][41480] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-18 22:35:48,958][41480] Num visible devices: 1
[2023-09-18 22:35:48,965][41359] Created Actor Critic model with architecture:
[2023-09-18 22:35:48,966][41359] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=3, bias=True)
  )
)
[2023-09-18 22:35:48,971][41360] Created Actor Critic model with architecture:
[2023-09-18 22:35:48,971][41360] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=3, bias=True)
  )
)
[2023-09-18 22:35:48,987][41485] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-18 22:35:49,002][41487] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-18 22:35:49,572][41359] Using optimizer
[2023-09-18 22:35:49,573][41359] Loading state from checkpoint ./train_dir/Hopper/checkpoint_p0/checkpoint_000002648_1355776.pth...
[2023-09-18 22:35:49,578][41360] Using optimizer
[2023-09-18 22:35:49,578][41359] Loading model from checkpoint
[2023-09-18 22:35:49,578][41360] No checkpoints found
[2023-09-18 22:35:49,579][41360] Did not load from checkpoint, starting from scratch!
[2023-09-18 22:35:49,579][41360] Initialized policy 1 weights for model version 0
[2023-09-18 22:35:49,580][41359] Loaded experiment state at self.train_step=2648, self.env_steps=1355776
[2023-09-18 22:35:49,580][41360] LearnerWorker_p1 finished initialization!
[2023-09-18 22:35:49,580][41360] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-09-18 22:35:49,581][41359] Initialized policy 0 weights for model version 2648 [2023-09-18 22:35:49,582][41359] LearnerWorker_p0 finished initialization! [2023-09-18 22:35:49,582][41359] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-18 22:35:50,168][41480] RunningMeanStd input shape: (11,) [2023-09-18 22:35:50,169][41480] RunningMeanStd input shape: (1,) [2023-09-18 22:35:50,187][41393] RunningMeanStd input shape: (11,) [2023-09-18 22:35:50,188][41393] RunningMeanStd input shape: (1,) [2023-09-18 22:35:50,203][40872] Inference worker 1-0 is ready! [2023-09-18 22:35:50,223][40872] Inference worker 0-0 is ready! [2023-09-18 22:35:50,224][40872] All inference workers are ready! Signal rollout workers to start! [2023-09-18 22:35:50,345][41482] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,346][41482] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,347][41488] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,347][41488] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,351][41487] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,352][41487] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,354][41485] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,355][41485] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,358][41486] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,359][41486] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,360][41491] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,360][41491] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,366][41488] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,368][41482] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,370][41487] Decorrelating experience for 128 frames... 
[2023-09-18 22:35:50,373][41484] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,373][41489] Decorrelating experience for 0 frames... [2023-09-18 22:35:50,374][41484] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,374][41489] Decorrelating experience for 64 frames... [2023-09-18 22:35:50,378][41486] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,381][41491] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,383][41485] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,403][41488] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,404][41482] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,405][41487] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,406][41484] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,407][41489] Decorrelating experience for 128 frames... [2023-09-18 22:35:50,415][41486] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,418][41491] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,440][41485] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,470][41484] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,472][41489] Decorrelating experience for 192 frames... [2023-09-18 22:35:50,472][41488] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,474][41482] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,484][41486] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,486][41491] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,486][41487] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,541][41489] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,543][41484] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,545][41485] Decorrelating experience for 256 frames... [2023-09-18 22:35:50,547][41488] Decorrelating experience for 320 frames... 
[2023-09-18 22:35:50,548][41482] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,556][41486] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,557][41491] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,559][41487] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,612][41489] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,619][41484] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,638][41488] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,641][41482] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,646][41486] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,650][41491] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,652][41487] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,658][41485] Decorrelating experience for 320 frames... [2023-09-18 22:35:50,698][41489] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,707][41484] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,749][41488] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,749][41482] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,754][41486] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,757][41491] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,757][41487] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,797][41485] Decorrelating experience for 384 frames... [2023-09-18 22:35:50,802][41489] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,821][41484] Decorrelating experience for 448 frames... [2023-09-18 22:35:50,976][41485] Decorrelating experience for 448 frames... [2023-09-18 22:35:52,921][40872] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1355776. Throughput: 0: nan, 1: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-09-18 22:35:52,921][40872] Avg episode reward: [(0, '191.965'), (1, '16.595')] [2023-09-18 22:35:57,921][40872] Fps is (10 sec: 8191.9, 60 sec: 8191.9, 300 sec: 8191.9). Total num frames: 1396736. Throughput: 0: 2705.6, 1: 2676.0. Samples: 26908. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:35:57,922][40872] Avg episode reward: [(0, '1171.007'), (1, '100.426')] [2023-09-18 22:35:57,927][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000002688_1376256.pth... [2023-09-18 22:35:57,927][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000040_20480.pth... [2023-09-18 22:35:57,933][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000002520_1290240.pth [2023-09-18 22:36:01,060][41393] Updated weights for policy 0, policy_version 2728 (0.0012) [2023-09-18 22:36:01,061][41480] Updated weights for policy 1, policy_version 80 (0.0013) [2023-09-18 22:36:02,921][40872] Fps is (10 sec: 9830.4, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 1454080. Throughput: 0: 4723.4, 1: 4708.0. Samples: 94314. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:02,922][40872] Avg episode reward: [(0, '1417.753'), (1, '305.303')] [2023-09-18 22:36:06,753][40872] Heartbeat connected on Batcher_0 [2023-09-18 22:36:06,756][40872] Heartbeat connected on LearnerWorker_p0 [2023-09-18 22:36:06,760][40872] Heartbeat connected on Batcher_1 [2023-09-18 22:36:06,763][40872] Heartbeat connected on LearnerWorker_p1 [2023-09-18 22:36:06,769][40872] Heartbeat connected on InferenceWorker_p0-w0 [2023-09-18 22:36:06,773][40872] Heartbeat connected on InferenceWorker_p1-w0 [2023-09-18 22:36:06,774][40872] Heartbeat connected on RolloutWorker_w0 [2023-09-18 22:36:06,779][40872] Heartbeat connected on RolloutWorker_w1 [2023-09-18 22:36:06,782][40872] Heartbeat connected on RolloutWorker_w2 [2023-09-18 22:36:06,784][40872] Heartbeat connected on RolloutWorker_w3 [2023-09-18 22:36:06,788][40872] Heartbeat connected on RolloutWorker_w4 [2023-09-18 22:36:06,789][40872] Heartbeat connected on RolloutWorker_w5 [2023-09-18 22:36:06,793][40872] Heartbeat connected on RolloutWorker_w6 [2023-09-18 22:36:06,796][40872] Heartbeat connected on RolloutWorker_w7 [2023-09-18 22:36:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 9830.3, 300 sec: 9830.3). Total num frames: 1503232. Throughput: 0: 4210.5, 1: 4202.6. Samples: 126198. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:36:07,922][40872] Avg episode reward: [(0, '1798.202'), (1, '345.626')] [2023-09-18 22:36:08,850][41480] Updated weights for policy 1, policy_version 160 (0.0014) [2023-09-18 22:36:08,850][41393] Updated weights for policy 0, policy_version 2808 (0.0015) [2023-09-18 22:36:12,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10240.0, 300 sec: 10240.0). Total num frames: 1560576. Throughput: 0: 4730.0, 1: 4724.0. Samples: 189080. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:12,921][40872] Avg episode reward: [(0, '1874.947'), (1, '381.107')] [2023-09-18 22:36:12,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000002848_1458176.pth... [2023-09-18 22:36:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000200_102400.pth... [2023-09-18 22:36:12,935][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000002648_1355776.pth [2023-09-18 22:36:12,943][41360] Saving new best policy, reward=381.107! [2023-09-18 22:36:16,565][41393] Updated weights for policy 0, policy_version 2888 (0.0013) [2023-09-18 22:36:16,566][41480] Updated weights for policy 1, policy_version 240 (0.0015) [2023-09-18 22:36:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10158.1, 300 sec: 10158.1). Total num frames: 1609728. Throughput: 0: 5076.8, 1: 5075.5. Samples: 253808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:17,922][40872] Avg episode reward: [(0, '2031.201'), (1, '602.104')] [2023-09-18 22:36:17,923][41360] Saving new best policy, reward=602.104! [2023-09-18 22:36:22,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10376.5, 300 sec: 10376.5). Total num frames: 1667072. Throughput: 0: 4778.7, 1: 4778.6. Samples: 286722. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:22,922][40872] Avg episode reward: [(0, '2306.326'), (1, '779.307')] [2023-09-18 22:36:22,923][41360] Saving new best policy, reward=779.307! [2023-09-18 22:36:24,130][41393] Updated weights for policy 0, policy_version 2968 (0.0013) [2023-09-18 22:36:24,130][41480] Updated weights for policy 1, policy_version 320 (0.0013) [2023-09-18 22:36:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10298.5, 300 sec: 10298.5). Total num frames: 1716224. Throughput: 0: 5015.7, 1: 5012.2. Samples: 350980. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:27,922][40872] Avg episode reward: [(0, '2536.207'), (1, '916.514')] [2023-09-18 22:36:27,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000352_180224.pth... [2023-09-18 22:36:27,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003000_1536000.pth... [2023-09-18 22:36:27,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000040_20480.pth [2023-09-18 22:36:27,939][41360] Saving new best policy, reward=916.514! [2023-09-18 22:36:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000002688_1376256.pth [2023-09-18 22:36:27,941][41359] Saving new best policy, reward=2536.207! [2023-09-18 22:36:31,882][41393] Updated weights for policy 0, policy_version 3048 (0.0012) [2023-09-18 22:36:31,882][41480] Updated weights for policy 1, policy_version 400 (0.0013) [2023-09-18 22:36:32,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10444.8, 300 sec: 10444.8). Total num frames: 1773568. Throughput: 0: 5178.7, 1: 5176.0. Samples: 414188. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:36:32,921][40872] Avg episode reward: [(0, '2650.612'), (1, '1031.168')] [2023-09-18 22:36:32,922][41359] Saving new best policy, reward=2650.612! [2023-09-18 22:36:32,922][41360] Saving new best policy, reward=1031.168! [2023-09-18 22:36:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10376.5, 300 sec: 10376.5). Total num frames: 1822720. Throughput: 0: 4954.2, 1: 4951.4. Samples: 445752. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:36:37,922][40872] Avg episode reward: [(0, '2584.605'), (1, '1021.886')] [2023-09-18 22:36:39,511][41480] Updated weights for policy 1, policy_version 480 (0.0017) [2023-09-18 22:36:39,511][41393] Updated weights for policy 0, policy_version 3128 (0.0014) [2023-09-18 22:36:42,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10485.7, 300 sec: 10485.7). Total num frames: 1880064. Throughput: 0: 5385.6, 1: 5386.0. 
Samples: 511632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:36:42,922][40872] Avg episode reward: [(0, '2810.191'), (1, '1008.387')] [2023-09-18 22:36:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003160_1617920.pth... [2023-09-18 22:36:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000512_262144.pth... [2023-09-18 22:36:42,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000002848_1458176.pth [2023-09-18 22:36:42,939][41359] Saving new best policy, reward=2810.191! [2023-09-18 22:36:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000200_102400.pth [2023-09-18 22:36:47,217][41480] Updated weights for policy 1, policy_version 560 (0.0012) [2023-09-18 22:36:47,218][41393] Updated weights for policy 0, policy_version 3208 (0.0015) [2023-09-18 22:36:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10426.2, 300 sec: 10426.2). Total num frames: 1929216. Throughput: 0: 5323.0, 1: 5326.1. Samples: 573522. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:36:47,922][40872] Avg episode reward: [(0, '2560.439'), (1, '1158.231')] [2023-09-18 22:36:47,923][41360] Saving new best policy, reward=1158.231! [2023-09-18 22:36:52,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10513.0, 300 sec: 10513.0). Total num frames: 1986560. Throughput: 0: 5348.6, 1: 5349.1. Samples: 607598. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:36:52,922][40872] Avg episode reward: [(0, '2765.092'), (1, '1182.431')] [2023-09-18 22:36:52,924][41360] Saving new best policy, reward=1182.431! [2023-09-18 22:36:54,616][41393] Updated weights for policy 0, policy_version 3288 (0.0012) [2023-09-18 22:36:54,617][41480] Updated weights for policy 1, policy_version 640 (0.0015) [2023-09-18 22:36:57,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10786.1, 300 sec: 10586.6). Total num frames: 2043904. Throughput: 0: 5375.0, 1: 5375.4. Samples: 672846. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:36:57,921][40872] Avg episode reward: [(0, '2483.758'), (1, '1237.596')] [2023-09-18 22:36:57,927][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003320_1699840.pth... [2023-09-18 22:36:57,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000672_344064.pth... [2023-09-18 22:36:57,931][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003000_1536000.pth [2023-09-18 22:36:57,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000352_180224.pth [2023-09-18 22:36:57,935][41360] Saving new best policy, reward=1237.596! [2023-09-18 22:37:02,144][41393] Updated weights for policy 0, policy_version 3368 (0.0016) [2023-09-18 22:37:02,144][41480] Updated weights for policy 1, policy_version 720 (0.0015) [2023-09-18 22:37:02,921][40872] Fps is (10 sec: 11469.2, 60 sec: 10786.2, 300 sec: 10649.6). Total num frames: 2101248. Throughput: 0: 5383.4, 1: 5380.7. Samples: 738190. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:02,921][40872] Avg episode reward: [(0, '2584.665'), (1, '1306.199')] [2023-09-18 22:37:02,922][41360] Saving new best policy, reward=1306.199! [2023-09-18 22:37:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.2, 300 sec: 10595.0). Total num frames: 2150400. Throughput: 0: 5377.4, 1: 5374.4. Samples: 770548. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:37:07,921][40872] Avg episode reward: [(0, '2253.017'), (1, '1457.052')] [2023-09-18 22:37:07,922][41360] Saving new best policy, reward=1457.052! [2023-09-18 22:37:09,506][41393] Updated weights for policy 0, policy_version 3448 (0.0013) [2023-09-18 22:37:09,507][41480] Updated weights for policy 1, policy_version 800 (0.0013) [2023-09-18 22:37:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10649.6). Total num frames: 2207744. Throughput: 0: 5436.7, 1: 5436.5. Samples: 840268. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:37:12,921][40872] Avg episode reward: [(0, '2455.549'), (1, '1730.993')] [2023-09-18 22:37:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000832_425984.pth... [2023-09-18 22:37:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003480_1781760.pth... [2023-09-18 22:37:12,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000512_262144.pth [2023-09-18 22:37:12,934][41360] Saving new best policy, reward=1730.993! [2023-09-18 22:37:12,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003160_1617920.pth [2023-09-18 22:37:16,987][41393] Updated weights for policy 0, policy_version 3528 (0.0011) [2023-09-18 22:37:16,987][41480] Updated weights for policy 1, policy_version 880 (0.0013) [2023-09-18 22:37:17,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 10697.8). Total num frames: 2265088. Throughput: 0: 5449.2, 1: 5448.2. Samples: 904572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:17,922][40872] Avg episode reward: [(0, '2526.165'), (1, '1899.774')] [2023-09-18 22:37:17,923][41360] Saving new best policy, reward=1899.774! [2023-09-18 22:37:22,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 10740.6). Total num frames: 2322432. Throughput: 0: 5476.6, 1: 5476.3. Samples: 938632. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:37:22,921][40872] Avg episode reward: [(0, '2637.824'), (1, '1718.439')] [2023-09-18 22:37:24,263][41480] Updated weights for policy 1, policy_version 960 (0.0014) [2023-09-18 22:37:24,263][41393] Updated weights for policy 0, policy_version 3608 (0.0014) [2023-09-18 22:37:27,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10692.7). Total num frames: 2371584. Throughput: 0: 5481.8, 1: 5481.5. Samples: 1004976. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:37:27,921][40872] Avg episode reward: [(0, '2437.142'), (1, '1648.795')] [2023-09-18 22:37:27,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000000992_507904.pth... [2023-09-18 22:37:27,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003640_1863680.pth... [2023-09-18 22:37:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000672_344064.pth [2023-09-18 22:37:27,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003320_1699840.pth [2023-09-18 22:37:31,922][41480] Updated weights for policy 1, policy_version 1040 (0.0013) [2023-09-18 22:37:31,923][41393] Updated weights for policy 0, policy_version 3688 (0.0014) [2023-09-18 22:37:32,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10922.6, 300 sec: 10731.5). Total num frames: 2428928. Throughput: 0: 5506.7, 1: 5504.1. Samples: 1069010. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:32,922][40872] Avg episode reward: [(0, '2378.184'), (1, '1936.068')] [2023-09-18 22:37:32,924][41360] Saving new best policy, reward=1936.068! [2023-09-18 22:37:37,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10688.6). Total num frames: 2478080. Throughput: 0: 5454.6, 1: 5454.5. Samples: 1098508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:37,922][40872] Avg episode reward: [(0, '2485.771'), (1, '2167.482')] [2023-09-18 22:37:37,923][41360] Saving new best policy, reward=2167.482! [2023-09-18 22:37:39,735][41480] Updated weights for policy 1, policy_version 1120 (0.0013) [2023-09-18 22:37:39,736][41393] Updated weights for policy 0, policy_version 3768 (0.0013) [2023-09-18 22:37:42,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10724.1). Total num frames: 2535424. Throughput: 0: 5447.8, 1: 5449.0. Samples: 1163202. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:42,922][40872] Avg episode reward: [(0, '2451.731'), (1, '2278.881')] [2023-09-18 22:37:42,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003800_1945600.pth... [2023-09-18 22:37:42,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001152_589824.pth... [2023-09-18 22:37:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000832_425984.pth [2023-09-18 22:37:42,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003480_1781760.pth [2023-09-18 22:37:42,941][41360] Saving new best policy, reward=2278.881! [2023-09-18 22:37:47,337][41393] Updated weights for policy 0, policy_version 3848 (0.0011) [2023-09-18 22:37:47,338][41480] Updated weights for policy 1, policy_version 1200 (0.0012) [2023-09-18 22:37:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10685.2). Total num frames: 2584576. Throughput: 0: 5448.9, 1: 5451.8. Samples: 1228724. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:37:47,922][40872] Avg episode reward: [(0, '2233.863'), (1, '2368.168')] [2023-09-18 22:37:47,923][41360] Saving new best policy, reward=2368.168! [2023-09-18 22:37:52,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10786.1). Total num frames: 2650112. Throughput: 0: 5467.8, 1: 5468.6. Samples: 1262686. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:37:52,922][40872] Avg episode reward: [(0, '2263.661'), (1, '2214.891')] [2023-09-18 22:37:54,388][41393] Updated weights for policy 0, policy_version 3928 (0.0011) [2023-09-18 22:37:54,389][41480] Updated weights for policy 1, policy_version 1280 (0.0016) [2023-09-18 22:37:57,921][40872] Fps is (10 sec: 12288.2, 60 sec: 11059.2, 300 sec: 10813.4). Total num frames: 2707456. Throughput: 0: 5465.2, 1: 5464.8. Samples: 1332118. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:37:57,922][40872] Avg episode reward: [(0, '2296.519'), (1, '1804.193')] [2023-09-18 22:37:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000003968_2031616.pth... [2023-09-18 22:37:57,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001320_675840.pth... [2023-09-18 22:37:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003640_1863680.pth [2023-09-18 22:37:57,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000000992_507904.pth [2023-09-18 22:38:01,716][41480] Updated weights for policy 1, policy_version 1360 (0.0012) [2023-09-18 22:38:01,716][41393] Updated weights for policy 0, policy_version 4008 (0.0014) [2023-09-18 22:38:02,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 10775.6). Total num frames: 2756608. Throughput: 0: 5490.7, 1: 5491.2. Samples: 1398756. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:38:02,921][40872] Avg episode reward: [(0, '2413.661'), (1, '1773.820')] [2023-09-18 22:38:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10801.3). Total num frames: 2813952. Throughput: 0: 5455.8, 1: 5456.2. Samples: 1429670. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:38:07,921][40872] Avg episode reward: [(0, '2537.393'), (1, '1852.423')] [2023-09-18 22:38:09,165][41480] Updated weights for policy 1, policy_version 1440 (0.0015) [2023-09-18 22:38:09,165][41393] Updated weights for policy 0, policy_version 4088 (0.0015) [2023-09-18 22:38:12,921][40872] Fps is (10 sec: 10649.1, 60 sec: 10922.6, 300 sec: 10766.6). Total num frames: 2863104. Throughput: 0: 5478.4, 1: 5478.7. Samples: 1498050. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:38:12,922][40872] Avg episode reward: [(0, '2660.017'), (1, '2229.466')] [2023-09-18 22:38:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004120_2109440.pth... 
[2023-09-18 22:38:12,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001472_753664.pth... [2023-09-18 22:38:12,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003800_1945600.pth [2023-09-18 22:38:12,943][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001152_589824.pth [2023-09-18 22:38:16,980][41393] Updated weights for policy 0, policy_version 4168 (0.0017) [2023-09-18 22:38:16,980][41480] Updated weights for policy 1, policy_version 1520 (0.0015) [2023-09-18 22:38:17,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10922.6, 300 sec: 10790.8). Total num frames: 2920448. Throughput: 0: 5449.5, 1: 5449.2. Samples: 1559448. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:38:17,922][40872] Avg episode reward: [(0, '2241.540'), (1, '2411.941')] [2023-09-18 22:38:17,923][41360] Saving new best policy, reward=2411.941! [2023-09-18 22:38:22,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10786.1, 300 sec: 10758.8). Total num frames: 2969600. Throughput: 0: 5460.8, 1: 5460.6. Samples: 1589972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:38:22,922][40872] Avg episode reward: [(0, '1878.926'), (1, '2362.066')] [2023-09-18 22:38:24,724][41393] Updated weights for policy 0, policy_version 4248 (0.0010) [2023-09-18 22:38:24,724][41480] Updated weights for policy 1, policy_version 1600 (0.0014) [2023-09-18 22:38:27,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 10781.7). Total num frames: 3026944. Throughput: 0: 5461.6, 1: 5462.5. Samples: 1654784. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:38:27,922][40872] Avg episode reward: [(0, '1990.063'), (1, '2313.284')] [2023-09-18 22:38:27,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004280_2191360.pth... [2023-09-18 22:38:27,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001632_835584.pth... 
[2023-09-18 22:38:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000003968_2031616.pth [2023-09-18 22:38:27,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001320_675840.pth [2023-09-18 22:38:32,442][41393] Updated weights for policy 0, policy_version 4328 (0.0013) [2023-09-18 22:38:32,443][41480] Updated weights for policy 1, policy_version 1680 (0.0013) [2023-09-18 22:38:32,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.2, 300 sec: 10752.0). Total num frames: 3076096. Throughput: 0: 5454.2, 1: 5452.9. Samples: 1719542. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:38:32,921][40872] Avg episode reward: [(0, '2210.912'), (1, '2220.458')] [2023-09-18 22:38:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10773.7). Total num frames: 3133440. Throughput: 0: 5432.4, 1: 5431.3. Samples: 1751554. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:38:37,922][40872] Avg episode reward: [(0, '2093.644'), (1, '2199.059')] [2023-09-18 22:38:40,157][41393] Updated weights for policy 0, policy_version 4408 (0.0014) [2023-09-18 22:38:40,157][41480] Updated weights for policy 1, policy_version 1760 (0.0011) [2023-09-18 22:38:42,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10786.1, 300 sec: 10746.0). Total num frames: 3182592. Throughput: 0: 5356.2, 1: 5356.8. Samples: 1814208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:38:42,922][40872] Avg episode reward: [(0, '2254.247'), (1, '2379.676')] [2023-09-18 22:38:42,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004432_2269184.pth... [2023-09-18 22:38:42,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001784_913408.pth... 
[2023-09-18 22:38:42,933][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004120_2109440.pth [2023-09-18 22:38:42,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001472_753664.pth [2023-09-18 22:38:47,858][41480] Updated weights for policy 1, policy_version 1840 (0.0010) [2023-09-18 22:38:47,859][41393] Updated weights for policy 0, policy_version 4488 (0.0013) [2023-09-18 22:38:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10766.6). Total num frames: 3239936. Throughput: 0: 5316.5, 1: 5316.7. Samples: 1877254. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:38:47,922][40872] Avg episode reward: [(0, '2403.329'), (1, '2440.607')] [2023-09-18 22:38:47,923][41360] Saving new best policy, reward=2440.607! [2023-09-18 22:38:52,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10649.6, 300 sec: 10740.6). Total num frames: 3289088. Throughput: 0: 5333.0, 1: 5332.6. Samples: 1909624. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:38:52,922][40872] Avg episode reward: [(0, '2700.390'), (1, '2631.163')] [2023-09-18 22:38:52,923][41360] Saving new best policy, reward=2631.163! [2023-09-18 22:38:55,310][41480] Updated weights for policy 1, policy_version 1920 (0.0011) [2023-09-18 22:38:55,310][41393] Updated weights for policy 0, policy_version 4568 (0.0014) [2023-09-18 22:38:57,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10649.6, 300 sec: 10760.3). Total num frames: 3346432. Throughput: 0: 5327.3, 1: 5326.6. Samples: 1977472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:38:57,922][40872] Avg episode reward: [(0, '2902.217'), (1, '2593.479')] [2023-09-18 22:38:57,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004592_2351104.pth... [2023-09-18 22:38:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000001944_995328.pth... 
[2023-09-18 22:38:57,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001632_835584.pth [2023-09-18 22:38:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004280_2191360.pth [2023-09-18 22:38:57,938][41359] Saving new best policy, reward=2902.217! [2023-09-18 22:39:02,742][41480] Updated weights for policy 1, policy_version 2000 (0.0016) [2023-09-18 22:39:02,742][41393] Updated weights for policy 0, policy_version 4648 (0.0013) [2023-09-18 22:39:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10786.1, 300 sec: 10778.9). Total num frames: 3403776. Throughput: 0: 5373.1, 1: 5373.3. Samples: 2043034. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:39:02,922][40872] Avg episode reward: [(0, '2921.891'), (1, '2692.786')] [2023-09-18 22:39:02,923][41359] Saving new best policy, reward=2921.891! [2023-09-18 22:39:02,923][41360] Saving new best policy, reward=2692.786! [2023-09-18 22:39:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10649.6, 300 sec: 10754.6). Total num frames: 3452928. Throughput: 0: 5382.5, 1: 5381.2. Samples: 2074336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:07,922][40872] Avg episode reward: [(0, '2809.032'), (1, '2593.433')] [2023-09-18 22:39:10,224][41480] Updated weights for policy 1, policy_version 2080 (0.0012) [2023-09-18 22:39:10,224][41393] Updated weights for policy 0, policy_version 4728 (0.0016) [2023-09-18 22:39:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.2, 300 sec: 10772.5). Total num frames: 3510272. Throughput: 0: 5401.5, 1: 5398.6. Samples: 2140788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:12,922][40872] Avg episode reward: [(0, '2562.068'), (1, '2461.698')] [2023-09-18 22:39:12,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004752_2433024.pth... [2023-09-18 22:39:12,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002104_1077248.pth... 
[2023-09-18 22:39:12,942][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004432_2269184.pth [2023-09-18 22:39:12,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001784_913408.pth [2023-09-18 22:39:17,639][41393] Updated weights for policy 0, policy_version 4808 (0.0013) [2023-09-18 22:39:17,640][41480] Updated weights for policy 1, policy_version 2160 (0.0014) [2023-09-18 22:39:17,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10786.2, 300 sec: 10789.5). Total num frames: 3567616. Throughput: 0: 5418.0, 1: 5416.4. Samples: 2207092. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:39:17,922][40872] Avg episode reward: [(0, '2255.988'), (1, '2046.952')] [2023-09-18 22:39:22,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10766.6). Total num frames: 3616768. Throughput: 0: 5433.5, 1: 5434.6. Samples: 2240618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:22,922][40872] Avg episode reward: [(0, '2011.513'), (1, '2089.820')] [2023-09-18 22:39:25,140][41480] Updated weights for policy 1, policy_version 2240 (0.0011) [2023-09-18 22:39:25,140][41393] Updated weights for policy 0, policy_version 4888 (0.0010) [2023-09-18 22:39:27,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10821.1). Total num frames: 3682304. Throughput: 0: 5468.3, 1: 5468.0. Samples: 2306338. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:27,922][40872] Avg episode reward: [(0, '1849.134'), (1, '2149.329')] [2023-09-18 22:39:27,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000004920_2519040.pth... [2023-09-18 22:39:27,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002272_1163264.pth... 
[2023-09-18 22:39:27,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000001944_995328.pth [2023-09-18 22:39:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004592_2351104.pth [2023-09-18 22:39:32,458][41393] Updated weights for policy 0, policy_version 4968 (0.0014) [2023-09-18 22:39:32,459][41480] Updated weights for policy 1, policy_version 2320 (0.0015) [2023-09-18 22:39:32,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.6, 300 sec: 10798.5). Total num frames: 3731456. Throughput: 0: 5521.1, 1: 5521.0. Samples: 2374146. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:32,922][40872] Avg episode reward: [(0, '1775.219'), (1, '2098.647')] [2023-09-18 22:39:37,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10813.4). Total num frames: 3788800. Throughput: 0: 5531.0, 1: 5530.5. Samples: 2407394. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:37,922][40872] Avg episode reward: [(0, '1871.846'), (1, '2203.137')] [2023-09-18 22:39:39,883][41393] Updated weights for policy 0, policy_version 5048 (0.0013) [2023-09-18 22:39:39,883][41480] Updated weights for policy 1, policy_version 2400 (0.0016) [2023-09-18 22:39:42,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 10827.7). Total num frames: 3846144. Throughput: 0: 5507.9, 1: 5509.2. Samples: 2473240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:42,921][40872] Avg episode reward: [(0, '1942.839'), (1, '2182.620')] [2023-09-18 22:39:42,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002432_1245184.pth... [2023-09-18 22:39:42,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005080_2600960.pth... 
[2023-09-18 22:39:42,931][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002104_1077248.pth [2023-09-18 22:39:42,935][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004752_2433024.pth [2023-09-18 22:39:47,365][41480] Updated weights for policy 1, policy_version 2480 (0.0015) [2023-09-18 22:39:47,365][41393] Updated weights for policy 0, policy_version 5128 (0.0015) [2023-09-18 22:39:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10806.5). Total num frames: 3895296. Throughput: 0: 5507.2, 1: 5508.4. Samples: 2538738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:47,922][40872] Avg episode reward: [(0, '2160.811'), (1, '2039.004')] [2023-09-18 22:39:52,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10820.3). Total num frames: 3952640. Throughput: 0: 5508.5, 1: 5510.0. Samples: 2570168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:52,921][40872] Avg episode reward: [(0, '2195.652'), (1, '2146.636')] [2023-09-18 22:39:54,950][41480] Updated weights for policy 1, policy_version 2560 (0.0014) [2023-09-18 22:39:54,950][41393] Updated weights for policy 0, policy_version 5208 (0.0015) [2023-09-18 22:39:57,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10833.5). Total num frames: 4009984. Throughput: 0: 5514.7, 1: 5516.1. Samples: 2637174. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:39:57,922][40872] Avg episode reward: [(0, '2681.520'), (1, '2556.639')] [2023-09-18 22:39:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005240_2682880.pth... [2023-09-18 22:39:57,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002592_1327104.pth... 
[2023-09-18 22:39:57,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002272_1163264.pth [2023-09-18 22:39:57,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000004920_2519040.pth [2023-09-18 22:40:02,282][41480] Updated weights for policy 1, policy_version 2640 (0.0014) [2023-09-18 22:40:02,282][41393] Updated weights for policy 0, policy_version 5288 (0.0015) [2023-09-18 22:40:02,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10813.4). Total num frames: 4059136. Throughput: 0: 5512.5, 1: 5515.6. Samples: 2703356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:02,922][40872] Avg episode reward: [(0, '2706.528'), (1, '2655.337')] [2023-09-18 22:40:07,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 10826.3). Total num frames: 4116480. Throughput: 0: 5499.5, 1: 5500.1. Samples: 2735596. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:40:07,922][40872] Avg episode reward: [(0, '2784.750'), (1, '2345.813')] [2023-09-18 22:40:09,747][41480] Updated weights for policy 1, policy_version 2720 (0.0014) [2023-09-18 22:40:09,747][41393] Updated weights for policy 0, policy_version 5368 (0.0014) [2023-09-18 22:40:12,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10838.6). Total num frames: 4173824. Throughput: 0: 5501.6, 1: 5501.6. Samples: 2801480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:12,922][40872] Avg episode reward: [(0, '2592.374'), (1, '2048.888')] [2023-09-18 22:40:12,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002752_1409024.pth... [2023-09-18 22:40:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005400_2764800.pth... 
[2023-09-18 22:40:12,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002432_1245184.pth [2023-09-18 22:40:12,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005080_2600960.pth [2023-09-18 22:40:17,240][41480] Updated weights for policy 1, policy_version 2800 (0.0014) [2023-09-18 22:40:17,241][41393] Updated weights for policy 0, policy_version 5448 (0.0013) [2023-09-18 22:40:17,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10819.6). Total num frames: 4222976. Throughput: 0: 5477.7, 1: 5479.9. Samples: 2867238. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:17,922][40872] Avg episode reward: [(0, '2468.182'), (1, '1678.418')] [2023-09-18 22:40:22,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10831.6). Total num frames: 4280320. Throughput: 0: 5471.3, 1: 5474.8. Samples: 2899970. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:22,922][40872] Avg episode reward: [(0, '2436.430'), (1, '1887.034')] [2023-09-18 22:40:24,767][41480] Updated weights for policy 1, policy_version 2880 (0.0011) [2023-09-18 22:40:24,767][41393] Updated weights for policy 0, policy_version 5528 (0.0010) [2023-09-18 22:40:27,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 10843.2). Total num frames: 4337664. Throughput: 0: 5468.5, 1: 5469.4. Samples: 2965448. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:27,922][40872] Avg episode reward: [(0, '2667.002'), (1, '2163.270')] [2023-09-18 22:40:27,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005560_2846720.pth... [2023-09-18 22:40:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000002912_1490944.pth... 
[2023-09-18 22:40:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005240_2682880.pth [2023-09-18 22:40:27,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002592_1327104.pth [2023-09-18 22:40:31,988][41480] Updated weights for policy 1, policy_version 2960 (0.0012) [2023-09-18 22:40:31,989][41393] Updated weights for policy 0, policy_version 5608 (0.0014) [2023-09-18 22:40:32,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10854.4). Total num frames: 4395008. Throughput: 0: 5498.7, 1: 5498.0. Samples: 3033592. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:40:32,922][40872] Avg episode reward: [(0, '2638.030'), (1, '2133.708')] [2023-09-18 22:40:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10836.4). Total num frames: 4444160. Throughput: 0: 5517.3, 1: 5516.9. Samples: 3066710. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:37,922][40872] Avg episode reward: [(0, '2739.708'), (1, '2193.968')] [2023-09-18 22:40:39,598][41480] Updated weights for policy 1, policy_version 3040 (0.0013) [2023-09-18 22:40:39,598][41393] Updated weights for policy 0, policy_version 5688 (0.0015) [2023-09-18 22:40:42,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.6, 300 sec: 10847.3). Total num frames: 4501504. Throughput: 0: 5501.8, 1: 5499.9. Samples: 3132250. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:40:42,922][40872] Avg episode reward: [(0, '2566.632'), (1, '2507.954')] [2023-09-18 22:40:42,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003072_1572864.pth... [2023-09-18 22:40:42,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005720_2928640.pth... 
[2023-09-18 22:40:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002752_1409024.pth [2023-09-18 22:40:42,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005400_2764800.pth [2023-09-18 22:40:47,092][41480] Updated weights for policy 1, policy_version 3120 (0.0013) [2023-09-18 22:40:47,092][41393] Updated weights for policy 0, policy_version 5768 (0.0011) [2023-09-18 22:40:47,921][40872] Fps is (10 sec: 11469.1, 60 sec: 11059.2, 300 sec: 10857.9). Total num frames: 4558848. Throughput: 0: 5480.8, 1: 5478.1. Samples: 3196504. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:40:47,921][40872] Avg episode reward: [(0, '2585.931'), (1, '2708.584')] [2023-09-18 22:40:47,922][41360] Saving new best policy, reward=2708.584! [2023-09-18 22:40:52,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 4608000. Throughput: 0: 5467.2, 1: 5468.1. Samples: 3227682. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:52,921][40872] Avg episode reward: [(0, '2621.369'), (1, '2953.706')] [2023-09-18 22:40:52,922][41360] Saving new best policy, reward=2953.706! [2023-09-18 22:40:54,919][41480] Updated weights for policy 1, policy_version 3200 (0.0013) [2023-09-18 22:40:54,920][41393] Updated weights for policy 0, policy_version 5848 (0.0014) [2023-09-18 22:40:57,921][40872] Fps is (10 sec: 9830.1, 60 sec: 10786.1, 300 sec: 10857.9). Total num frames: 4657152. Throughput: 0: 5442.3, 1: 5442.6. Samples: 3291304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:40:57,922][40872] Avg episode reward: [(0, '2857.402'), (1, '2841.210')] [2023-09-18 22:40:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000005872_3006464.pth... [2023-09-18 22:40:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003224_1650688.pth... 
[2023-09-18 22:40:57,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005560_2846720.pth [2023-09-18 22:40:57,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000002912_1490944.pth [2023-09-18 22:41:02,564][41480] Updated weights for policy 1, policy_version 3280 (0.0014) [2023-09-18 22:41:02,564][41393] Updated weights for policy 0, policy_version 5928 (0.0011) [2023-09-18 22:41:02,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 4714496. Throughput: 0: 5435.3, 1: 5433.2. Samples: 3356320. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:41:02,921][40872] Avg episode reward: [(0, '2926.688'), (1, '2562.492')] [2023-09-18 22:41:02,922][41359] Saving new best policy, reward=2926.688! [2023-09-18 22:41:07,921][40872] Fps is (10 sec: 11469.2, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 4771840. Throughput: 0: 5430.4, 1: 5427.4. Samples: 3388566. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:41:07,921][40872] Avg episode reward: [(0, '2361.034'), (1, '2444.728')] [2023-09-18 22:41:09,652][41480] Updated weights for policy 1, policy_version 3360 (0.0010) [2023-09-18 22:41:09,653][41393] Updated weights for policy 0, policy_version 6008 (0.0014) [2023-09-18 22:41:12,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 4829184. Throughput: 0: 5471.5, 1: 5469.2. Samples: 3457776. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:41:12,922][40872] Avg episode reward: [(0, '1642.486'), (1, '2529.224')] [2023-09-18 22:41:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006040_3092480.pth... [2023-09-18 22:41:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003392_1736704.pth... 
[2023-09-18 22:41:12,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003072_1572864.pth [2023-09-18 22:41:12,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005720_2928640.pth [2023-09-18 22:41:17,310][41480] Updated weights for policy 1, policy_version 3440 (0.0014) [2023-09-18 22:41:17,310][41393] Updated weights for policy 0, policy_version 6088 (0.0014) [2023-09-18 22:41:17,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 4878336. Throughput: 0: 5431.6, 1: 5432.8. Samples: 3522490. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:41:17,922][40872] Avg episode reward: [(0, '1633.661'), (1, '2615.398')] [2023-09-18 22:41:22,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 4935680. Throughput: 0: 5425.3, 1: 5425.8. Samples: 3555008. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:41:22,922][40872] Avg episode reward: [(0, '2019.355'), (1, '2545.782')] [2023-09-18 22:41:25,054][41393] Updated weights for policy 0, policy_version 6168 (0.0014) [2023-09-18 22:41:25,055][41480] Updated weights for policy 1, policy_version 3520 (0.0013) [2023-09-18 22:41:27,924][40872] Fps is (10 sec: 11465.4, 60 sec: 10922.1, 300 sec: 10913.3). Total num frames: 4993024. Throughput: 0: 5414.6, 1: 5414.5. Samples: 3619592. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:41:27,925][40872] Avg episode reward: [(0, '2470.348'), (1, '2684.902')] [2023-09-18 22:41:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003552_1818624.pth... [2023-09-18 22:41:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006200_3174400.pth... 
[2023-09-18 22:41:27,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003224_1650688.pth [2023-09-18 22:41:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000005872_3006464.pth [2023-09-18 22:41:32,279][41393] Updated weights for policy 0, policy_version 6248 (0.0014) [2023-09-18 22:41:32,279][41480] Updated weights for policy 1, policy_version 3600 (0.0014) [2023-09-18 22:41:32,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 5042176. Throughput: 0: 5441.9, 1: 5444.8. Samples: 3686408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:41:32,922][40872] Avg episode reward: [(0, '2583.366'), (1, '2745.068')] [2023-09-18 22:41:37,921][40872] Fps is (10 sec: 10653.0, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 5099520. Throughput: 0: 5457.3, 1: 5456.6. Samples: 3718810. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:41:37,921][40872] Avg episode reward: [(0, '2408.218'), (1, '2835.420')] [2023-09-18 22:41:39,666][41480] Updated weights for policy 1, policy_version 3680 (0.0014) [2023-09-18 22:41:39,666][41393] Updated weights for policy 0, policy_version 6328 (0.0013) [2023-09-18 22:41:42,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 5156864. Throughput: 0: 5498.5, 1: 5498.4. Samples: 3786164. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:41:42,922][40872] Avg episode reward: [(0, '2440.500'), (1, '2568.683')] [2023-09-18 22:41:42,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006360_3256320.pth... [2023-09-18 22:41:42,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003712_1900544.pth... 
[2023-09-18 22:41:42,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006040_3092480.pth [2023-09-18 22:41:42,943][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003392_1736704.pth [2023-09-18 22:41:47,040][41480] Updated weights for policy 1, policy_version 3760 (0.0015) [2023-09-18 22:41:47,041][41393] Updated weights for policy 0, policy_version 6408 (0.0015) [2023-09-18 22:41:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 5214208. Throughput: 0: 5515.9, 1: 5515.1. Samples: 3852712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:41:47,921][40872] Avg episode reward: [(0, '2471.824'), (1, '2535.287')] [2023-09-18 22:41:52,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5271552. Throughput: 0: 5551.2, 1: 5551.3. Samples: 3888184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:41:52,922][40872] Avg episode reward: [(0, '2201.160'), (1, '2675.685')] [2023-09-18 22:41:54,380][41480] Updated weights for policy 1, policy_version 3840 (0.0012) [2023-09-18 22:41:54,382][41393] Updated weights for policy 0, policy_version 6488 (0.0013) [2023-09-18 22:41:57,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.3, 300 sec: 10913.4). Total num frames: 5320704. Throughput: 0: 5487.0, 1: 5487.2. Samples: 3951612. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:41:57,921][40872] Avg episode reward: [(0, '2181.077'), (1, '2908.861')] [2023-09-18 22:41:57,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000003872_1982464.pth... [2023-09-18 22:41:57,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006520_3338240.pth... 
[2023-09-18 22:41:57,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003552_1818624.pth [2023-09-18 22:41:57,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006200_3174400.pth [2023-09-18 22:42:02,151][41480] Updated weights for policy 1, policy_version 3920 (0.0012) [2023-09-18 22:42:02,151][41393] Updated weights for policy 0, policy_version 6568 (0.0014) [2023-09-18 22:42:02,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5378048. Throughput: 0: 5471.2, 1: 5469.6. Samples: 4014824. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:42:02,921][40872] Avg episode reward: [(0, '2368.832'), (1, '3068.266')] [2023-09-18 22:42:02,922][41360] Saving new best policy, reward=3068.266! [2023-09-18 22:42:07,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10913.4). Total num frames: 5427200. Throughput: 0: 5481.4, 1: 5480.3. Samples: 4048288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:07,922][40872] Avg episode reward: [(0, '2743.666'), (1, '2722.478')] [2023-09-18 22:42:09,653][41480] Updated weights for policy 1, policy_version 4000 (0.0014) [2023-09-18 22:42:09,653][41393] Updated weights for policy 0, policy_version 6648 (0.0016) [2023-09-18 22:42:12,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 5484544. Throughput: 0: 5494.6, 1: 5494.5. Samples: 4114064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:12,921][40872] Avg episode reward: [(0, '2600.984'), (1, '2058.699')] [2023-09-18 22:42:12,927][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004032_2064384.pth... [2023-09-18 22:42:12,927][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006680_3420160.pth... 
[2023-09-18 22:42:12,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003712_1900544.pth [2023-09-18 22:42:12,933][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006360_3256320.pth [2023-09-18 22:42:16,809][41480] Updated weights for policy 1, policy_version 4080 (0.0013) [2023-09-18 22:42:16,809][41393] Updated weights for policy 0, policy_version 6728 (0.0013) [2023-09-18 22:42:17,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 5541888. Throughput: 0: 5527.1, 1: 5524.3. Samples: 4183724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:17,922][40872] Avg episode reward: [(0, '1846.889'), (1, '2105.475')] [2023-09-18 22:42:22,921][40872] Fps is (10 sec: 11468.6, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5599232. Throughput: 0: 5521.2, 1: 5519.6. Samples: 4215650. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:42:22,922][40872] Avg episode reward: [(0, '1873.627'), (1, '2545.365')] [2023-09-18 22:42:24,460][41393] Updated weights for policy 0, policy_version 6808 (0.0017) [2023-09-18 22:42:24,461][41480] Updated weights for policy 1, policy_version 4160 (0.0014) [2023-09-18 22:42:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10923.2, 300 sec: 10913.4). Total num frames: 5648384. Throughput: 0: 5485.0, 1: 5484.3. Samples: 4279780. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:42:27,922][40872] Avg episode reward: [(0, '2137.597'), (1, '2938.985')] [2023-09-18 22:42:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000006840_3502080.pth... [2023-09-18 22:42:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004192_2146304.pth... 
[2023-09-18 22:42:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006520_3338240.pth [2023-09-18 22:42:27,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000003872_1982464.pth [2023-09-18 22:42:31,758][41393] Updated weights for policy 0, policy_version 6888 (0.0014) [2023-09-18 22:42:31,758][41480] Updated weights for policy 1, policy_version 4240 (0.0014) [2023-09-18 22:42:32,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5705728. Throughput: 0: 5497.3, 1: 5498.2. Samples: 4347514. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:42:32,922][40872] Avg episode reward: [(0, '2274.100'), (1, '3018.768')] [2023-09-18 22:42:37,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5763072. Throughput: 0: 5466.1, 1: 5466.4. Samples: 4380148. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:37,922][40872] Avg episode reward: [(0, '2202.936'), (1, '2833.715')] [2023-09-18 22:42:38,713][41393] Updated weights for policy 0, policy_version 6968 (0.0013) [2023-09-18 22:42:38,713][41480] Updated weights for policy 1, policy_version 4320 (0.0013) [2023-09-18 22:42:42,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 5820416. Throughput: 0: 5541.7, 1: 5543.0. Samples: 4450426. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:42,922][40872] Avg episode reward: [(0, '1956.275'), (1, '2658.507')] [2023-09-18 22:42:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004360_2232320.pth... [2023-09-18 22:42:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007008_3588096.pth... 
[2023-09-18 22:42:42,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004032_2064384.pth [2023-09-18 22:42:42,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006680_3420160.pth [2023-09-18 22:42:46,063][41480] Updated weights for policy 1, policy_version 4400 (0.0012) [2023-09-18 22:42:46,063][41393] Updated weights for policy 0, policy_version 7048 (0.0009) [2023-09-18 22:42:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5877760. Throughput: 0: 5593.7, 1: 5594.2. Samples: 4518278. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:42:47,922][40872] Avg episode reward: [(0, '1922.184'), (1, '2902.333')] [2023-09-18 22:42:52,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5935104. Throughput: 0: 5589.8, 1: 5590.8. Samples: 4551416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:52,922][40872] Avg episode reward: [(0, '2178.903'), (1, '2899.999')] [2023-09-18 22:42:53,623][41393] Updated weights for policy 0, policy_version 7128 (0.0018) [2023-09-18 22:42:53,624][41480] Updated weights for policy 1, policy_version 4480 (0.0015) [2023-09-18 22:42:57,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 5984256. Throughput: 0: 5586.2, 1: 5586.6. Samples: 4616840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:42:57,921][40872] Avg episode reward: [(0, '2505.534'), (1, '3017.435')] [2023-09-18 22:42:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004520_2314240.pth... [2023-09-18 22:42:57,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007168_3670016.pth... 
[2023-09-18 22:42:57,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004192_2146304.pth [2023-09-18 22:42:57,944][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000006840_3502080.pth [2023-09-18 22:43:01,204][41393] Updated weights for policy 0, policy_version 7208 (0.0014) [2023-09-18 22:43:01,204][41480] Updated weights for policy 1, policy_version 4560 (0.0013) [2023-09-18 22:43:02,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 6041600. Throughput: 0: 5519.9, 1: 5519.8. Samples: 4680512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:02,921][40872] Avg episode reward: [(0, '2789.904'), (1, '2982.791')] [2023-09-18 22:43:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 6090752. Throughput: 0: 5516.2, 1: 5515.8. Samples: 4712086. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:07,921][40872] Avg episode reward: [(0, '2874.293'), (1, '3119.791')] [2023-09-18 22:43:07,922][41360] Saving new best policy, reward=3119.791! [2023-09-18 22:43:08,874][41480] Updated weights for policy 1, policy_version 4640 (0.0015) [2023-09-18 22:43:08,875][41393] Updated weights for policy 0, policy_version 7288 (0.0015) [2023-09-18 22:43:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 6148096. Throughput: 0: 5522.9, 1: 5523.3. Samples: 4776858. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:12,921][40872] Avg episode reward: [(0, '2846.200'), (1, '3045.995')] [2023-09-18 22:43:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007328_3751936.pth... [2023-09-18 22:43:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004680_2396160.pth... 
[2023-09-18 22:43:12,934][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007008_3588096.pth [2023-09-18 22:43:12,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004360_2232320.pth [2023-09-18 22:43:16,414][41393] Updated weights for policy 0, policy_version 7368 (0.0013) [2023-09-18 22:43:16,415][41480] Updated weights for policy 1, policy_version 4720 (0.0013) [2023-09-18 22:43:17,921][40872] Fps is (10 sec: 11468.6, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 6205440. Throughput: 0: 5497.0, 1: 5496.0. Samples: 4842198. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:43:17,922][40872] Avg episode reward: [(0, '2776.560'), (1, '2944.696')] [2023-09-18 22:43:22,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 6254592. Throughput: 0: 5513.1, 1: 5512.6. Samples: 4876304. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:43:22,921][40872] Avg episode reward: [(0, '2800.661'), (1, '2982.160')] [2023-09-18 22:43:23,668][41480] Updated weights for policy 1, policy_version 4800 (0.0013) [2023-09-18 22:43:23,668][41393] Updated weights for policy 0, policy_version 7448 (0.0015) [2023-09-18 22:43:27,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 10969.0). Total num frames: 6311936. Throughput: 0: 5497.2, 1: 5496.1. Samples: 4945124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:27,921][40872] Avg episode reward: [(0, '2834.272'), (1, '3178.187')] [2023-09-18 22:43:27,961][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007496_3837952.pth... [2023-09-18 22:43:27,964][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007168_3670016.pth [2023-09-18 22:43:27,965][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000004848_2482176.pth... 
[2023-09-18 22:43:27,970][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004520_2314240.pth [2023-09-18 22:43:27,971][41360] Saving new best policy, reward=3178.187! [2023-09-18 22:43:30,902][41480] Updated weights for policy 1, policy_version 4880 (0.0013) [2023-09-18 22:43:30,902][41393] Updated weights for policy 0, policy_version 7528 (0.0015) [2023-09-18 22:43:32,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.3, 300 sec: 10969.0). Total num frames: 6369280. Throughput: 0: 5492.8, 1: 5492.6. Samples: 5012620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:32,921][40872] Avg episode reward: [(0, '2859.730'), (1, '3258.638')] [2023-09-18 22:43:32,922][41360] Saving new best policy, reward=3258.638! [2023-09-18 22:43:37,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 6426624. Throughput: 0: 5497.3, 1: 5499.2. Samples: 5046258. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:37,922][40872] Avg episode reward: [(0, '2862.275'), (1, '3234.669')] [2023-09-18 22:43:38,053][41480] Updated weights for policy 1, policy_version 4960 (0.0011) [2023-09-18 22:43:38,053][41393] Updated weights for policy 0, policy_version 7608 (0.0013) [2023-09-18 22:43:42,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 6483968. Throughput: 0: 5526.6, 1: 5527.0. Samples: 5114256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:42,922][40872] Avg episode reward: [(0, '2930.412'), (1, '3078.371')] [2023-09-18 22:43:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005008_2564096.pth... [2023-09-18 22:43:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007656_3919872.pth... 
[2023-09-18 22:43:42,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007328_3751936.pth [2023-09-18 22:43:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004680_2396160.pth [2023-09-18 22:43:42,941][41359] Saving new best policy, reward=2930.412! [2023-09-18 22:43:45,434][41480] Updated weights for policy 1, policy_version 5040 (0.0013) [2023-09-18 22:43:45,436][41393] Updated weights for policy 0, policy_version 7688 (0.0015) [2023-09-18 22:43:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 6541312. Throughput: 0: 5590.6, 1: 5589.9. Samples: 5183632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:47,921][40872] Avg episode reward: [(0, '3046.192'), (1, '2849.641')] [2023-09-18 22:43:47,922][41359] Saving new best policy, reward=3046.192! [2023-09-18 22:43:52,242][41393] Updated weights for policy 0, policy_version 7768 (0.0011) [2023-09-18 22:43:52,243][41480] Updated weights for policy 1, policy_version 5120 (0.0015) [2023-09-18 22:43:52,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 6598656. Throughput: 0: 5646.3, 1: 5647.1. Samples: 5220292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:52,922][40872] Avg episode reward: [(0, '3142.227'), (1, '2641.232')] [2023-09-18 22:43:52,923][41359] Saving new best policy, reward=3142.227! [2023-09-18 22:43:57,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11195.8, 300 sec: 11024.5). Total num frames: 6656000. Throughput: 0: 5665.4, 1: 5666.0. Samples: 5286768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:43:57,921][40872] Avg episode reward: [(0, '3209.332'), (1, '2654.926')] [2023-09-18 22:43:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007824_4005888.pth... [2023-09-18 22:43:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005176_2650112.pth... 
[2023-09-18 22:43:57,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007496_3837952.pth [2023-09-18 22:43:57,937][41359] Saving new best policy, reward=3209.332! [2023-09-18 22:43:57,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000004848_2482176.pth [2023-09-18 22:43:59,639][41393] Updated weights for policy 0, policy_version 7848 (0.0013) [2023-09-18 22:43:59,640][41480] Updated weights for policy 1, policy_version 5200 (0.0014) [2023-09-18 22:44:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 6713344. Throughput: 0: 5676.0, 1: 5677.0. Samples: 5353086. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:02,922][40872] Avg episode reward: [(0, '3115.109'), (1, '2675.011')] [2023-09-18 22:44:07,386][41393] Updated weights for policy 0, policy_version 7928 (0.0017) [2023-09-18 22:44:07,387][41480] Updated weights for policy 1, policy_version 5280 (0.0014) [2023-09-18 22:44:07,921][40872] Fps is (10 sec: 10649.4, 60 sec: 11195.7, 300 sec: 11024.5). Total num frames: 6762496. Throughput: 0: 5643.8, 1: 5644.0. Samples: 5384260. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:07,922][40872] Avg episode reward: [(0, '3041.329'), (1, '2498.417')] [2023-09-18 22:44:12,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11195.7, 300 sec: 11024.5). Total num frames: 6819840. Throughput: 0: 5601.5, 1: 5602.3. Samples: 5449298. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:12,922][40872] Avg episode reward: [(0, '3058.697'), (1, '2394.689')] [2023-09-18 22:44:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000007984_4087808.pth... [2023-09-18 22:44:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005336_2732032.pth... 
[2023-09-18 22:44:12,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007656_3919872.pth [2023-09-18 22:44:12,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005008_2564096.pth [2023-09-18 22:44:15,056][41393] Updated weights for policy 0, policy_version 8008 (0.0013) [2023-09-18 22:44:15,056][41480] Updated weights for policy 1, policy_version 5360 (0.0013) [2023-09-18 22:44:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 6868992. Throughput: 0: 5537.3, 1: 5537.6. Samples: 5510992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:17,922][40872] Avg episode reward: [(0, '3113.313'), (1, '2401.285')] [2023-09-18 22:44:22,921][40872] Fps is (10 sec: 10240.0, 60 sec: 11127.4, 300 sec: 10982.8). Total num frames: 6922240. Throughput: 0: 5513.7, 1: 5510.7. Samples: 5542356. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:44:22,922][40872] Avg episode reward: [(0, '3042.860'), (1, '2668.501')] [2023-09-18 22:44:22,926][41393] Updated weights for policy 0, policy_version 8088 (0.0013) [2023-09-18 22:44:22,926][41480] Updated weights for policy 1, policy_version 5440 (0.0014) [2023-09-18 22:44:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 6975488. Throughput: 0: 5463.7, 1: 5463.6. Samples: 5605988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:27,922][40872] Avg episode reward: [(0, '3121.554'), (1, '2890.114')] [2023-09-18 22:44:27,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008136_4165632.pth... [2023-09-18 22:44:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005488_2809856.pth... 
[2023-09-18 22:44:27,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007824_4005888.pth [2023-09-18 22:44:27,943][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005176_2650112.pth [2023-09-18 22:44:30,657][41393] Updated weights for policy 0, policy_version 8168 (0.0013) [2023-09-18 22:44:30,657][41480] Updated weights for policy 1, policy_version 5520 (0.0014) [2023-09-18 22:44:32,921][40872] Fps is (10 sec: 10240.0, 60 sec: 10922.6, 300 sec: 10969.0). Total num frames: 7024640. Throughput: 0: 5390.4, 1: 5394.0. Samples: 5668932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:32,922][40872] Avg episode reward: [(0, '3134.206'), (1, '2948.135')] [2023-09-18 22:44:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 10968.9). Total num frames: 7081984. Throughput: 0: 5347.7, 1: 5349.5. Samples: 5701666. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:37,922][40872] Avg episode reward: [(0, '3106.224'), (1, '2904.736')] [2023-09-18 22:44:38,230][41393] Updated weights for policy 0, policy_version 8248 (0.0014) [2023-09-18 22:44:38,231][41480] Updated weights for policy 1, policy_version 5600 (0.0015) [2023-09-18 22:44:42,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 7139328. Throughput: 0: 5335.3, 1: 5336.4. Samples: 5766996. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:42,922][40872] Avg episode reward: [(0, '716.480'), (1, '2572.673')] [2023-09-18 22:44:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008296_4247552.pth... [2023-09-18 22:44:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005648_2891776.pth... 
[2023-09-18 22:44:42,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000007984_4087808.pth [2023-09-18 22:44:42,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005336_2732032.pth [2023-09-18 22:44:45,860][41393] Updated weights for policy 0, policy_version 8328 (0.0014) [2023-09-18 22:44:45,860][41480] Updated weights for policy 1, policy_version 5680 (0.0014) [2023-09-18 22:44:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 7188480. Throughput: 0: 5319.7, 1: 5320.4. Samples: 5831892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:47,922][40872] Avg episode reward: [(0, '806.920'), (1, '2563.044')] [2023-09-18 22:44:52,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10969.0). Total num frames: 7245824. Throughput: 0: 5334.4, 1: 5334.8. Samples: 5864374. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:44:52,922][40872] Avg episode reward: [(0, '1054.086'), (1, '1792.913')] [2023-09-18 22:44:53,560][41480] Updated weights for policy 1, policy_version 5760 (0.0013) [2023-09-18 22:44:53,561][41393] Updated weights for policy 0, policy_version 8408 (0.0011) [2023-09-18 22:44:57,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10649.6, 300 sec: 10969.0). Total num frames: 7294976. Throughput: 0: 5314.6, 1: 5313.7. Samples: 5927570. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:44:57,922][40872] Avg episode reward: [(0, '1448.866'), (1, '1394.609')] [2023-09-18 22:44:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005800_2969600.pth... [2023-09-18 22:44:57,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008448_4325376.pth... 
[2023-09-18 22:44:57,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005488_2809856.pth [2023-09-18 22:44:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008136_4165632.pth [2023-09-18 22:45:00,880][41393] Updated weights for policy 0, policy_version 8488 (0.0013) [2023-09-18 22:45:00,881][41480] Updated weights for policy 1, policy_version 5840 (0.0014) [2023-09-18 22:45:02,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10649.6, 300 sec: 10969.0). Total num frames: 7352320. Throughput: 0: 5390.0, 1: 5390.1. Samples: 5996096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:02,921][40872] Avg episode reward: [(0, '2113.743'), (1, '1900.562')] [2023-09-18 22:45:07,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 7409664. Throughput: 0: 5406.0, 1: 5407.5. Samples: 6028962. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:07,922][40872] Avg episode reward: [(0, '2612.902'), (1, '2437.445')] [2023-09-18 22:45:08,244][41393] Updated weights for policy 0, policy_version 8568 (0.0013) [2023-09-18 22:45:08,244][41480] Updated weights for policy 1, policy_version 5920 (0.0014) [2023-09-18 22:45:12,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10786.1, 300 sec: 10996.7). Total num frames: 7467008. Throughput: 0: 5430.4, 1: 5433.0. Samples: 6094838. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:12,922][40872] Avg episode reward: [(0, '3088.338'), (1, '2617.120')] [2023-09-18 22:45:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008616_4411392.pth... [2023-09-18 22:45:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000005968_3055616.pth... 
[2023-09-18 22:45:12,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005648_2891776.pth [2023-09-18 22:45:12,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008296_4247552.pth [2023-09-18 22:45:15,794][41480] Updated weights for policy 1, policy_version 6000 (0.0012) [2023-09-18 22:45:15,794][41393] Updated weights for policy 0, policy_version 8648 (0.0014) [2023-09-18 22:45:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 7516160. Throughput: 0: 5460.2, 1: 5459.5. Samples: 6160320. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:45:17,922][40872] Avg episode reward: [(0, '2876.057'), (1, '2736.360')] [2023-09-18 22:45:22,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10854.4, 300 sec: 10969.0). Total num frames: 7573504. Throughput: 0: 5455.2, 1: 5453.7. Samples: 6192566. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:45:22,921][40872] Avg episode reward: [(0, '2587.919'), (1, '2935.082')] [2023-09-18 22:45:23,502][41480] Updated weights for policy 1, policy_version 6080 (0.0013) [2023-09-18 22:45:23,503][41393] Updated weights for policy 0, policy_version 8728 (0.0014) [2023-09-18 22:45:27,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 7630848. Throughput: 0: 5439.9, 1: 5438.0. Samples: 6256504. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:45:27,922][40872] Avg episode reward: [(0, '2423.456'), (1, '3133.004')] [2023-09-18 22:45:27,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008776_4493312.pth... [2023-09-18 22:45:27,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006128_3137536.pth... 
[2023-09-18 22:45:27,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008448_4325376.pth [2023-09-18 22:45:27,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005800_2969600.pth [2023-09-18 22:45:30,790][41480] Updated weights for policy 1, policy_version 6160 (0.0011) [2023-09-18 22:45:30,792][41393] Updated weights for policy 0, policy_version 8808 (0.0012) [2023-09-18 22:45:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 7680000. Throughput: 0: 5468.4, 1: 5469.3. Samples: 6324086. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:32,921][40872] Avg episode reward: [(0, '2356.081'), (1, '3171.192')] [2023-09-18 22:45:37,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 7737344. Throughput: 0: 5462.4, 1: 5462.6. Samples: 6356002. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:37,922][40872] Avg episode reward: [(0, '2357.032'), (1, '3186.202')] [2023-09-18 22:45:38,658][41393] Updated weights for policy 0, policy_version 8888 (0.0015) [2023-09-18 22:45:38,658][41480] Updated weights for policy 1, policy_version 6240 (0.0010) [2023-09-18 22:45:42,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10786.1, 300 sec: 10941.2). Total num frames: 7786496. Throughput: 0: 5456.7, 1: 5457.1. Samples: 6418696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:42,922][40872] Avg episode reward: [(0, '2510.499'), (1, '3113.260')] [2023-09-18 22:45:42,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006280_3215360.pth... [2023-09-18 22:45:42,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000008928_4571136.pth... 
[2023-09-18 22:45:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000005968_3055616.pth [2023-09-18 22:45:42,947][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008616_4411392.pth [2023-09-18 22:45:46,159][41480] Updated weights for policy 1, policy_version 6320 (0.0013) [2023-09-18 22:45:46,160][41393] Updated weights for policy 0, policy_version 8968 (0.0012) [2023-09-18 22:45:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 7843840. Throughput: 0: 5417.8, 1: 5416.9. Samples: 6483660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:47,922][40872] Avg episode reward: [(0, '2583.367'), (1, '3069.962')] [2023-09-18 22:45:52,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 7892992. Throughput: 0: 5413.5, 1: 5412.0. Samples: 6516114. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:52,922][40872] Avg episode reward: [(0, '2658.274'), (1, '2947.969')] [2023-09-18 22:45:53,775][41480] Updated weights for policy 1, policy_version 6400 (0.0011) [2023-09-18 22:45:53,775][41393] Updated weights for policy 0, policy_version 9048 (0.0013) [2023-09-18 22:45:57,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 7950336. Throughput: 0: 5385.1, 1: 5383.3. Samples: 6579412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:45:57,921][40872] Avg episode reward: [(0, '2608.114'), (1, '2921.159')] [2023-09-18 22:45:57,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009088_4653056.pth... [2023-09-18 22:45:57,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006440_3297280.pth... 
[2023-09-18 22:45:57,934][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008776_4493312.pth [2023-09-18 22:45:57,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006128_3137536.pth [2023-09-18 22:46:01,280][41393] Updated weights for policy 0, policy_version 9128 (0.0013) [2023-09-18 22:46:01,280][41480] Updated weights for policy 1, policy_version 6480 (0.0015) [2023-09-18 22:46:02,921][40872] Fps is (10 sec: 11469.1, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 8007680. Throughput: 0: 5407.8, 1: 5405.4. Samples: 6646910. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:02,921][40872] Avg episode reward: [(0, '2604.482'), (1, '2959.775')] [2023-09-18 22:46:07,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 8065024. Throughput: 0: 5448.2, 1: 5446.8. Samples: 6682842. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:07,922][40872] Avg episode reward: [(0, '2589.406'), (1, '2857.291')] [2023-09-18 22:46:08,608][41393] Updated weights for policy 0, policy_version 9208 (0.0013) [2023-09-18 22:46:08,608][41480] Updated weights for policy 1, policy_version 6560 (0.0014) [2023-09-18 22:46:12,921][40872] Fps is (10 sec: 10649.1, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 8114176. Throughput: 0: 5426.9, 1: 5427.7. Samples: 6744960. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:46:12,923][40872] Avg episode reward: [(0, '2734.510'), (1, '2756.938')] [2023-09-18 22:46:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009248_4734976.pth... [2023-09-18 22:46:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006600_3379200.pth... 
[2023-09-18 22:46:12,933][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000008928_4571136.pth [2023-09-18 22:46:12,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006280_3215360.pth [2023-09-18 22:46:16,435][41393] Updated weights for policy 0, policy_version 9288 (0.0014) [2023-09-18 22:46:16,435][41480] Updated weights for policy 1, policy_version 6640 (0.0014) [2023-09-18 22:46:17,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 8171520. Throughput: 0: 5386.4, 1: 5384.6. Samples: 6808782. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:17,921][40872] Avg episode reward: [(0, '2575.389'), (1, '2682.312')] [2023-09-18 22:46:22,921][40872] Fps is (10 sec: 10650.0, 60 sec: 10786.1, 300 sec: 10941.3). Total num frames: 8220672. Throughput: 0: 5393.2, 1: 5392.7. Samples: 6841364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:22,922][40872] Avg episode reward: [(0, '2640.387'), (1, '2620.936')] [2023-09-18 22:46:23,979][41393] Updated weights for policy 0, policy_version 9368 (0.0011) [2023-09-18 22:46:23,980][41480] Updated weights for policy 1, policy_version 6720 (0.0010) [2023-09-18 22:46:27,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 8278016. Throughput: 0: 5411.5, 1: 5414.4. Samples: 6905860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:27,922][40872] Avg episode reward: [(0, '2625.483'), (1, '2425.008')] [2023-09-18 22:46:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009408_4816896.pth... [2023-09-18 22:46:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006760_3461120.pth... 
[2023-09-18 22:46:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006440_3297280.pth [2023-09-18 22:46:27,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009088_4653056.pth [2023-09-18 22:46:31,549][41480] Updated weights for policy 1, policy_version 6800 (0.0014) [2023-09-18 22:46:31,549][41393] Updated weights for policy 0, policy_version 9448 (0.0014) [2023-09-18 22:46:32,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10941.2). Total num frames: 8327168. Throughput: 0: 5416.4, 1: 5416.7. Samples: 6971152. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:46:32,922][40872] Avg episode reward: [(0, '2735.129'), (1, '2062.939')] [2023-09-18 22:46:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10941.2). Total num frames: 8384512. Throughput: 0: 5419.7, 1: 5421.1. Samples: 7003952. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:46:37,922][40872] Avg episode reward: [(0, '2585.870'), (1, '1953.544')] [2023-09-18 22:46:38,931][41480] Updated weights for policy 1, policy_version 6880 (0.0013) [2023-09-18 22:46:38,931][41393] Updated weights for policy 0, policy_version 9528 (0.0015) [2023-09-18 22:46:42,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 8441856. Throughput: 0: 5447.2, 1: 5448.7. Samples: 7069730. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:42,922][40872] Avg episode reward: [(0, '2631.245'), (1, '1853.380')] [2023-09-18 22:46:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009568_4898816.pth... [2023-09-18 22:46:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000006920_3543040.pth... 
[2023-09-18 22:46:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009248_4734976.pth [2023-09-18 22:46:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006600_3379200.pth [2023-09-18 22:46:46,679][41480] Updated weights for policy 1, policy_version 6960 (0.0010) [2023-09-18 22:46:46,679][41393] Updated weights for policy 0, policy_version 9608 (0.0012) [2023-09-18 22:46:47,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.2, 300 sec: 10913.4). Total num frames: 8491008. Throughput: 0: 5415.3, 1: 5416.6. Samples: 7134346. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:47,921][40872] Avg episode reward: [(0, '2663.611'), (1, '1874.257')] [2023-09-18 22:46:52,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 8548352. Throughput: 0: 5376.2, 1: 5377.2. Samples: 7166740. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:46:52,921][40872] Avg episode reward: [(0, '2720.354'), (1, '2044.373')] [2023-09-18 22:46:54,304][41393] Updated weights for policy 0, policy_version 9688 (0.0014) [2023-09-18 22:46:54,304][41480] Updated weights for policy 1, policy_version 7040 (0.0012) [2023-09-18 22:46:57,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 8597504. Throughput: 0: 5385.0, 1: 5384.8. Samples: 7229598. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:46:57,921][40872] Avg episode reward: [(0, '2719.376'), (1, '2084.310')] [2023-09-18 22:46:57,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009720_4976640.pth... [2023-09-18 22:46:57,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007072_3620864.pth... 
[2023-09-18 22:46:57,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006760_3461120.pth [2023-09-18 22:46:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009408_4816896.pth [2023-09-18 22:47:02,018][41480] Updated weights for policy 1, policy_version 7120 (0.0013) [2023-09-18 22:47:02,019][41393] Updated weights for policy 0, policy_version 9768 (0.0015) [2023-09-18 22:47:02,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10941.2). Total num frames: 8654848. Throughput: 0: 5384.5, 1: 5384.9. Samples: 7293406. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:02,922][40872] Avg episode reward: [(0, '2631.352'), (1, '2157.675')] [2023-09-18 22:47:07,921][40872] Fps is (10 sec: 10649.2, 60 sec: 10649.6, 300 sec: 10913.4). Total num frames: 8704000. Throughput: 0: 5376.8, 1: 5376.5. Samples: 7325266. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:07,922][40872] Avg episode reward: [(0, '2831.336'), (1, '2331.148')] [2023-09-18 22:47:09,331][41393] Updated weights for policy 0, policy_version 9848 (0.0016) [2023-09-18 22:47:09,331][41480] Updated weights for policy 1, policy_version 7200 (0.0016) [2023-09-18 22:47:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.2, 300 sec: 10913.4). Total num frames: 8761344. Throughput: 0: 5418.8, 1: 5416.0. Samples: 7393428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:12,922][40872] Avg episode reward: [(0, '2836.701'), (1, '2591.747')] [2023-09-18 22:47:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000009880_5058560.pth... [2023-09-18 22:47:12,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007232_3702784.pth... 
[2023-09-18 22:47:12,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009568_4898816.pth [2023-09-18 22:47:12,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000006920_3543040.pth [2023-09-18 22:47:16,873][41480] Updated weights for policy 1, policy_version 7280 (0.0012) [2023-09-18 22:47:16,873][41393] Updated weights for policy 0, policy_version 9928 (0.0012) [2023-09-18 22:47:17,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 8818688. Throughput: 0: 5416.2, 1: 5416.3. Samples: 7458616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:17,922][40872] Avg episode reward: [(0, '3075.206'), (1, '2754.871')] [2023-09-18 22:47:22,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 8876032. Throughput: 0: 5430.9, 1: 5430.2. Samples: 7492702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:22,922][40872] Avg episode reward: [(0, '3088.355'), (1, '2777.299')] [2023-09-18 22:47:24,435][41393] Updated weights for policy 0, policy_version 10008 (0.0009) [2023-09-18 22:47:24,435][41480] Updated weights for policy 1, policy_version 7360 (0.0013) [2023-09-18 22:47:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 8925184. Throughput: 0: 5391.8, 1: 5389.6. Samples: 7554892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:27,922][40872] Avg episode reward: [(0, '3098.479'), (1, '2751.549')] [2023-09-18 22:47:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010040_5140480.pth... [2023-09-18 22:47:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007392_3784704.pth... 
[2023-09-18 22:47:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009720_4976640.pth [2023-09-18 22:47:27,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007072_3620864.pth [2023-09-18 22:47:32,103][41393] Updated weights for policy 0, policy_version 10088 (0.0013) [2023-09-18 22:47:32,103][41480] Updated weights for policy 1, policy_version 7440 (0.0010) [2023-09-18 22:47:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 8982528. Throughput: 0: 5397.0, 1: 5395.9. Samples: 7620030. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:47:32,922][40872] Avg episode reward: [(0, '3086.646'), (1, '2810.879')] [2023-09-18 22:47:37,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.2, 300 sec: 10885.6). Total num frames: 9031680. Throughput: 0: 5403.8, 1: 5403.5. Samples: 7653070. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:37,921][40872] Avg episode reward: [(0, '3196.623'), (1, '2840.124')] [2023-09-18 22:47:39,475][41393] Updated weights for policy 0, policy_version 10168 (0.0015) [2023-09-18 22:47:39,475][41480] Updated weights for policy 1, policy_version 7520 (0.0015) [2023-09-18 22:47:42,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 9089024. Throughput: 0: 5444.1, 1: 5444.7. Samples: 7719596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:42,921][40872] Avg episode reward: [(0, '3124.677'), (1, '2531.031')] [2023-09-18 22:47:42,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010200_5222400.pth... [2023-09-18 22:47:42,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007552_3866624.pth... 
[2023-09-18 22:47:42,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000009880_5058560.pth [2023-09-18 22:47:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007232_3702784.pth [2023-09-18 22:47:47,049][41480] Updated weights for policy 1, policy_version 7600 (0.0013) [2023-09-18 22:47:47,049][41393] Updated weights for policy 0, policy_version 10248 (0.0010) [2023-09-18 22:47:47,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 9146368. Throughput: 0: 5459.2, 1: 5458.1. Samples: 7784686. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:47:47,922][40872] Avg episode reward: [(0, '3096.271'), (1, '2304.752')] [2023-09-18 22:47:52,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 9195520. Throughput: 0: 5486.5, 1: 5486.2. Samples: 7819036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:47:52,922][40872] Avg episode reward: [(0, '3010.241'), (1, '2433.923')] [2023-09-18 22:47:54,527][41480] Updated weights for policy 1, policy_version 7680 (0.0011) [2023-09-18 22:47:54,528][41393] Updated weights for policy 0, policy_version 10328 (0.0014) [2023-09-18 22:47:57,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 9252864. Throughput: 0: 5436.9, 1: 5436.7. Samples: 7882738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:47:57,921][40872] Avg episode reward: [(0, '2955.623'), (1, '2640.497')] [2023-09-18 22:47:57,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007712_3948544.pth... [2023-09-18 22:47:57,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010360_5304320.pth... 
[2023-09-18 22:47:57,935][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007392_3784704.pth [2023-09-18 22:47:57,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010040_5140480.pth [2023-09-18 22:48:01,864][41393] Updated weights for policy 0, policy_version 10408 (0.0013) [2023-09-18 22:48:01,866][41480] Updated weights for policy 1, policy_version 7760 (0.0015) [2023-09-18 22:48:02,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 9310208. Throughput: 0: 5466.5, 1: 5466.1. Samples: 7950580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:02,921][40872] Avg episode reward: [(0, '3105.327'), (1, '2878.249')] [2023-09-18 22:48:07,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 9359360. Throughput: 0: 5437.6, 1: 5438.2. Samples: 7982112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:07,922][40872] Avg episode reward: [(0, '3199.333'), (1, '2851.357')] [2023-09-18 22:48:09,471][41480] Updated weights for policy 1, policy_version 7840 (0.0014) [2023-09-18 22:48:09,472][41393] Updated weights for policy 0, policy_version 10488 (0.0012) [2023-09-18 22:48:12,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 9416704. Throughput: 0: 5479.2, 1: 5479.6. Samples: 8048036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:12,922][40872] Avg episode reward: [(0, '3187.784'), (1, '2919.440')] [2023-09-18 22:48:12,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010520_5386240.pth... [2023-09-18 22:48:12,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000007872_4030464.pth... 
[2023-09-18 22:48:12,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007552_3866624.pth [2023-09-18 22:48:12,945][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010200_5222400.pth [2023-09-18 22:48:16,916][41480] Updated weights for policy 1, policy_version 7920 (0.0017) [2023-09-18 22:48:16,916][41393] Updated weights for policy 0, policy_version 10568 (0.0014) [2023-09-18 22:48:17,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 9474048. Throughput: 0: 5486.2, 1: 5486.3. Samples: 8113792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:17,922][40872] Avg episode reward: [(0, '3194.399'), (1, '2901.637')] [2023-09-18 22:48:22,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 9531392. Throughput: 0: 5487.7, 1: 5487.8. Samples: 8146970. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:22,922][40872] Avg episode reward: [(0, '3227.930'), (1, '2888.174')] [2023-09-18 22:48:22,923][41359] Saving new best policy, reward=3227.930! [2023-09-18 22:48:24,388][41480] Updated weights for policy 1, policy_version 8000 (0.0012) [2023-09-18 22:48:24,389][41393] Updated weights for policy 0, policy_version 10648 (0.0010) [2023-09-18 22:48:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 9580544. Throughput: 0: 5467.9, 1: 5467.4. Samples: 8211682. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:48:27,922][40872] Avg episode reward: [(0, '3127.391'), (1, '2932.574')] [2023-09-18 22:48:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008032_4112384.pth... [2023-09-18 22:48:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010680_5468160.pth... 
[2023-09-18 22:48:27,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007712_3948544.pth [2023-09-18 22:48:27,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010360_5304320.pth [2023-09-18 22:48:31,565][41393] Updated weights for policy 0, policy_version 10728 (0.0015) [2023-09-18 22:48:31,565][41480] Updated weights for policy 1, policy_version 8080 (0.0015) [2023-09-18 22:48:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 9637888. Throughput: 0: 5524.2, 1: 5527.2. Samples: 8282000. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:48:32,922][40872] Avg episode reward: [(0, '3218.059'), (1, '2979.867')] [2023-09-18 22:48:37,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10885.6). Total num frames: 9695232. Throughput: 0: 5507.6, 1: 5511.0. Samples: 8314874. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:48:37,922][40872] Avg episode reward: [(0, '3231.572'), (1, '3201.791')] [2023-09-18 22:48:37,923][41359] Saving new best policy, reward=3231.572! [2023-09-18 22:48:39,056][41480] Updated weights for policy 1, policy_version 8160 (0.0014) [2023-09-18 22:48:39,057][41393] Updated weights for policy 0, policy_version 10808 (0.0012) [2023-09-18 22:48:42,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10885.6). Total num frames: 9752576. Throughput: 0: 5534.6, 1: 5534.8. Samples: 8380862. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:48:42,922][40872] Avg episode reward: [(0, '3151.988'), (1, '3244.049')] [2023-09-18 22:48:42,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008200_4198400.pth... [2023-09-18 22:48:42,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000010848_5554176.pth... 
[2023-09-18 22:48:42,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000007872_4030464.pth [2023-09-18 22:48:42,943][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010520_5386240.pth [2023-09-18 22:48:46,473][41480] Updated weights for policy 1, policy_version 8240 (0.0010) [2023-09-18 22:48:46,473][41393] Updated weights for policy 0, policy_version 10888 (0.0012) [2023-09-18 22:48:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 9801728. Throughput: 0: 5503.2, 1: 5505.8. Samples: 8445986. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:47,922][40872] Avg episode reward: [(0, '3225.104'), (1, '3160.829')] [2023-09-18 22:48:52,921][40872] Fps is (10 sec: 10649.9, 60 sec: 11059.2, 300 sec: 10857.9). Total num frames: 9859072. Throughput: 0: 5525.0, 1: 5525.0. Samples: 8479360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:52,921][40872] Avg episode reward: [(0, '3170.124'), (1, '2996.180')] [2023-09-18 22:48:54,048][41480] Updated weights for policy 1, policy_version 8320 (0.0014) [2023-09-18 22:48:54,048][41393] Updated weights for policy 0, policy_version 10968 (0.0014) [2023-09-18 22:48:57,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10857.9). Total num frames: 9916416. Throughput: 0: 5512.3, 1: 5514.8. Samples: 8544256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:48:57,922][40872] Avg episode reward: [(0, '3278.076'), (1, '2975.501')] [2023-09-18 22:48:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011008_5636096.pth... [2023-09-18 22:48:57,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008360_4280320.pth... [2023-09-18 22:48:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010680_5468160.pth [2023-09-18 22:48:57,939][41359] Saving new best policy, reward=3278.076! 
[2023-09-18 22:48:57,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008032_4112384.pth [2023-09-18 22:49:01,667][41393] Updated weights for policy 0, policy_version 11048 (0.0012) [2023-09-18 22:49:01,667][41480] Updated weights for policy 1, policy_version 8400 (0.0014) [2023-09-18 22:49:02,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 9965568. Throughput: 0: 5507.2, 1: 5509.0. Samples: 8609518. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:02,921][40872] Avg episode reward: [(0, '3124.721'), (1, '2926.692')] [2023-09-18 22:49:07,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 10857.9). Total num frames: 10022912. Throughput: 0: 5493.4, 1: 5493.4. Samples: 8641376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:07,921][40872] Avg episode reward: [(0, '3108.962'), (1, '3013.683')] [2023-09-18 22:49:08,998][41393] Updated weights for policy 0, policy_version 11128 (0.0015) [2023-09-18 22:49:08,999][41480] Updated weights for policy 1, policy_version 8480 (0.0013) [2023-09-18 22:49:12,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11059.2, 300 sec: 10885.6). Total num frames: 10080256. Throughput: 0: 5514.4, 1: 5517.0. Samples: 8708098. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:49:12,922][40872] Avg episode reward: [(0, '3014.229'), (1, '3003.948')] [2023-09-18 22:49:12,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008520_4362240.pth... [2023-09-18 22:49:12,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011168_5718016.pth... 
[2023-09-18 22:49:12,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008200_4198400.pth [2023-09-18 22:49:12,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000010848_5554176.pth [2023-09-18 22:49:16,432][41393] Updated weights for policy 0, policy_version 11208 (0.0015) [2023-09-18 22:49:16,433][41480] Updated weights for policy 1, policy_version 8560 (0.0015) [2023-09-18 22:49:17,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10899.5). Total num frames: 10137600. Throughput: 0: 5464.5, 1: 5463.7. Samples: 8773768. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:49:17,922][40872] Avg episode reward: [(0, '3147.048'), (1, '2940.770')] [2023-09-18 22:49:22,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10194944. Throughput: 0: 5492.7, 1: 5489.7. Samples: 8809082. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:22,922][40872] Avg episode reward: [(0, '3171.863'), (1, '2872.447')] [2023-09-18 22:49:23,670][41393] Updated weights for policy 0, policy_version 11288 (0.0015) [2023-09-18 22:49:23,670][41480] Updated weights for policy 1, policy_version 8640 (0.0014) [2023-09-18 22:49:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10244096. Throughput: 0: 5488.1, 1: 5488.1. Samples: 8874790. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:27,922][40872] Avg episode reward: [(0, '3116.347'), (1, '3011.351')] [2023-09-18 22:49:27,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011328_5799936.pth... [2023-09-18 22:49:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008680_4444160.pth... 
[2023-09-18 22:49:27,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011008_5636096.pth [2023-09-18 22:49:27,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008360_4280320.pth [2023-09-18 22:49:31,241][41480] Updated weights for policy 1, policy_version 8720 (0.0014) [2023-09-18 22:49:31,241][41393] Updated weights for policy 0, policy_version 11368 (0.0013) [2023-09-18 22:49:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10301440. Throughput: 0: 5490.7, 1: 5488.3. Samples: 8940040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:32,922][40872] Avg episode reward: [(0, '3059.124'), (1, '3240.467')] [2023-09-18 22:49:37,921][40872] Fps is (10 sec: 11469.1, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10358784. Throughput: 0: 5475.5, 1: 5474.7. Samples: 8972120. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:37,921][40872] Avg episode reward: [(0, '2900.771'), (1, '3354.390')] [2023-09-18 22:49:37,922][41360] Saving new best policy, reward=3354.390! [2023-09-18 22:49:38,519][41393] Updated weights for policy 0, policy_version 11448 (0.0013) [2023-09-18 22:49:38,519][41480] Updated weights for policy 1, policy_version 8800 (0.0015) [2023-09-18 22:49:42,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10407936. Throughput: 0: 5526.0, 1: 5523.0. Samples: 9041460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:49:42,922][40872] Avg episode reward: [(0, '3046.425'), (1, '3383.517')] [2023-09-18 22:49:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000008840_4526080.pth... [2023-09-18 22:49:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011488_5881856.pth... 
[2023-09-18 22:49:42,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008520_4362240.pth [2023-09-18 22:49:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011168_5718016.pth [2023-09-18 22:49:42,938][41360] Saving new best policy, reward=3383.517! [2023-09-18 22:49:46,056][41480] Updated weights for policy 1, policy_version 8880 (0.0012) [2023-09-18 22:49:46,056][41393] Updated weights for policy 0, policy_version 11528 (0.0011) [2023-09-18 22:49:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10465280. Throughput: 0: 5516.1, 1: 5515.0. Samples: 9105916. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:49:47,921][40872] Avg episode reward: [(0, '3024.770'), (1, '3321.496')] [2023-09-18 22:49:52,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.6, 300 sec: 10913.4). Total num frames: 10514432. Throughput: 0: 5506.5, 1: 5506.9. Samples: 9136982. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:49:52,922][40872] Avg episode reward: [(0, '3120.468'), (1, '3300.138')] [2023-09-18 22:49:53,759][41480] Updated weights for policy 1, policy_version 8960 (0.0013) [2023-09-18 22:49:53,760][41393] Updated weights for policy 0, policy_version 11608 (0.0011) [2023-09-18 22:49:57,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10571776. Throughput: 0: 5483.2, 1: 5480.8. Samples: 9201476. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:49:57,922][40872] Avg episode reward: [(0, '3151.302'), (1, '3214.313')] [2023-09-18 22:49:57,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011648_5963776.pth... [2023-09-18 22:49:57,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009000_4608000.pth... 
[2023-09-18 22:49:57,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008680_4444160.pth [2023-09-18 22:49:57,946][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011328_5799936.pth [2023-09-18 22:50:01,320][41393] Updated weights for policy 0, policy_version 11688 (0.0012) [2023-09-18 22:50:01,320][41480] Updated weights for policy 1, policy_version 9040 (0.0013) [2023-09-18 22:50:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 10629120. Throughput: 0: 5473.4, 1: 5472.6. Samples: 9266338. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:50:02,922][40872] Avg episode reward: [(0, '3193.816'), (1, '3251.487')] [2023-09-18 22:50:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 10678272. Throughput: 0: 5439.2, 1: 5439.6. Samples: 9298632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:07,922][40872] Avg episode reward: [(0, '3256.762'), (1, '3276.373')] [2023-09-18 22:50:08,932][41480] Updated weights for policy 1, policy_version 9120 (0.0011) [2023-09-18 22:50:08,933][41393] Updated weights for policy 0, policy_version 11768 (0.0015) [2023-09-18 22:50:12,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10735616. Throughput: 0: 5459.7, 1: 5459.2. Samples: 9366140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:12,921][40872] Avg episode reward: [(0, '3278.783'), (1, '3214.958')] [2023-09-18 22:50:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011808_6045696.pth... [2023-09-18 22:50:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009160_4689920.pth... 
[2023-09-18 22:50:12,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000008840_4526080.pth [2023-09-18 22:50:12,942][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011488_5881856.pth [2023-09-18 22:50:12,942][41359] Saving new best policy, reward=3278.783! [2023-09-18 22:50:16,274][41480] Updated weights for policy 1, policy_version 9200 (0.0011) [2023-09-18 22:50:16,275][41393] Updated weights for policy 0, policy_version 11848 (0.0014) [2023-09-18 22:50:17,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10792960. Throughput: 0: 5465.7, 1: 5465.1. Samples: 9431930. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:50:17,922][40872] Avg episode reward: [(0, '3184.540'), (1, '3025.803')] [2023-09-18 22:50:22,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10850304. Throughput: 0: 5507.9, 1: 5508.1. Samples: 9467840. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:50:22,922][40872] Avg episode reward: [(0, '3244.500'), (1, '2880.658')] [2023-09-18 22:50:23,565][41480] Updated weights for policy 1, policy_version 9280 (0.0011) [2023-09-18 22:50:23,566][41393] Updated weights for policy 0, policy_version 11928 (0.0015) [2023-09-18 22:50:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10899456. Throughput: 0: 5462.5, 1: 5461.5. Samples: 9533044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:27,922][40872] Avg episode reward: [(0, '3298.101'), (1, '2550.717')] [2023-09-18 22:50:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009320_4771840.pth... [2023-09-18 22:50:27,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000011968_6127616.pth... 
[2023-09-18 22:50:27,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009000_4608000.pth [2023-09-18 22:50:27,944][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011648_5963776.pth [2023-09-18 22:50:27,945][41359] Saving new best policy, reward=3298.101! [2023-09-18 22:50:31,087][41480] Updated weights for policy 1, policy_version 9360 (0.0013) [2023-09-18 22:50:31,088][41393] Updated weights for policy 0, policy_version 12008 (0.0013) [2023-09-18 22:50:32,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 10956800. Throughput: 0: 5466.6, 1: 5465.8. Samples: 9597876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:32,922][40872] Avg episode reward: [(0, '3376.109'), (1, '2367.738')] [2023-09-18 22:50:32,923][41359] Saving new best policy, reward=3376.109! [2023-09-18 22:50:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 11005952. Throughput: 0: 5469.7, 1: 5469.4. Samples: 9629242. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:37,922][40872] Avg episode reward: [(0, '3364.568'), (1, '2337.694')] [2023-09-18 22:50:38,706][41480] Updated weights for policy 1, policy_version 9440 (0.0011) [2023-09-18 22:50:38,707][41393] Updated weights for policy 0, policy_version 12088 (0.0012) [2023-09-18 22:50:42,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10913.4). Total num frames: 11063296. Throughput: 0: 5476.2, 1: 5475.9. Samples: 9694318. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:42,922][40872] Avg episode reward: [(0, '3285.202'), (1, '2489.292')] [2023-09-18 22:50:42,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012128_6209536.pth... [2023-09-18 22:50:42,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009480_4853760.pth... 
[2023-09-18 22:50:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009160_4689920.pth [2023-09-18 22:50:42,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011808_6045696.pth [2023-09-18 22:50:46,230][41393] Updated weights for policy 0, policy_version 12168 (0.0015) [2023-09-18 22:50:46,230][41480] Updated weights for policy 1, policy_version 9520 (0.0011) [2023-09-18 22:50:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.6, 300 sec: 10941.2). Total num frames: 11120640. Throughput: 0: 5481.4, 1: 5480.1. Samples: 9759606. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:50:47,922][40872] Avg episode reward: [(0, '3259.598'), (1, '2651.810')] [2023-09-18 22:50:52,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 11169792. Throughput: 0: 5486.2, 1: 5485.9. Samples: 9792376. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:50:52,922][40872] Avg episode reward: [(0, '3270.930'), (1, '2919.071')] [2023-09-18 22:50:53,883][41393] Updated weights for policy 0, policy_version 12248 (0.0011) [2023-09-18 22:50:53,883][41480] Updated weights for policy 1, policy_version 9600 (0.0014) [2023-09-18 22:50:57,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 11227136. Throughput: 0: 5431.2, 1: 5433.9. Samples: 9855070. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:50:57,921][40872] Avg episode reward: [(0, '3256.121'), (1, '3033.428')] [2023-09-18 22:50:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009640_4935680.pth... [2023-09-18 22:50:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012288_6291456.pth... 
[2023-09-18 22:50:57,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009320_4771840.pth [2023-09-18 22:50:57,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000011968_6127616.pth [2023-09-18 22:51:01,005][41393] Updated weights for policy 0, policy_version 12328 (0.0012) [2023-09-18 22:51:01,005][41480] Updated weights for policy 1, policy_version 9680 (0.0014) [2023-09-18 22:51:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 11284480. Throughput: 0: 5480.7, 1: 5481.5. Samples: 9925226. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:02,922][40872] Avg episode reward: [(0, '3175.873'), (1, '3127.182')] [2023-09-18 22:51:07,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 11341824. Throughput: 0: 5427.3, 1: 5427.6. Samples: 9956308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:07,921][40872] Avg episode reward: [(0, '2893.103'), (1, '3299.019')] [2023-09-18 22:51:08,586][41393] Updated weights for policy 0, policy_version 12408 (0.0009) [2023-09-18 22:51:08,586][41480] Updated weights for policy 1, policy_version 9760 (0.0014) [2023-09-18 22:51:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.6, 300 sec: 10913.4). Total num frames: 11390976. Throughput: 0: 5472.2, 1: 5473.0. Samples: 10025578. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:51:12,922][40872] Avg episode reward: [(0, '2473.813'), (1, '3189.950')] [2023-09-18 22:51:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009800_5017600.pth... [2023-09-18 22:51:12,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009480_4853760.pth [2023-09-18 22:51:12,984][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012456_6377472.pth... 
[2023-09-18 22:51:12,987][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012128_6209536.pth [2023-09-18 22:51:15,859][41393] Updated weights for policy 0, policy_version 12488 (0.0015) [2023-09-18 22:51:15,859][41480] Updated weights for policy 1, policy_version 9840 (0.0014) [2023-09-18 22:51:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 11448320. Throughput: 0: 5485.3, 1: 5485.6. Samples: 10091566. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:51:17,921][40872] Avg episode reward: [(0, '2656.963'), (1, '3191.897')] [2023-09-18 22:51:22,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 11505664. Throughput: 0: 5511.3, 1: 5514.0. Samples: 10125380. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:51:22,922][40872] Avg episode reward: [(0, '2822.157'), (1, '3208.221')] [2023-09-18 22:51:22,987][41393] Updated weights for policy 0, policy_version 12568 (0.0012) [2023-09-18 22:51:22,989][41480] Updated weights for policy 1, policy_version 9920 (0.0015) [2023-09-18 22:51:27,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 11563008. Throughput: 0: 5529.9, 1: 5530.4. Samples: 10192030. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:51:27,922][40872] Avg episode reward: [(0, '3095.484'), (1, '3327.687')] [2023-09-18 22:51:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000009968_5103616.pth... [2023-09-18 22:51:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012616_6459392.pth... 
[2023-09-18 22:51:27,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009640_4935680.pth [2023-09-18 22:51:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012288_6291456.pth [2023-09-18 22:51:30,734][41393] Updated weights for policy 0, policy_version 12648 (0.0013) [2023-09-18 22:51:30,735][41480] Updated weights for policy 1, policy_version 10000 (0.0012) [2023-09-18 22:51:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 11612160. Throughput: 0: 5518.7, 1: 5522.2. Samples: 10256448. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:51:32,921][40872] Avg episode reward: [(0, '2481.991'), (1, '3266.988')] [2023-09-18 22:51:35,922][41359] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000009 [2023-09-18 22:51:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 11669504. Throughput: 0: 5526.5, 1: 5526.4. Samples: 10289760. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:51:37,922][40872] Avg episode reward: [(0, '2352.817'), (1, '3050.233')] [2023-09-18 22:51:38,061][41393] Updated weights for policy 0, policy_version 12728 (0.0012) [2023-09-18 22:51:38,061][41480] Updated weights for policy 1, policy_version 10080 (0.0016) [2023-09-18 22:51:42,921][40872] Fps is (10 sec: 11468.6, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 11726848. Throughput: 0: 5585.6, 1: 5583.1. Samples: 10357662. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:42,922][40872] Avg episode reward: [(0, '2534.013'), (1, '2983.940')] [2023-09-18 22:51:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010128_5185536.pth... [2023-09-18 22:51:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012776_6541312.pth... 
[2023-09-18 22:51:42,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009800_5017600.pth [2023-09-18 22:51:42,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012456_6377472.pth [2023-09-18 22:51:45,490][41393] Updated weights for policy 0, policy_version 12808 (0.0013) [2023-09-18 22:51:45,490][41480] Updated weights for policy 1, policy_version 10160 (0.0011) [2023-09-18 22:51:47,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 11784192. Throughput: 0: 5529.4, 1: 5528.5. Samples: 10422830. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:47,921][40872] Avg episode reward: [(0, '2859.602'), (1, '3047.421')] [2023-09-18 22:51:52,826][41480] Updated weights for policy 1, policy_version 10240 (0.0013) [2023-09-18 22:51:52,827][41393] Updated weights for policy 0, policy_version 12888 (0.0014) [2023-09-18 22:51:52,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11195.7, 300 sec: 10996.7). Total num frames: 11841536. Throughput: 0: 5568.1, 1: 5567.1. Samples: 10457396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:52,922][40872] Avg episode reward: [(0, '3104.465'), (1, '3146.438')] [2023-09-18 22:51:57,921][40872] Fps is (10 sec: 10649.4, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 11890688. Throughput: 0: 5523.9, 1: 5524.3. Samples: 10522748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:51:57,922][40872] Avg episode reward: [(0, '3025.524'), (1, '3124.803')] [2023-09-18 22:51:57,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010288_5267456.pth... [2023-09-18 22:51:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000012936_6623232.pth... 
[2023-09-18 22:51:57,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000009968_5103616.pth [2023-09-18 22:51:57,942][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012616_6459392.pth [2023-09-18 22:52:00,544][41393] Updated weights for policy 0, policy_version 12968 (0.0015) [2023-09-18 22:52:00,544][41480] Updated weights for policy 1, policy_version 10320 (0.0015) [2023-09-18 22:52:02,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 11948032. Throughput: 0: 5484.1, 1: 5484.4. Samples: 10585150. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:02,922][40872] Avg episode reward: [(0, '2991.027'), (1, '3124.471')] [2023-09-18 22:52:07,858][41393] Updated weights for policy 0, policy_version 13048 (0.0012) [2023-09-18 22:52:07,858][41480] Updated weights for policy 1, policy_version 10400 (0.0015) [2023-09-18 22:52:07,921][40872] Fps is (10 sec: 11469.1, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12005376. Throughput: 0: 5477.4, 1: 5474.0. Samples: 10618190. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:07,921][40872] Avg episode reward: [(0, '3029.690'), (1, '3195.510')] [2023-09-18 22:52:12,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10969.0). Total num frames: 12054528. Throughput: 0: 5530.9, 1: 5531.6. Samples: 10689838. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:52:12,921][40872] Avg episode reward: [(0, '3071.793'), (1, '3268.794')] [2023-09-18 22:52:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013104_6709248.pth... [2023-09-18 22:52:12,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010456_5353472.pth... 
[2023-09-18 22:52:12,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012776_6541312.pth [2023-09-18 22:52:12,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010128_5185536.pth [2023-09-18 22:52:15,182][41480] Updated weights for policy 1, policy_version 10480 (0.0012) [2023-09-18 22:52:15,182][41393] Updated weights for policy 0, policy_version 13128 (0.0010) [2023-09-18 22:52:17,921][40872] Fps is (10 sec: 10649.4, 60 sec: 11059.2, 300 sec: 10968.9). Total num frames: 12111872. Throughput: 0: 5534.0, 1: 5531.9. Samples: 10754414. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:52:17,922][40872] Avg episode reward: [(0, '3208.358'), (1, '3185.031')] [2023-09-18 22:52:22,841][41480] Updated weights for policy 1, policy_version 10560 (0.0013) [2023-09-18 22:52:22,842][41393] Updated weights for policy 0, policy_version 13208 (0.0013) [2023-09-18 22:52:22,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12169216. Throughput: 0: 5516.7, 1: 5516.9. Samples: 10786270. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:22,922][40872] Avg episode reward: [(0, '3207.205'), (1, '3124.201')] [2023-09-18 22:52:27,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12226560. Throughput: 0: 5493.9, 1: 5493.9. Samples: 10852112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:27,922][40872] Avg episode reward: [(0, '2942.721'), (1, '3050.804')] [2023-09-18 22:52:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010616_5435392.pth... [2023-09-18 22:52:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013264_6791168.pth... 
[2023-09-18 22:52:27,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010288_5267456.pth [2023-09-18 22:52:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000012936_6623232.pth [2023-09-18 22:52:30,037][41480] Updated weights for policy 1, policy_version 10640 (0.0013) [2023-09-18 22:52:30,037][41393] Updated weights for policy 0, policy_version 13288 (0.0012) [2023-09-18 22:52:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12275712. Throughput: 0: 5521.5, 1: 5525.3. Samples: 10919936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:32,922][40872] Avg episode reward: [(0, '2692.620'), (1, '3184.699')] [2023-09-18 22:52:37,502][41393] Updated weights for policy 0, policy_version 13368 (0.0014) [2023-09-18 22:52:37,503][41480] Updated weights for policy 1, policy_version 10720 (0.0013) [2023-09-18 22:52:37,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12333056. Throughput: 0: 5501.6, 1: 5504.7. Samples: 10952680. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:37,922][40872] Avg episode reward: [(0, '2360.093'), (1, '3106.573')] [2023-09-18 22:52:42,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12390400. Throughput: 0: 5517.1, 1: 5517.2. Samples: 11019290. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:42,922][40872] Avg episode reward: [(0, '2213.574'), (1, '3158.962')] [2023-09-18 22:52:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013424_6873088.pth... [2023-09-18 22:52:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010776_5517312.pth... 
[2023-09-18 22:52:42,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013104_6709248.pth [2023-09-18 22:52:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010456_5353472.pth [2023-09-18 22:52:44,966][41393] Updated weights for policy 0, policy_version 13448 (0.0014) [2023-09-18 22:52:44,966][41480] Updated weights for policy 1, policy_version 10800 (0.0014) [2023-09-18 22:52:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 12439552. Throughput: 0: 5539.7, 1: 5542.3. Samples: 11083840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:52:47,922][40872] Avg episode reward: [(0, '2153.504'), (1, '3034.817')] [2023-09-18 22:52:52,596][41393] Updated weights for policy 0, policy_version 13528 (0.0010) [2023-09-18 22:52:52,596][41480] Updated weights for policy 1, policy_version 10880 (0.0013) [2023-09-18 22:52:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 12496896. Throughput: 0: 5532.4, 1: 5533.0. Samples: 11116132. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:52:52,922][40872] Avg episode reward: [(0, '2245.924'), (1, '3017.906')] [2023-09-18 22:52:57,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 12554240. Throughput: 0: 5463.3, 1: 5462.5. Samples: 11181496. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:52:57,921][40872] Avg episode reward: [(0, '1450.406'), (1, '3042.222')] [2023-09-18 22:52:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000010936_5599232.pth... [2023-09-18 22:52:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013584_6955008.pth... 
[2023-09-18 22:52:57,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010616_5435392.pth [2023-09-18 22:52:57,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013264_6791168.pth [2023-09-18 22:53:00,299][41480] Updated weights for policy 1, policy_version 10960 (0.0011) [2023-09-18 22:53:00,299][41393] Updated weights for policy 0, policy_version 13608 (0.0014) [2023-09-18 22:53:02,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 12603392. Throughput: 0: 5430.9, 1: 5430.9. Samples: 11243196. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:53:02,922][40872] Avg episode reward: [(0, '1459.922'), (1, '3005.767')] [2023-09-18 22:53:07,921][40872] Fps is (10 sec: 9830.3, 60 sec: 10786.1, 300 sec: 10969.0). Total num frames: 12652544. Throughput: 0: 5428.4, 1: 5428.0. Samples: 11274810. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 22:53:07,922][40872] Avg episode reward: [(0, '1704.367'), (1, '3046.328')] [2023-09-18 22:53:08,085][41393] Updated weights for policy 0, policy_version 13688 (0.0014) [2023-09-18 22:53:08,085][41480] Updated weights for policy 1, policy_version 11040 (0.0013) [2023-09-18 22:53:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.6, 300 sec: 10968.9). Total num frames: 12709888. Throughput: 0: 5454.1, 1: 5454.3. Samples: 11342994. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:53:12,922][40872] Avg episode reward: [(0, '2058.938'), (1, '3161.445')] [2023-09-18 22:53:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013736_7032832.pth... [2023-09-18 22:53:12,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011088_5677056.pth... 
[2023-09-18 22:53:12,934][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013424_6873088.pth [2023-09-18 22:53:12,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010776_5517312.pth [2023-09-18 22:53:15,358][41480] Updated weights for policy 1, policy_version 11120 (0.0016) [2023-09-18 22:53:15,358][41393] Updated weights for policy 0, policy_version 13768 (0.0015) [2023-09-18 22:53:17,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 12767232. Throughput: 0: 5434.0, 1: 5430.1. Samples: 11408820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:53:17,921][40872] Avg episode reward: [(0, '2510.798'), (1, '3288.397')] [2023-09-18 22:53:22,425][41393] Updated weights for policy 0, policy_version 13848 (0.0012) [2023-09-18 22:53:22,426][41480] Updated weights for policy 1, policy_version 11200 (0.0016) [2023-09-18 22:53:22,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 12824576. Throughput: 0: 5485.6, 1: 5483.6. Samples: 11446294. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:53:22,922][40872] Avg episode reward: [(0, '2733.613'), (1, '3197.427')] [2023-09-18 22:53:27,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 12881920. Throughput: 0: 5447.5, 1: 5449.5. Samples: 11509654. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:53:27,922][40872] Avg episode reward: [(0, '2887.406'), (1, '3129.416')] [2023-09-18 22:53:27,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011256_5763072.pth... [2023-09-18 22:53:27,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000013904_7118848.pth... 
[2023-09-18 22:53:27,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000010936_5599232.pth [2023-09-18 22:53:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013584_6955008.pth [2023-09-18 22:53:30,096][41393] Updated weights for policy 0, policy_version 13928 (0.0013) [2023-09-18 22:53:30,096][41480] Updated weights for policy 1, policy_version 11280 (0.0014) [2023-09-18 22:53:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 12931072. Throughput: 0: 5455.0, 1: 5453.7. Samples: 11574732. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:53:32,922][40872] Avg episode reward: [(0, '2537.530'), (1, '2980.146')] [2023-09-18 22:53:37,672][41393] Updated weights for policy 0, policy_version 14008 (0.0013) [2023-09-18 22:53:37,673][41480] Updated weights for policy 1, policy_version 11360 (0.0013) [2023-09-18 22:53:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 12988416. Throughput: 0: 5458.3, 1: 5459.0. Samples: 11607410. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-18 22:53:37,921][40872] Avg episode reward: [(0, '2415.080'), (1, '3097.142')] [2023-09-18 22:53:42,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.2, 300 sec: 10969.0). Total num frames: 13037568. Throughput: 0: 5459.0, 1: 5459.5. Samples: 11672830. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:53:42,921][40872] Avg episode reward: [(0, '2471.240'), (1, '3110.917')] [2023-09-18 22:53:42,949][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011416_5844992.pth... [2023-09-18 22:53:42,954][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011088_5677056.pth [2023-09-18 22:53:42,957][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014064_7200768.pth... 
[2023-09-18 22:53:42,960][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013736_7032832.pth [2023-09-18 22:53:45,227][41480] Updated weights for policy 1, policy_version 11440 (0.0012) [2023-09-18 22:53:45,227][41393] Updated weights for policy 0, policy_version 14088 (0.0014) [2023-09-18 22:53:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 13094912. Throughput: 0: 5503.6, 1: 5504.4. Samples: 11738552. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:53:47,921][40872] Avg episode reward: [(0, '2682.384'), (1, '3281.877')] [2023-09-18 22:53:52,082][41480] Updated weights for policy 1, policy_version 11520 (0.0012) [2023-09-18 22:53:52,083][41393] Updated weights for policy 0, policy_version 14168 (0.0015) [2023-09-18 22:53:52,921][40872] Fps is (10 sec: 12288.1, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 13160448. Throughput: 0: 5548.9, 1: 5548.8. Samples: 11774204. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:53:52,921][40872] Avg episode reward: [(0, '2680.027'), (1, '3443.958')] [2023-09-18 22:53:52,922][41360] Saving new best policy, reward=3443.958! [2023-09-18 22:53:57,921][40872] Fps is (10 sec: 12287.9, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 13217792. Throughput: 0: 5580.4, 1: 5582.4. Samples: 11845318. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:53:57,921][40872] Avg episode reward: [(0, '2386.319'), (1, '3550.624')] [2023-09-18 22:53:57,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011584_5931008.pth... [2023-09-18 22:53:57,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014232_7286784.pth... [2023-09-18 22:53:57,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011256_5763072.pth [2023-09-18 22:53:57,934][41360] Saving new best policy, reward=3550.624! 
[2023-09-18 22:53:57,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000013904_7118848.pth [2023-09-18 22:53:59,089][41480] Updated weights for policy 1, policy_version 11600 (0.0013) [2023-09-18 22:53:59,089][41393] Updated weights for policy 0, policy_version 14248 (0.0013) [2023-09-18 22:54:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11024.5). Total num frames: 13275136. Throughput: 0: 5593.5, 1: 5595.2. Samples: 11912312. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:54:02,922][40872] Avg episode reward: [(0, '2277.611'), (1, '3545.019')] [2023-09-18 22:54:06,596][41393] Updated weights for policy 0, policy_version 14328 (0.0016) [2023-09-18 22:54:06,597][41480] Updated weights for policy 1, policy_version 11680 (0.0013) [2023-09-18 22:54:07,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 10996.7). Total num frames: 13324288. Throughput: 0: 5541.7, 1: 5541.0. Samples: 11945016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:54:07,922][40872] Avg episode reward: [(0, '2437.168'), (1, '3489.943')] [2023-09-18 22:54:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 10996.7). Total num frames: 13381632. Throughput: 0: 5591.9, 1: 5590.5. Samples: 12012864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:54:12,922][40872] Avg episode reward: [(0, '2751.806'), (1, '3460.855')] [2023-09-18 22:54:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014392_7368704.pth... [2023-09-18 22:54:12,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011744_6012928.pth... 
[2023-09-18 22:54:12,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014064_7200768.pth [2023-09-18 22:54:12,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011416_5844992.pth [2023-09-18 22:54:14,012][41480] Updated weights for policy 1, policy_version 11760 (0.0013) [2023-09-18 22:54:14,013][41393] Updated weights for policy 0, policy_version 14408 (0.0015) [2023-09-18 22:54:17,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 10996.7). Total num frames: 13438976. Throughput: 0: 5590.7, 1: 5588.6. Samples: 12077802. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:54:17,922][40872] Avg episode reward: [(0, '2954.239'), (1, '3446.352')] [2023-09-18 22:54:21,207][41480] Updated weights for policy 1, policy_version 11840 (0.0013) [2023-09-18 22:54:21,207][41393] Updated weights for policy 0, policy_version 14488 (0.0012) [2023-09-18 22:54:22,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11195.8, 300 sec: 11024.5). Total num frames: 13496320. Throughput: 0: 5638.3, 1: 5638.1. Samples: 12114848. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:54:22,921][40872] Avg episode reward: [(0, '2985.792'), (1, '3443.089')] [2023-09-18 22:54:27,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11195.7, 300 sec: 11024.5). Total num frames: 13553664. Throughput: 0: 5648.7, 1: 5649.6. Samples: 12181250. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:54:27,921][40872] Avg episode reward: [(0, '3043.469'), (1, '3366.028')] [2023-09-18 22:54:27,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000011912_6098944.pth... [2023-09-18 22:54:27,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014560_7454720.pth... 
[2023-09-18 22:54:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011584_5931008.pth [2023-09-18 22:54:27,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014232_7286784.pth [2023-09-18 22:54:28,289][41480] Updated weights for policy 1, policy_version 11920 (0.0015) [2023-09-18 22:54:28,289][41393] Updated weights for policy 0, policy_version 14568 (0.0014) [2023-09-18 22:54:32,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11332.2, 300 sec: 11024.5). Total num frames: 13611008. Throughput: 0: 5734.9, 1: 5734.5. Samples: 12254680. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:54:32,922][40872] Avg episode reward: [(0, '3097.865'), (1, '3342.146')] [2023-09-18 22:54:35,320][41393] Updated weights for policy 0, policy_version 14648 (0.0011) [2023-09-18 22:54:35,320][41480] Updated weights for policy 1, policy_version 12000 (0.0013) [2023-09-18 22:54:37,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11332.2, 300 sec: 11052.3). Total num frames: 13668352. Throughput: 0: 5706.5, 1: 5708.5. Samples: 12287880. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:54:37,922][40872] Avg episode reward: [(0, '3145.407'), (1, '3314.884')] [2023-09-18 22:54:42,618][41393] Updated weights for policy 0, policy_version 14728 (0.0013) [2023-09-18 22:54:42,618][41480] Updated weights for policy 1, policy_version 12080 (0.0013) [2023-09-18 22:54:42,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11052.2). Total num frames: 13725696. Throughput: 0: 5664.2, 1: 5661.8. Samples: 12354992. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:54:42,922][40872] Avg episode reward: [(0, '3179.236'), (1, '3278.919')] [2023-09-18 22:54:42,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014728_7540736.pth... [2023-09-18 22:54:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012080_6184960.pth... 
[2023-09-18 22:54:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014392_7368704.pth [2023-09-18 22:54:42,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011744_6012928.pth [2023-09-18 22:54:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11080.0). Total num frames: 13783040. Throughput: 0: 5638.8, 1: 5637.5. Samples: 12419750. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:54:47,922][40872] Avg episode reward: [(0, '3184.435'), (1, '3149.835')] [2023-09-18 22:54:50,008][41393] Updated weights for policy 0, policy_version 14808 (0.0013) [2023-09-18 22:54:50,008][41480] Updated weights for policy 1, policy_version 12160 (0.0015) [2023-09-18 22:54:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 13832192. Throughput: 0: 5667.9, 1: 5667.1. Samples: 12455090. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:54:52,922][40872] Avg episode reward: [(0, '3184.862'), (1, '3165.680')] [2023-09-18 22:54:57,331][41480] Updated weights for policy 1, policy_version 12240 (0.0015) [2023-09-18 22:54:57,331][41393] Updated weights for policy 0, policy_version 14888 (0.0012) [2023-09-18 22:54:57,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 13889536. Throughput: 0: 5650.3, 1: 5649.5. Samples: 12521356. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:54:57,922][40872] Avg episode reward: [(0, '3067.289'), (1, '3256.456')] [2023-09-18 22:54:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012240_6266880.pth... [2023-09-18 22:54:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000014888_7622656.pth... 
[2023-09-18 22:54:57,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000011912_6098944.pth [2023-09-18 22:54:57,943][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014560_7454720.pth [2023-09-18 22:55:02,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11080.0). Total num frames: 13946880. Throughput: 0: 5681.5, 1: 5681.2. Samples: 12589122. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:55:02,922][40872] Avg episode reward: [(0, '3029.167'), (1, '3369.853')] [2023-09-18 22:55:04,746][41393] Updated weights for policy 0, policy_version 14968 (0.0014) [2023-09-18 22:55:04,746][41480] Updated weights for policy 1, policy_version 12320 (0.0014) [2023-09-18 22:55:07,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11332.3, 300 sec: 11080.0). Total num frames: 14004224. Throughput: 0: 5631.4, 1: 5630.4. Samples: 12621630. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:55:07,921][40872] Avg episode reward: [(0, '2855.823'), (1, '3425.306')] [2023-09-18 22:55:12,260][41393] Updated weights for policy 0, policy_version 15048 (0.0015) [2023-09-18 22:55:12,261][41480] Updated weights for policy 1, policy_version 12400 (0.0015) [2023-09-18 22:55:12,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 14053376. Throughput: 0: 5622.2, 1: 5621.0. Samples: 12687194. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:55:12,921][40872] Avg episode reward: [(0, '2722.022'), (1, '3209.666')] [2023-09-18 22:55:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015048_7704576.pth... [2023-09-18 22:55:12,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012400_6348800.pth... 
[2023-09-18 22:55:12,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014728_7540736.pth [2023-09-18 22:55:12,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012080_6184960.pth [2023-09-18 22:55:17,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 14110720. Throughput: 0: 5539.8, 1: 5539.3. Samples: 12753236. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:55:17,922][40872] Avg episode reward: [(0, '2797.332'), (1, '3163.015')] [2023-09-18 22:55:19,787][41480] Updated weights for policy 1, policy_version 12480 (0.0016) [2023-09-18 22:55:19,787][41393] Updated weights for policy 0, policy_version 15128 (0.0011) [2023-09-18 22:55:22,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11195.8, 300 sec: 11080.0). Total num frames: 14168064. Throughput: 0: 5526.8, 1: 5525.2. Samples: 12785216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:22,921][40872] Avg episode reward: [(0, '2864.596'), (1, '3179.030')] [2023-09-18 22:55:27,568][41480] Updated weights for policy 1, policy_version 12560 (0.0013) [2023-09-18 22:55:27,569][41393] Updated weights for policy 0, policy_version 15208 (0.0016) [2023-09-18 22:55:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 14217216. Throughput: 0: 5473.2, 1: 5473.3. Samples: 12847584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:27,922][40872] Avg episode reward: [(0, '3098.482'), (1, '3354.380')] [2023-09-18 22:55:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015208_7786496.pth... [2023-09-18 22:55:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012560_6430720.pth... 
[2023-09-18 22:55:27,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000014888_7622656.pth [2023-09-18 22:55:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012240_6266880.pth [2023-09-18 22:55:32,921][40872] Fps is (10 sec: 10649.4, 60 sec: 11059.2, 300 sec: 11080.0). Total num frames: 14274560. Throughput: 0: 5473.3, 1: 5474.6. Samples: 12912406. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:32,922][40872] Avg episode reward: [(0, '3178.339'), (1, '3348.020')] [2023-09-18 22:55:35,070][41393] Updated weights for policy 0, policy_version 15288 (0.0012) [2023-09-18 22:55:35,071][41480] Updated weights for policy 1, policy_version 12640 (0.0015) [2023-09-18 22:55:37,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14323712. Throughput: 0: 5447.3, 1: 5448.0. Samples: 12945378. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:37,921][40872] Avg episode reward: [(0, '3151.398'), (1, '3341.819')] [2023-09-18 22:55:42,579][41480] Updated weights for policy 1, policy_version 12720 (0.0013) [2023-09-18 22:55:42,580][41393] Updated weights for policy 0, policy_version 15368 (0.0015) [2023-09-18 22:55:42,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14381056. Throughput: 0: 5439.7, 1: 5439.6. Samples: 13010924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:42,922][40872] Avg episode reward: [(0, '3241.290'), (1, '3304.922')] [2023-09-18 22:55:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012720_6512640.pth... [2023-09-18 22:55:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015368_7868416.pth... 
[2023-09-18 22:55:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012400_6348800.pth [2023-09-18 22:55:42,943][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015048_7704576.pth [2023-09-18 22:55:47,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10922.6, 300 sec: 11080.0). Total num frames: 14438400. Throughput: 0: 5409.0, 1: 5410.2. Samples: 13075986. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:47,922][40872] Avg episode reward: [(0, '3217.344'), (1, '3267.328')] [2023-09-18 22:55:50,035][41393] Updated weights for policy 0, policy_version 15448 (0.0010) [2023-09-18 22:55:50,036][41480] Updated weights for policy 1, policy_version 12800 (0.0015) [2023-09-18 22:55:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14487552. Throughput: 0: 5421.1, 1: 5421.5. Samples: 13109544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:52,922][40872] Avg episode reward: [(0, '3125.562'), (1, '3286.556')] [2023-09-18 22:55:57,661][41393] Updated weights for policy 0, policy_version 15528 (0.0015) [2023-09-18 22:55:57,661][41480] Updated weights for policy 1, policy_version 12880 (0.0014) [2023-09-18 22:55:57,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14544896. Throughput: 0: 5405.1, 1: 5404.7. Samples: 13173634. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:55:57,921][40872] Avg episode reward: [(0, '2923.364'), (1, '3289.231')] [2023-09-18 22:55:57,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015528_7950336.pth... [2023-09-18 22:55:57,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000012880_6594560.pth... 
[2023-09-18 22:55:57,932][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015208_7786496.pth [2023-09-18 22:55:57,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012560_6430720.pth [2023-09-18 22:56:02,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 11024.5). Total num frames: 14594048. Throughput: 0: 5388.2, 1: 5390.4. Samples: 13238274. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:02,922][40872] Avg episode reward: [(0, '2786.433'), (1, '3371.443')] [2023-09-18 22:56:05,354][41480] Updated weights for policy 1, policy_version 12960 (0.0010) [2023-09-18 22:56:05,354][41393] Updated weights for policy 0, policy_version 15608 (0.0014) [2023-09-18 22:56:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 11052.3). Total num frames: 14651392. Throughput: 0: 5389.6, 1: 5390.1. Samples: 13270306. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:56:07,922][40872] Avg episode reward: [(0, '2770.788'), (1, '3473.457')] [2023-09-18 22:56:12,845][41393] Updated weights for policy 0, policy_version 15688 (0.0015) [2023-09-18 22:56:12,846][41480] Updated weights for policy 1, policy_version 13040 (0.0014) [2023-09-18 22:56:12,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14708736. Throughput: 0: 5431.5, 1: 5433.7. Samples: 13336514. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 22:56:12,921][40872] Avg episode reward: [(0, '2762.738'), (1, '3441.737')] [2023-09-18 22:56:12,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015688_8032256.pth... [2023-09-18 22:56:12,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013040_6676480.pth... 
[2023-09-18 22:56:12,931][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015368_7868416.pth [2023-09-18 22:56:12,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012720_6512640.pth [2023-09-18 22:56:17,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10786.1, 300 sec: 11024.5). Total num frames: 14757888. Throughput: 0: 5439.4, 1: 5440.2. Samples: 13401988. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:56:17,922][40872] Avg episode reward: [(0, '2800.303'), (1, '3385.798')] [2023-09-18 22:56:20,176][41393] Updated weights for policy 0, policy_version 15768 (0.0012) [2023-09-18 22:56:20,176][41480] Updated weights for policy 1, policy_version 13120 (0.0012) [2023-09-18 22:56:22,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 11024.5). Total num frames: 14815232. Throughput: 0: 5444.6, 1: 5444.3. Samples: 13435376. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:56:22,921][40872] Avg episode reward: [(0, '2763.315'), (1, '3387.237')] [2023-09-18 22:56:27,707][41480] Updated weights for policy 1, policy_version 13200 (0.0013) [2023-09-18 22:56:27,707][41393] Updated weights for policy 0, policy_version 15848 (0.0012) [2023-09-18 22:56:27,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 14872576. Throughput: 0: 5439.2, 1: 5441.1. Samples: 13500538. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 22:56:27,922][40872] Avg episode reward: [(0, '2586.151'), (1, '3348.509')] [2023-09-18 22:56:27,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000015848_8114176.pth... [2023-09-18 22:56:27,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013200_6758400.pth... 
[2023-09-18 22:56:27,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000012880_6594560.pth [2023-09-18 22:56:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015528_7950336.pth [2023-09-18 22:56:32,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.2, 300 sec: 11024.5). Total num frames: 14921728. Throughput: 0: 5443.7, 1: 5446.0. Samples: 13566018. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:32,921][40872] Avg episode reward: [(0, '2361.090'), (1, '3381.236')] [2023-09-18 22:56:35,327][41480] Updated weights for policy 1, policy_version 13280 (0.0014) [2023-09-18 22:56:35,328][41393] Updated weights for policy 0, policy_version 15928 (0.0015) [2023-09-18 22:56:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 11024.5). Total num frames: 14979072. Throughput: 0: 5433.0, 1: 5435.0. Samples: 13598604. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:37,922][40872] Avg episode reward: [(0, '2347.923'), (1, '3402.775')] [2023-09-18 22:56:42,851][41393] Updated weights for policy 0, policy_version 16008 (0.0012) [2023-09-18 22:56:42,851][41480] Updated weights for policy 1, policy_version 13360 (0.0012) [2023-09-18 22:56:42,921][40872] Fps is (10 sec: 11468.5, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 15036416. Throughput: 0: 5450.1, 1: 5452.6. Samples: 13664258. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:42,922][40872] Avg episode reward: [(0, '2390.159'), (1, '3410.058')] [2023-09-18 22:56:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016008_8196096.pth... [2023-09-18 22:56:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013360_6840320.pth... 
[2023-09-18 22:56:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015688_8032256.pth [2023-09-18 22:56:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013040_6676480.pth [2023-09-18 22:56:47,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 15093760. Throughput: 0: 5475.1, 1: 5471.9. Samples: 13730888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:47,921][40872] Avg episode reward: [(0, '2081.585'), (1, '3378.442')] [2023-09-18 22:56:50,015][41393] Updated weights for policy 0, policy_version 16088 (0.0013) [2023-09-18 22:56:50,016][41480] Updated weights for policy 1, policy_version 13440 (0.0012) [2023-09-18 22:56:52,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 15151104. Throughput: 0: 5494.3, 1: 5494.6. Samples: 13764808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:52,922][40872] Avg episode reward: [(0, '1925.360'), (1, '3373.809')] [2023-09-18 22:56:57,288][41480] Updated weights for policy 1, policy_version 13520 (0.0014) [2023-09-18 22:56:57,288][41393] Updated weights for policy 0, policy_version 16168 (0.0014) [2023-09-18 22:56:57,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10922.6, 300 sec: 11024.5). Total num frames: 15200256. Throughput: 0: 5523.5, 1: 5521.7. Samples: 13833552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:56:57,922][40872] Avg episode reward: [(0, '2005.678'), (1, '3388.201')] [2023-09-18 22:56:57,973][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013528_6926336.pth... [2023-09-18 22:56:57,978][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013200_6758400.pth [2023-09-18 22:56:57,981][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016176_8282112.pth... 
[2023-09-18 22:56:57,992][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000015848_8114176.pth [2023-09-18 22:57:02,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 15257600. Throughput: 0: 5520.7, 1: 5519.1. Samples: 13898778. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:02,922][40872] Avg episode reward: [(0, '2010.712'), (1, '3465.410')] [2023-09-18 22:57:04,841][41393] Updated weights for policy 0, policy_version 16248 (0.0012) [2023-09-18 22:57:04,842][41480] Updated weights for policy 1, policy_version 13600 (0.0015) [2023-09-18 22:57:07,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 15314944. Throughput: 0: 5518.0, 1: 5517.6. Samples: 13931980. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:57:07,922][40872] Avg episode reward: [(0, '2086.335'), (1, '3535.213')] [2023-09-18 22:57:12,159][41480] Updated weights for policy 1, policy_version 13680 (0.0015) [2023-09-18 22:57:12,160][41393] Updated weights for policy 0, policy_version 16328 (0.0014) [2023-09-18 22:57:12,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 15372288. Throughput: 0: 5547.6, 1: 5548.0. Samples: 13999840. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:57:12,922][40872] Avg episode reward: [(0, '2156.299'), (1, '3534.958')] [2023-09-18 22:57:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016336_8364032.pth... [2023-09-18 22:57:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013688_7008256.pth... [2023-09-18 22:57:12,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013360_6840320.pth [2023-09-18 22:57:12,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016008_8196096.pth [2023-09-18 22:57:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 15421440. Throughput: 0: 5551.6, 1: 5551.6. 
Samples: 14065664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:17,922][40872] Avg episode reward: [(0, '2118.761'), (1, '3526.956')] [2023-09-18 22:57:19,614][41480] Updated weights for policy 1, policy_version 13760 (0.0015) [2023-09-18 22:57:19,614][41393] Updated weights for policy 0, policy_version 16408 (0.0018) [2023-09-18 22:57:22,921][40872] Fps is (10 sec: 10649.8, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 15478784. Throughput: 0: 5542.1, 1: 5540.6. Samples: 14097326. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:22,922][40872] Avg episode reward: [(0, '2189.898'), (1, '3495.322')] [2023-09-18 22:57:27,195][41393] Updated weights for policy 0, policy_version 16488 (0.0013) [2023-09-18 22:57:27,196][41480] Updated weights for policy 1, policy_version 13840 (0.0016) [2023-09-18 22:57:27,923][40872] Fps is (10 sec: 10648.0, 60 sec: 10922.4, 300 sec: 11024.4). Total num frames: 15527936. Throughput: 0: 5543.1, 1: 5541.1. Samples: 14163062. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:27,923][40872] Avg episode reward: [(0, '2434.542'), (1, '3457.282')] [2023-09-18 22:57:27,952][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000013848_7090176.pth... [2023-09-18 22:57:27,955][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013528_6926336.pth [2023-09-18 22:57:27,961][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016496_8445952.pth... [2023-09-18 22:57:27,964][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016176_8282112.pth [2023-09-18 22:57:32,921][40872] Fps is (10 sec: 10649.3, 60 sec: 11059.1, 300 sec: 11024.5). Total num frames: 15585280. Throughput: 0: 5515.6, 1: 5516.0. Samples: 14227310. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:32,922][40872] Avg episode reward: [(0, '2623.583'), (1, '3444.630')] [2023-09-18 22:57:34,636][41393] Updated weights for policy 0, policy_version 16568 (0.0011) [2023-09-18 22:57:34,638][41480] Updated weights for policy 1, policy_version 13920 (0.0013) [2023-09-18 22:57:37,921][40872] Fps is (10 sec: 11470.3, 60 sec: 11059.2, 300 sec: 11024.5). Total num frames: 15642624. Throughput: 0: 5525.1, 1: 5526.3. Samples: 14262124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:37,922][40872] Avg episode reward: [(0, '2722.763'), (1, '3322.234')] [2023-09-18 22:57:41,845][41393] Updated weights for policy 0, policy_version 16648 (0.0014) [2023-09-18 22:57:41,845][41480] Updated weights for policy 1, policy_version 14000 (0.0014) [2023-09-18 22:57:42,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 15699968. Throughput: 0: 5511.4, 1: 5510.8. Samples: 14329550. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:42,922][40872] Avg episode reward: [(0, '2634.988'), (1, '3352.389')] [2023-09-18 22:57:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016656_8527872.pth... [2023-09-18 22:57:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014008_7172096.pth... [2023-09-18 22:57:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013688_7008256.pth [2023-09-18 22:57:42,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016336_8364032.pth [2023-09-18 22:57:47,921][40872] Fps is (10 sec: 11469.0, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 15757312. Throughput: 0: 5515.3, 1: 5516.1. Samples: 14395194. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:47,922][40872] Avg episode reward: [(0, '2630.442'), (1, '3351.405')] [2023-09-18 22:57:49,479][41393] Updated weights for policy 0, policy_version 16728 (0.0017) [2023-09-18 22:57:49,480][41480] Updated weights for policy 1, policy_version 14080 (0.0016) [2023-09-18 22:57:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 15806464. Throughput: 0: 5489.1, 1: 5492.4. Samples: 14426146. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:52,922][40872] Avg episode reward: [(0, '2614.665'), (1, '3373.416')] [2023-09-18 22:57:57,270][41480] Updated weights for policy 1, policy_version 14160 (0.0014) [2023-09-18 22:57:57,270][41393] Updated weights for policy 0, policy_version 16808 (0.0015) [2023-09-18 22:57:57,921][40872] Fps is (10 sec: 9830.3, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 15855616. Throughput: 0: 5441.3, 1: 5438.9. Samples: 14489450. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:57:57,922][40872] Avg episode reward: [(0, '2715.346'), (1, '3440.187')] [2023-09-18 22:57:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016808_8605696.pth... [2023-09-18 22:57:57,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014160_7249920.pth... [2023-09-18 22:57:57,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016496_8445952.pth [2023-09-18 22:57:57,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000013848_7090176.pth [2023-09-18 22:58:02,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 11052.3). Total num frames: 15912960. Throughput: 0: 5426.0, 1: 5422.4. Samples: 14553838. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:02,922][40872] Avg episode reward: [(0, '2918.357'), (1, '3386.414')] [2023-09-18 22:58:04,879][41480] Updated weights for policy 1, policy_version 14240 (0.0014) [2023-09-18 22:58:04,879][41393] Updated weights for policy 0, policy_version 16888 (0.0014) [2023-09-18 22:58:07,921][40872] Fps is (10 sec: 11469.1, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 15970304. Throughput: 0: 5428.7, 1: 5428.0. Samples: 14585874. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:07,921][40872] Avg episode reward: [(0, '3027.524'), (1, '3294.280')] [2023-09-18 22:58:12,114][41393] Updated weights for policy 0, policy_version 16968 (0.0014) [2023-09-18 22:58:12,114][41480] Updated weights for policy 1, policy_version 14320 (0.0014) [2023-09-18 22:58:12,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.7, 300 sec: 11052.2). Total num frames: 16027648. Throughput: 0: 5448.5, 1: 5447.5. Samples: 14653368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:12,922][40872] Avg episode reward: [(0, '2870.788'), (1, '3333.421')] [2023-09-18 22:58:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000016976_8691712.pth... [2023-09-18 22:58:12,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014328_7335936.pth... [2023-09-18 22:58:12,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016656_8527872.pth [2023-09-18 22:58:12,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014008_7172096.pth [2023-09-18 22:58:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 16076800. Throughput: 0: 5484.3, 1: 5486.7. Samples: 14721004. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:17,921][40872] Avg episode reward: [(0, '2723.676'), (1, '3321.970')] [2023-09-18 22:58:19,326][41393] Updated weights for policy 0, policy_version 17048 (0.0017) [2023-09-18 22:58:19,326][41480] Updated weights for policy 1, policy_version 14400 (0.0015) [2023-09-18 22:58:22,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 11024.5). Total num frames: 16134144. Throughput: 0: 5480.6, 1: 5478.9. Samples: 14755296. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:58:22,921][40872] Avg episode reward: [(0, '2028.634'), (1, '3369.552')] [2023-09-18 22:58:26,799][41480] Updated weights for policy 1, policy_version 14480 (0.0014) [2023-09-18 22:58:26,800][41393] Updated weights for policy 0, policy_version 17128 (0.0015) [2023-09-18 22:58:27,921][40872] Fps is (10 sec: 11468.6, 60 sec: 11059.5, 300 sec: 11052.3). Total num frames: 16191488. Throughput: 0: 5466.8, 1: 5467.4. Samples: 14821592. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 22:58:27,922][40872] Avg episode reward: [(0, '1849.699'), (1, '3295.704')] [2023-09-18 22:58:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014488_7417856.pth... [2023-09-18 22:58:27,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017136_8773632.pth... [2023-09-18 22:58:27,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014160_7249920.pth [2023-09-18 22:58:27,943][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016808_8605696.pth [2023-09-18 22:58:32,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 11052.3). Total num frames: 16248832. Throughput: 0: 5458.7, 1: 5458.7. Samples: 14886474. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:32,922][40872] Avg episode reward: [(0, '2097.738'), (1, '3384.782')] [2023-09-18 22:58:34,396][41393] Updated weights for policy 0, policy_version 17208 (0.0015) [2023-09-18 22:58:34,397][41480] Updated weights for policy 1, policy_version 14560 (0.0015) [2023-09-18 22:58:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 16297984. Throughput: 0: 5475.8, 1: 5473.1. Samples: 14918848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:37,922][40872] Avg episode reward: [(0, '2132.632'), (1, '3455.553')] [2023-09-18 22:58:41,955][41480] Updated weights for policy 1, policy_version 14640 (0.0014) [2023-09-18 22:58:41,956][41393] Updated weights for policy 0, policy_version 17288 (0.0011) [2023-09-18 22:58:42,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 11052.3). Total num frames: 16355328. Throughput: 0: 5486.5, 1: 5488.1. Samples: 14983304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:42,921][40872] Avg episode reward: [(0, '2196.377'), (1, '3506.818')] [2023-09-18 22:58:42,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017296_8855552.pth... [2023-09-18 22:58:42,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014648_7499776.pth... [2023-09-18 22:58:42,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014328_7335936.pth [2023-09-18 22:58:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000016976_8691712.pth [2023-09-18 22:58:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10996.7). Total num frames: 16404480. Throughput: 0: 5497.5, 1: 5501.0. Samples: 15048772. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:47,922][40872] Avg episode reward: [(0, '2093.598'), (1, '3533.929')] [2023-09-18 22:58:49,535][41393] Updated weights for policy 0, policy_version 17368 (0.0014) [2023-09-18 22:58:49,536][41480] Updated weights for policy 1, policy_version 14720 (0.0014) [2023-09-18 22:58:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 16461824. Throughput: 0: 5505.2, 1: 5508.1. Samples: 15081470. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:52,921][40872] Avg episode reward: [(0, '2210.548'), (1, '3516.415')] [2023-09-18 22:58:57,051][41393] Updated weights for policy 0, policy_version 17448 (0.0014) [2023-09-18 22:58:57,051][41480] Updated weights for policy 1, policy_version 14800 (0.0014) [2023-09-18 22:58:57,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10996.7). Total num frames: 16519168. Throughput: 0: 5483.4, 1: 5486.5. Samples: 15147010. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:58:57,922][40872] Avg episode reward: [(0, '2167.472'), (1, '3463.358')] [2023-09-18 22:58:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017456_8937472.pth... [2023-09-18 22:58:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014808_7581696.pth... [2023-09-18 22:58:57,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017136_8773632.pth [2023-09-18 22:58:57,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014488_7417856.pth [2023-09-18 22:59:02,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 16568320. Throughput: 0: 5434.2, 1: 5432.4. Samples: 15210004. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:02,922][40872] Avg episode reward: [(0, '2332.767'), (1, '3481.328')] [2023-09-18 22:59:04,775][41393] Updated weights for policy 0, policy_version 17528 (0.0016) [2023-09-18 22:59:04,775][41480] Updated weights for policy 1, policy_version 14880 (0.0015) [2023-09-18 22:59:07,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10996.7). Total num frames: 16625664. Throughput: 0: 5410.3, 1: 5410.9. Samples: 15242248. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:59:07,922][40872] Avg episode reward: [(0, '2600.657'), (1, '3505.681')] [2023-09-18 22:59:12,167][41393] Updated weights for policy 0, policy_version 17608 (0.0013) [2023-09-18 22:59:12,168][41480] Updated weights for policy 1, policy_version 14960 (0.0016) [2023-09-18 22:59:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10968.9). Total num frames: 16674816. Throughput: 0: 5422.5, 1: 5423.2. Samples: 15309648. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:59:12,922][40872] Avg episode reward: [(0, '2833.377'), (1, '3475.352')] [2023-09-18 22:59:12,945][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000014968_7663616.pth... [2023-09-18 22:59:12,953][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014648_7499776.pth [2023-09-18 22:59:12,954][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017616_9019392.pth... [2023-09-18 22:59:12,957][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017296_8855552.pth [2023-09-18 22:59:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.6, 300 sec: 10968.9). Total num frames: 16732160. Throughput: 0: 5443.1, 1: 5444.6. Samples: 15376418. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-18 22:59:17,922][40872] Avg episode reward: [(0, '2705.745'), (1, '3489.609')] [2023-09-18 22:59:19,370][41393] Updated weights for policy 0, policy_version 17688 (0.0014) [2023-09-18 22:59:19,370][41480] Updated weights for policy 1, policy_version 15040 (0.0015) [2023-09-18 22:59:22,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.6, 300 sec: 10968.9). Total num frames: 16789504. Throughput: 0: 5459.9, 1: 5460.2. Samples: 15410252. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:22,922][40872] Avg episode reward: [(0, '2669.449'), (1, '3436.888')] [2023-09-18 22:59:26,868][41393] Updated weights for policy 0, policy_version 17768 (0.0013) [2023-09-18 22:59:26,868][41480] Updated weights for policy 1, policy_version 15120 (0.0012) [2023-09-18 22:59:27,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10969.0). Total num frames: 16846848. Throughput: 0: 5462.3, 1: 5461.6. Samples: 15474876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:27,922][40872] Avg episode reward: [(0, '2770.376'), (1, '3355.064')] [2023-09-18 22:59:27,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017776_9101312.pth... [2023-09-18 22:59:27,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015128_7745536.pth... [2023-09-18 22:59:27,934][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017456_8937472.pth [2023-09-18 22:59:27,935][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014808_7581696.pth [2023-09-18 22:59:32,921][40872] Fps is (10 sec: 11468.8, 60 sec: 10922.7, 300 sec: 10968.9). Total num frames: 16904192. Throughput: 0: 5492.8, 1: 5490.8. Samples: 15543036. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:59:32,922][40872] Avg episode reward: [(0, '3026.509'), (1, '3378.471')] [2023-09-18 22:59:34,313][41480] Updated weights for policy 1, policy_version 15200 (0.0013) [2023-09-18 22:59:34,314][41393] Updated weights for policy 0, policy_version 17848 (0.0013) [2023-09-18 22:59:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 16953344. Throughput: 0: 5477.8, 1: 5474.4. Samples: 15574320. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:59:37,921][40872] Avg episode reward: [(0, '3131.233'), (1, '3390.487')] [2023-09-18 22:59:42,019][41393] Updated weights for policy 0, policy_version 17928 (0.0013) [2023-09-18 22:59:42,019][41480] Updated weights for policy 1, policy_version 15280 (0.0014) [2023-09-18 22:59:42,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10941.2). Total num frames: 17010688. Throughput: 0: 5462.0, 1: 5462.0. Samples: 15638594. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 22:59:42,922][40872] Avg episode reward: [(0, '3116.972'), (1, '3482.135')] [2023-09-18 22:59:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000017936_9183232.pth... [2023-09-18 22:59:42,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015288_7827456.pth... [2023-09-18 22:59:42,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017616_9019392.pth [2023-09-18 22:59:42,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000014968_7663616.pth [2023-09-18 22:59:47,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 17059840. Throughput: 0: 5487.1, 1: 5488.3. Samples: 15703898. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:47,922][40872] Avg episode reward: [(0, '3107.329'), (1, '3508.933')] [2023-09-18 22:59:49,543][41393] Updated weights for policy 0, policy_version 18008 (0.0014) [2023-09-18 22:59:49,543][41480] Updated weights for policy 1, policy_version 15360 (0.0012) [2023-09-18 22:59:52,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.6, 300 sec: 10941.2). Total num frames: 17117184. Throughput: 0: 5494.2, 1: 5494.6. Samples: 15736746. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:52,922][40872] Avg episode reward: [(0, '3096.759'), (1, '3344.730')] [2023-09-18 22:59:57,040][41393] Updated weights for policy 0, policy_version 18088 (0.0010) [2023-09-18 22:59:57,041][41480] Updated weights for policy 1, policy_version 15440 (0.0015) [2023-09-18 22:59:57,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10922.6, 300 sec: 10941.2). Total num frames: 17174528. Throughput: 0: 5472.6, 1: 5471.9. Samples: 15802150. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 22:59:57,922][40872] Avg episode reward: [(0, '3116.769'), (1, '3290.650')] [2023-09-18 22:59:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015448_7909376.pth... [2023-09-18 22:59:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018096_9265152.pth... [2023-09-18 22:59:57,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015128_7745536.pth [2023-09-18 22:59:57,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017776_9101312.pth [2023-09-18 23:00:02,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 17223680. Throughput: 0: 5450.5, 1: 5448.0. Samples: 15866848. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:02,921][40872] Avg episode reward: [(0, '3242.947'), (1, '3134.125')] [2023-09-18 23:00:04,494][41480] Updated weights for policy 1, policy_version 15520 (0.0014) [2023-09-18 23:00:04,494][41393] Updated weights for policy 0, policy_version 18168 (0.0014) [2023-09-18 23:00:07,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 17281024. Throughput: 0: 5448.3, 1: 5450.8. Samples: 15900710. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:07,921][40872] Avg episode reward: [(0, '3265.932'), (1, '3140.854')] [2023-09-18 23:00:12,044][41480] Updated weights for policy 1, policy_version 15600 (0.0012) [2023-09-18 23:00:12,045][41393] Updated weights for policy 0, policy_version 18248 (0.0013) [2023-09-18 23:00:12,921][40872] Fps is (10 sec: 11468.6, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 17338368. Throughput: 0: 5458.1, 1: 5460.4. Samples: 15966210. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:12,922][40872] Avg episode reward: [(0, '3271.760'), (1, '3154.359')] [2023-09-18 23:00:12,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018256_9347072.pth... [2023-09-18 23:00:12,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015608_7991296.pth... [2023-09-18 23:00:12,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000017936_9183232.pth [2023-09-18 23:00:12,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015288_7827456.pth [2023-09-18 23:00:17,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 17387520. Throughput: 0: 5426.9, 1: 5427.8. Samples: 16031496. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:17,922][40872] Avg episode reward: [(0, '3319.106'), (1, '3135.861')] [2023-09-18 23:00:19,568][41393] Updated weights for policy 0, policy_version 18328 (0.0013) [2023-09-18 23:00:19,568][41480] Updated weights for policy 1, policy_version 15680 (0.0013) [2023-09-18 23:00:22,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10941.2). Total num frames: 17444864. Throughput: 0: 5445.0, 1: 5448.3. Samples: 16064516. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:22,921][40872] Avg episode reward: [(0, '3313.973'), (1, '3126.498')] [2023-09-18 23:00:27,225][41393] Updated weights for policy 0, policy_version 18408 (0.0013) [2023-09-18 23:00:27,225][41480] Updated weights for policy 1, policy_version 15760 (0.0012) [2023-09-18 23:00:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17494016. Throughput: 0: 5449.7, 1: 5447.6. Samples: 16128968. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:00:27,922][40872] Avg episode reward: [(0, '3312.855'), (1, '3229.220')] [2023-09-18 23:00:27,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015768_8073216.pth... [2023-09-18 23:00:27,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018416_9428992.pth... [2023-09-18 23:00:27,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015448_7909376.pth [2023-09-18 23:00:27,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018096_9265152.pth [2023-09-18 23:00:32,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.2, 300 sec: 10941.2). Total num frames: 17551360. Throughput: 0: 5421.3, 1: 5419.5. Samples: 16191732. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:00:32,921][40872] Avg episode reward: [(0, '3285.214'), (1, '3256.245')] [2023-09-18 23:00:34,962][41393] Updated weights for policy 0, policy_version 18488 (0.0012) [2023-09-18 23:00:34,962][41480] Updated weights for policy 1, policy_version 15840 (0.0013) [2023-09-18 23:00:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17600512. Throughput: 0: 5404.0, 1: 5403.2. Samples: 16223068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:37,922][40872] Avg episode reward: [(0, '3228.581'), (1, '3368.062')] [2023-09-18 23:00:42,864][41393] Updated weights for policy 0, policy_version 18568 (0.0014) [2023-09-18 23:00:42,864][41480] Updated weights for policy 1, policy_version 15920 (0.0012) [2023-09-18 23:00:42,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.2, 300 sec: 10913.4). Total num frames: 17657856. Throughput: 0: 5369.1, 1: 5369.5. Samples: 16285388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:42,921][40872] Avg episode reward: [(0, '3241.865'), (1, '3391.197')] [2023-09-18 23:00:42,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000015920_8151040.pth... [2023-09-18 23:00:42,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018568_9506816.pth... [2023-09-18 23:00:42,935][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015608_7991296.pth [2023-09-18 23:00:42,936][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018256_9347072.pth [2023-09-18 23:00:47,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17707008. Throughput: 0: 5374.7, 1: 5375.4. Samples: 16350602. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:00:47,922][40872] Avg episode reward: [(0, '3191.627'), (1, '3344.028')] [2023-09-18 23:00:50,505][41480] Updated weights for policy 1, policy_version 16000 (0.0011) [2023-09-18 23:00:50,506][41393] Updated weights for policy 0, policy_version 18648 (0.0012) [2023-09-18 23:00:52,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17764352. Throughput: 0: 5358.0, 1: 5355.4. Samples: 16382814. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:00:52,922][40872] Avg episode reward: [(0, '3153.368'), (1, '3288.568')] [2023-09-18 23:00:57,921][40872] Fps is (10 sec: 11059.3, 60 sec: 10717.9, 300 sec: 10927.3). Total num frames: 17817600. Throughput: 0: 5343.4, 1: 5340.8. Samples: 16446996. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:00:57,922][40872] Avg episode reward: [(0, '3161.445'), (1, '3240.099')] [2023-09-18 23:00:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018728_9588736.pth... [2023-09-18 23:00:57,937][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016080_8232960.pth... [2023-09-18 23:00:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018416_9428992.pth [2023-09-18 23:00:57,939][41393] Updated weights for policy 0, policy_version 18728 (0.0013) [2023-09-18 23:00:57,940][41480] Updated weights for policy 1, policy_version 16080 (0.0010) [2023-09-18 23:00:57,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015768_8073216.pth [2023-09-18 23:01:02,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17870848. Throughput: 0: 5366.2, 1: 5364.5. Samples: 16514374. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:01:02,921][40872] Avg episode reward: [(0, '3192.981'), (1, '3258.114')] [2023-09-18 23:01:05,483][41480] Updated weights for policy 1, policy_version 16160 (0.0011) [2023-09-18 23:01:05,483][41393] Updated weights for policy 0, policy_version 18808 (0.0015) [2023-09-18 23:01:07,921][40872] Fps is (10 sec: 11059.1, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 17928192. Throughput: 0: 5363.2, 1: 5361.7. Samples: 16547138. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:01:07,922][40872] Avg episode reward: [(0, '3165.324'), (1, '3209.019')] [2023-09-18 23:01:12,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10649.6, 300 sec: 10913.4). Total num frames: 17977344. Throughput: 0: 5360.2, 1: 5360.1. Samples: 16611382. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:01:12,922][40872] Avg episode reward: [(0, '3269.270'), (1, '3225.039')] [2023-09-18 23:01:12,986][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016240_8314880.pth... [2023-09-18 23:01:12,989][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000018888_9670656.pth... [2023-09-18 23:01:12,990][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000015920_8151040.pth [2023-09-18 23:01:12,991][41480] Updated weights for policy 1, policy_version 16240 (0.0018) [2023-09-18 23:01:12,992][41393] Updated weights for policy 0, policy_version 18888 (0.0016) [2023-09-18 23:01:12,992][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018568_9506816.pth [2023-09-18 23:01:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 18034688. Throughput: 0: 5400.7, 1: 5399.6. Samples: 16677746. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 23:01:17,922][40872] Avg episode reward: [(0, '3132.069'), (1, '3136.712')] [2023-09-18 23:01:20,340][41480] Updated weights for policy 1, policy_version 16320 (0.0012) [2023-09-18 23:01:20,341][41393] Updated weights for policy 0, policy_version 18968 (0.0013) [2023-09-18 23:01:22,921][40872] Fps is (10 sec: 11469.2, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 18092032. Throughput: 0: 5424.5, 1: 5425.6. Samples: 16711320. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 23:01:22,921][40872] Avg episode reward: [(0, '2712.252'), (1, '3096.570')] [2023-09-18 23:01:27,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 18141184. Throughput: 0: 5436.6, 1: 5436.0. Samples: 16774660. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-18 23:01:27,922][40872] Avg episode reward: [(0, '2654.086'), (1, '2939.185')] [2023-09-18 23:01:27,954][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019048_9752576.pth... [2023-09-18 23:01:27,954][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016400_8396800.pth... [2023-09-18 23:01:27,956][41480] Updated weights for policy 1, policy_version 16400 (0.0014) [2023-09-18 23:01:27,956][41393] Updated weights for policy 0, policy_version 19048 (0.0014) [2023-09-18 23:01:27,958][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018728_9588736.pth [2023-09-18 23:01:27,960][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016080_8232960.pth [2023-09-18 23:01:32,921][40872] Fps is (10 sec: 10649.3, 60 sec: 10786.1, 300 sec: 10913.4). Total num frames: 18198528. Throughput: 0: 5434.0, 1: 5433.4. Samples: 16839632. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:01:32,922][40872] Avg episode reward: [(0, '2780.856'), (1, '2930.408')] [2023-09-18 23:01:35,495][41480] Updated weights for policy 1, policy_version 16480 (0.0016) [2023-09-18 23:01:35,496][41393] Updated weights for policy 0, policy_version 19128 (0.0013) [2023-09-18 23:01:37,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 18255872. Throughput: 0: 5455.0, 1: 5454.3. Samples: 16873732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:01:37,921][40872] Avg episode reward: [(0, '2939.752'), (1, '3064.452')] [2023-09-18 23:01:42,921][40872] Fps is (10 sec: 11059.2, 60 sec: 10854.4, 300 sec: 10899.5). Total num frames: 18309120. Throughput: 0: 5480.7, 1: 5480.4. Samples: 16940244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:01:42,922][40872] Avg episode reward: [(0, '3302.307'), (1, '3149.163')] [2023-09-18 23:01:42,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019208_9834496.pth... [2023-09-18 23:01:42,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016560_8478720.pth... [2023-09-18 23:01:42,932][41393] Updated weights for policy 0, policy_version 19208 (0.0012) [2023-09-18 23:01:42,932][41480] Updated weights for policy 1, policy_version 16560 (0.0015) [2023-09-18 23:01:42,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000018888_9670656.pth [2023-09-18 23:01:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016240_8314880.pth [2023-09-18 23:01:47,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 18362368. Throughput: 0: 5465.3, 1: 5467.6. Samples: 17006358. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:01:47,922][40872] Avg episode reward: [(0, '3242.290'), (1, '3128.230')] [2023-09-18 23:01:50,321][41480] Updated weights for policy 1, policy_version 16640 (0.0009) [2023-09-18 23:01:50,321][41393] Updated weights for policy 0, policy_version 19288 (0.0014) [2023-09-18 23:01:52,921][40872] Fps is (10 sec: 11059.4, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 18419712. Throughput: 0: 5466.4, 1: 5465.6. Samples: 17039074. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:01:52,921][40872] Avg episode reward: [(0, '3260.938'), (1, '3023.063')] [2023-09-18 23:01:57,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10854.4, 300 sec: 10885.6). Total num frames: 18468864. Throughput: 0: 5457.1, 1: 5456.6. Samples: 17102498. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:01:57,922][40872] Avg episode reward: [(0, '3263.687'), (1, '3002.210')] [2023-09-18 23:01:57,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016712_8556544.pth... [2023-09-18 23:01:57,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019360_9912320.pth... [2023-09-18 23:01:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019048_9752576.pth [2023-09-18 23:01:57,943][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016400_8396800.pth [2023-09-18 23:01:58,126][41393] Updated weights for policy 0, policy_version 19368 (0.0012) [2023-09-18 23:01:58,127][41480] Updated weights for policy 1, policy_version 16720 (0.0012) [2023-09-18 23:02:02,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 18526208. Throughput: 0: 5442.8, 1: 5443.8. Samples: 17167646. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:02:02,922][40872] Avg episode reward: [(0, '3262.949'), (1, '3020.812')] [2023-09-18 23:02:05,656][41393] Updated weights for policy 0, policy_version 19448 (0.0014) [2023-09-18 23:02:05,657][41480] Updated weights for policy 1, policy_version 16800 (0.0013) [2023-09-18 23:02:07,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 18583552. Throughput: 0: 5422.1, 1: 5420.4. Samples: 17199234. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:02:07,921][40872] Avg episode reward: [(0, '3256.421'), (1, '3059.141')] [2023-09-18 23:02:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 18632704. Throughput: 0: 5434.5, 1: 5435.4. Samples: 17263806. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:02:12,922][40872] Avg episode reward: [(0, '3300.419'), (1, '3053.255')] [2023-09-18 23:02:12,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000016872_8638464.pth... [2023-09-18 23:02:12,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019520_9994240.pth... [2023-09-18 23:02:12,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016560_8478720.pth [2023-09-18 23:02:12,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019208_9834496.pth [2023-09-18 23:02:13,229][41393] Updated weights for policy 0, policy_version 19528 (0.0012) [2023-09-18 23:02:13,230][41480] Updated weights for policy 1, policy_version 16880 (0.0012) [2023-09-18 23:02:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 18690048. Throughput: 0: 5454.8, 1: 5454.8. Samples: 17330564. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:02:17,921][40872] Avg episode reward: [(0, '3284.277'), (1, '3180.258')] [2023-09-18 23:02:20,514][41480] Updated weights for policy 1, policy_version 16960 (0.0016) [2023-09-18 23:02:20,514][41393] Updated weights for policy 0, policy_version 19608 (0.0013) [2023-09-18 23:02:22,921][40872] Fps is (10 sec: 11469.1, 60 sec: 10922.7, 300 sec: 10913.5). Total num frames: 18747392. Throughput: 0: 5458.2, 1: 5459.1. Samples: 17365010. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:02:22,921][40872] Avg episode reward: [(0, '3292.643'), (1, '3194.204')] [2023-09-18 23:02:27,899][41393] Updated weights for policy 0, policy_version 19688 (0.0013) [2023-09-18 23:02:27,899][41480] Updated weights for policy 1, policy_version 17040 (0.0014) [2023-09-18 23:02:27,921][40872] Fps is (10 sec: 11468.5, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 18804736. Throughput: 0: 5468.7, 1: 5471.0. Samples: 17432532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:02:27,922][40872] Avg episode reward: [(0, '3285.696'), (1, '3205.001')] [2023-09-18 23:02:27,933][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017040_8724480.pth... [2023-09-18 23:02:27,933][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019688_10080256.pth... [2023-09-18 23:02:27,942][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016712_8556544.pth [2023-09-18 23:02:27,943][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019360_9912320.pth [2023-09-18 23:02:32,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 18853888. Throughput: 0: 5460.6, 1: 5460.2. Samples: 17497794. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:02:32,922][40872] Avg episode reward: [(0, '3273.621'), (1, '3249.674')] [2023-09-18 23:02:35,320][41480] Updated weights for policy 1, policy_version 17120 (0.0015) [2023-09-18 23:02:35,320][41393] Updated weights for policy 0, policy_version 19768 (0.0015) [2023-09-18 23:02:37,921][40872] Fps is (10 sec: 11469.1, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 18919424. Throughput: 0: 5462.0, 1: 5463.1. Samples: 17530706. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:02:37,921][40872] Avg episode reward: [(0, '3265.671'), (1, '3211.406')] [2023-09-18 23:02:42,418][41393] Updated weights for policy 0, policy_version 19848 (0.0015) [2023-09-18 23:02:42,418][41480] Updated weights for policy 1, policy_version 17200 (0.0010) [2023-09-18 23:02:42,921][40872] Fps is (10 sec: 11468.7, 60 sec: 10990.9, 300 sec: 10885.6). Total num frames: 18968576. Throughput: 0: 5529.4, 1: 5529.4. Samples: 17600146. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:02:42,922][40872] Avg episode reward: [(0, '3250.435'), (1, '3247.512')] [2023-09-18 23:02:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000019848_10162176.pth... [2023-09-18 23:02:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017200_8806400.pth... [2023-09-18 23:02:42,941][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000016872_8638464.pth [2023-09-18 23:02:42,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019520_9994240.pth [2023-09-18 23:02:47,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 19025920. Throughput: 0: 5554.1, 1: 5554.0. Samples: 17667508. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:02:47,922][40872] Avg episode reward: [(0, '3113.208'), (1, '3133.415')] [2023-09-18 23:02:49,909][41480] Updated weights for policy 1, policy_version 17280 (0.0012) [2023-09-18 23:02:49,910][41393] Updated weights for policy 0, policy_version 19928 (0.0012) [2023-09-18 23:02:52,921][40872] Fps is (10 sec: 11468.9, 60 sec: 11059.2, 300 sec: 10941.2). Total num frames: 19083264. Throughput: 0: 5555.0, 1: 5554.8. Samples: 17699176. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-18 23:02:52,922][40872] Avg episode reward: [(0, '2955.026'), (1, '3139.423')] [2023-09-18 23:02:57,306][41393] Updated weights for policy 0, policy_version 20008 (0.0014) [2023-09-18 23:02:57,306][41480] Updated weights for policy 1, policy_version 17360 (0.0015) [2023-09-18 23:02:57,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 19132416. Throughput: 0: 5566.1, 1: 5566.5. Samples: 17764774. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:02:57,922][40872] Avg episode reward: [(0, '2970.867'), (1, '3103.694')] [2023-09-18 23:02:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017360_8888320.pth... [2023-09-18 23:02:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020008_10244096.pth... [2023-09-18 23:02:57,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017040_8724480.pth [2023-09-18 23:02:57,944][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019688_10080256.pth [2023-09-18 23:03:02,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 19189760. Throughput: 0: 5525.4, 1: 5525.6. Samples: 17827860. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-18 23:03:02,922][40872] Avg episode reward: [(0, '3077.822'), (1, '3180.034')] [2023-09-18 23:03:05,282][41393] Updated weights for policy 0, policy_version 20088 (0.0013) [2023-09-18 23:03:05,283][41480] Updated weights for policy 1, policy_version 17440 (0.0016) [2023-09-18 23:03:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 19238912. Throughput: 0: 5482.3, 1: 5485.4. Samples: 17858560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:03:07,922][40872] Avg episode reward: [(0, '3088.314'), (1, '3156.807')] [2023-09-18 23:03:12,711][41480] Updated weights for policy 1, policy_version 17520 (0.0013) [2023-09-18 23:03:12,712][41393] Updated weights for policy 0, policy_version 20168 (0.0015) [2023-09-18 23:03:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 10913.4). Total num frames: 19296256. Throughput: 0: 5464.4, 1: 5463.0. Samples: 17924262. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:03:12,921][40872] Avg episode reward: [(0, '3135.516'), (1, '3053.975')] [2023-09-18 23:03:12,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020168_10326016.pth... [2023-09-18 23:03:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017520_8970240.pth... [2023-09-18 23:03:12,935][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000019848_10162176.pth [2023-09-18 23:03:12,937][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017200_8806400.pth [2023-09-18 23:03:17,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 19345408. Throughput: 0: 5450.8, 1: 5448.7. Samples: 17988268. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:03:17,921][40872] Avg episode reward: [(0, '3175.814'), (1, '2917.019')] [2023-09-18 23:03:20,498][41393] Updated weights for policy 0, policy_version 20248 (0.0013) [2023-09-18 23:03:20,499][41480] Updated weights for policy 1, policy_version 17600 (0.0011) [2023-09-18 23:03:22,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 19402752. Throughput: 0: 5441.1, 1: 5439.4. Samples: 18020330. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:22,922][40872] Avg episode reward: [(0, '3118.519'), (1, '2886.190')] [2023-09-18 23:03:27,921][40872] Fps is (10 sec: 10649.2, 60 sec: 10786.1, 300 sec: 10857.9). Total num frames: 19451904. Throughput: 0: 5355.1, 1: 5355.2. Samples: 18082112. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:27,924][40872] Avg episode reward: [(0, '3096.952'), (1, '2889.271')] [2023-09-18 23:03:27,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017672_9048064.pth... [2023-09-18 23:03:27,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020320_10403840.pth... [2023-09-18 23:03:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017360_8888320.pth [2023-09-18 23:03:27,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020008_10244096.pth [2023-09-18 23:03:28,357][41480] Updated weights for policy 1, policy_version 17680 (0.0013) [2023-09-18 23:03:28,358][41393] Updated weights for policy 0, policy_version 20328 (0.0014) [2023-09-18 23:03:32,921][40872] Fps is (10 sec: 9830.5, 60 sec: 10786.2, 300 sec: 10857.9). Total num frames: 19501056. Throughput: 0: 5311.3, 1: 5311.5. Samples: 18145530. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:32,922][40872] Avg episode reward: [(0, '3145.063'), (1, '2951.090')] [2023-09-18 23:03:36,043][41393] Updated weights for policy 0, policy_version 20408 (0.0015) [2023-09-18 23:03:36,043][41480] Updated weights for policy 1, policy_version 17760 (0.0010) [2023-09-18 23:03:37,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10649.6, 300 sec: 10857.9). Total num frames: 19558400. Throughput: 0: 5319.1, 1: 5322.5. Samples: 18178048. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:37,922][40872] Avg episode reward: [(0, '3172.342'), (1, '3089.932')] [2023-09-18 23:03:42,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10649.6, 300 sec: 10857.9). Total num frames: 19607552. Throughput: 0: 5287.8, 1: 5286.8. Samples: 18240632. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:42,922][40872] Avg episode reward: [(0, '3226.228'), (1, '3087.035')] [2023-09-18 23:03:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020472_10481664.pth... [2023-09-18 23:03:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017824_9125888.pth... [2023-09-18 23:03:42,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017520_8970240.pth [2023-09-18 23:03:42,941][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020168_10326016.pth [2023-09-18 23:03:43,934][41480] Updated weights for policy 1, policy_version 17840 (0.0013) [2023-09-18 23:03:43,934][41393] Updated weights for policy 0, policy_version 20488 (0.0015) [2023-09-18 23:03:47,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10649.6, 300 sec: 10857.9). Total num frames: 19664896. Throughput: 0: 5295.6, 1: 5295.1. Samples: 18304442. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-18 23:03:47,921][40872] Avg episode reward: [(0, '3133.623'), (1, '3042.167')] [2023-09-18 23:03:51,006][41480] Updated weights for policy 1, policy_version 17920 (0.0014) [2023-09-18 23:03:51,006][41393] Updated weights for policy 0, policy_version 20568 (0.0014) [2023-09-18 23:03:52,921][40872] Fps is (10 sec: 11469.1, 60 sec: 10649.6, 300 sec: 10857.9). Total num frames: 19722240. Throughput: 0: 5367.5, 1: 5366.7. Samples: 18341600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:03:52,921][40872] Avg episode reward: [(0, '3177.494'), (1, '2870.497')] [2023-09-18 23:03:57,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 19779584. Throughput: 0: 5367.4, 1: 5369.5. Samples: 18407426. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:03:57,922][40872] Avg episode reward: [(0, '3224.568'), (1, '2836.833')] [2023-09-18 23:03:57,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020640_10567680.pth... [2023-09-18 23:03:57,931][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000017992_9211904.pth... [2023-09-18 23:03:57,936][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017672_9048064.pth [2023-09-18 23:03:57,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020320_10403840.pth [2023-09-18 23:03:58,469][41480] Updated weights for policy 1, policy_version 18000 (0.0014) [2023-09-18 23:03:58,469][41393] Updated weights for policy 0, policy_version 20648 (0.0016) [2023-09-18 23:04:02,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 19836928. Throughput: 0: 5398.8, 1: 5399.7. Samples: 18474204. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:02,922][40872] Avg episode reward: [(0, '3266.731'), (1, '2886.338')] [2023-09-18 23:04:05,895][41480] Updated weights for policy 1, policy_version 18080 (0.0014) [2023-09-18 23:04:05,895][41393] Updated weights for policy 0, policy_version 20728 (0.0014) [2023-09-18 23:04:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 19886080. Throughput: 0: 5405.9, 1: 5406.4. Samples: 18506884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:07,922][40872] Avg episode reward: [(0, '3303.381'), (1, '2911.842')] [2023-09-18 23:04:12,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.2, 300 sec: 10885.6). Total num frames: 19943424. Throughput: 0: 5433.3, 1: 5433.5. Samples: 18571114. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:12,921][40872] Avg episode reward: [(0, '3297.674'), (1, '2944.345')] [2023-09-18 23:04:12,927][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020800_10649600.pth... [2023-09-18 23:04:12,927][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018152_9293824.pth... [2023-09-18 23:04:12,932][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017824_9125888.pth [2023-09-18 23:04:12,932][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020472_10481664.pth [2023-09-18 23:04:13,601][41480] Updated weights for policy 1, policy_version 18160 (0.0012) [2023-09-18 23:04:13,603][41393] Updated weights for policy 0, policy_version 20808 (0.0014) [2023-09-18 23:04:17,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10857.9). Total num frames: 19992576. Throughput: 0: 5456.2, 1: 5457.5. Samples: 18636646. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:17,922][40872] Avg episode reward: [(0, '3231.212'), (1, '2912.546')] [2023-09-18 23:04:20,930][41393] Updated weights for policy 0, policy_version 20888 (0.0010) [2023-09-18 23:04:20,931][41480] Updated weights for policy 1, policy_version 18240 (0.0015) [2023-09-18 23:04:22,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.2, 300 sec: 10857.9). Total num frames: 20049920. Throughput: 0: 5467.8, 1: 5465.6. Samples: 18670048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:22,921][40872] Avg episode reward: [(0, '3226.410'), (1, '3087.138')] [2023-09-18 23:04:27,921][40872] Fps is (10 sec: 11468.9, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 20107264. Throughput: 0: 5518.4, 1: 5518.5. Samples: 18737288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:27,921][40872] Avg episode reward: [(0, '3135.298'), (1, '3157.287')] [2023-09-18 23:04:27,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000020960_10731520.pth... [2023-09-18 23:04:27,928][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018312_9375744.pth... [2023-09-18 23:04:27,931][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020640_10567680.pth [2023-09-18 23:04:27,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000017992_9211904.pth [2023-09-18 23:04:28,362][41480] Updated weights for policy 1, policy_version 18320 (0.0016) [2023-09-18 23:04:28,363][41393] Updated weights for policy 0, policy_version 20968 (0.0014) [2023-09-18 23:04:32,921][40872] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 10885.6). Total num frames: 20164608. Throughput: 0: 5534.4, 1: 5534.8. Samples: 18802558. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:32,921][40872] Avg episode reward: [(0, '3155.706'), (1, '3121.985')] [2023-09-18 23:04:35,855][41393] Updated weights for policy 0, policy_version 21048 (0.0013) [2023-09-18 23:04:35,855][41480] Updated weights for policy 1, policy_version 18400 (0.0014) [2023-09-18 23:04:37,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 20213760. Throughput: 0: 5476.4, 1: 5474.1. Samples: 18834374. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:37,922][40872] Avg episode reward: [(0, '3209.078'), (1, '3061.452')] [2023-09-18 23:04:42,921][40872] Fps is (10 sec: 10649.7, 60 sec: 11059.3, 300 sec: 10885.6). Total num frames: 20271104. Throughput: 0: 5473.0, 1: 5469.7. Samples: 18899844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:42,921][40872] Avg episode reward: [(0, '3198.780'), (1, '2924.715')] [2023-09-18 23:04:42,929][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021120_10813440.pth... [2023-09-18 23:04:42,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018472_9457664.pth... [2023-09-18 23:04:42,932][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020800_10649600.pth [2023-09-18 23:04:42,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018152_9293824.pth [2023-09-18 23:04:43,334][41393] Updated weights for policy 0, policy_version 21128 (0.0014) [2023-09-18 23:04:43,335][41480] Updated weights for policy 1, policy_version 18480 (0.0012) [2023-09-18 23:04:47,921][40872] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 10885.6). Total num frames: 20328448. Throughput: 0: 5464.0, 1: 5463.4. Samples: 18965940. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:47,922][40872] Avg episode reward: [(0, '3148.118'), (1, '2989.393')] [2023-09-18 23:04:50,805][41393] Updated weights for policy 0, policy_version 21208 (0.0016) [2023-09-18 23:04:50,805][41480] Updated weights for policy 1, policy_version 18560 (0.0013) [2023-09-18 23:04:52,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10857.9). Total num frames: 20377600. Throughput: 0: 5472.0, 1: 5472.0. Samples: 18999366. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:52,922][40872] Avg episode reward: [(0, '3151.001'), (1, '3008.510')] [2023-09-18 23:04:57,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 20434944. Throughput: 0: 5463.2, 1: 5464.9. Samples: 19062882. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:04:57,922][40872] Avg episode reward: [(0, '3188.892'), (1, '3045.992')] [2023-09-18 23:04:57,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018632_9539584.pth... [2023-09-18 23:04:57,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021280_10895360.pth... [2023-09-18 23:04:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000020960_10731520.pth [2023-09-18 23:04:57,940][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018312_9375744.pth [2023-09-18 23:04:58,509][41393] Updated weights for policy 0, policy_version 21288 (0.0013) [2023-09-18 23:04:58,509][41480] Updated weights for policy 1, policy_version 18640 (0.0015) [2023-09-18 23:05:02,921][40872] Fps is (10 sec: 11469.0, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 20492288. Throughput: 0: 5469.3, 1: 5468.6. Samples: 19128850. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:02,921][40872] Avg episode reward: [(0, '3261.805'), (1, '2954.631')] [2023-09-18 23:05:05,993][41393] Updated weights for policy 0, policy_version 21368 (0.0013) [2023-09-18 23:05:05,994][41480] Updated weights for policy 1, policy_version 18720 (0.0016) [2023-09-18 23:05:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 20541440. Throughput: 0: 5456.0, 1: 5457.8. Samples: 19161168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:07,922][40872] Avg episode reward: [(0, '3192.511'), (1, '2833.506')] [2023-09-18 23:05:12,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 20598784. Throughput: 0: 5435.7, 1: 5437.6. Samples: 19226586. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:12,921][40872] Avg episode reward: [(0, '3128.147'), (1, '2835.593')] [2023-09-18 23:05:12,929][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018792_9621504.pth... [2023-09-18 23:05:12,928][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021440_10977280.pth... [2023-09-18 23:05:12,934][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018472_9457664.pth [2023-09-18 23:05:12,937][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021120_10813440.pth [2023-09-18 23:05:13,531][41480] Updated weights for policy 1, policy_version 18800 (0.0013) [2023-09-18 23:05:13,531][41393] Updated weights for policy 0, policy_version 21448 (0.0015) [2023-09-18 23:05:17,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 10857.9). Total num frames: 20647936. Throughput: 0: 5430.4, 1: 5430.6. Samples: 19291304. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:17,921][40872] Avg episode reward: [(0, '3072.664'), (1, '2940.719')] [2023-09-18 23:05:20,947][41393] Updated weights for policy 0, policy_version 21528 (0.0011) [2023-09-18 23:05:20,948][41480] Updated weights for policy 1, policy_version 18880 (0.0011) [2023-09-18 23:05:22,921][40872] Fps is (10 sec: 10649.4, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 20705280. Throughput: 0: 5452.2, 1: 5453.0. Samples: 19325108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:22,922][40872] Avg episode reward: [(0, '3055.766'), (1, '2945.384')] [2023-09-18 23:05:27,921][40872] Fps is (10 sec: 11468.3, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 20762624. Throughput: 0: 5444.7, 1: 5446.0. Samples: 19389930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:27,922][40872] Avg episode reward: [(0, '3120.565'), (1, '3002.271')] [2023-09-18 23:05:27,934][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000018952_9703424.pth... [2023-09-18 23:05:27,934][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021600_11059200.pth... [2023-09-18 23:05:27,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018632_9539584.pth [2023-09-18 23:05:27,940][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021280_10895360.pth [2023-09-18 23:05:28,426][41393] Updated weights for policy 0, policy_version 21608 (0.0013) [2023-09-18 23:05:28,426][41480] Updated weights for policy 1, policy_version 18960 (0.0012) [2023-09-18 23:05:32,921][40872] Fps is (10 sec: 10649.8, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 20811776. Throughput: 0: 5452.3, 1: 5452.3. Samples: 19456646. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:32,921][40872] Avg episode reward: [(0, '3174.746'), (1, '3077.540')] [2023-09-18 23:05:36,029][41393] Updated weights for policy 0, policy_version 21688 (0.0015) [2023-09-18 23:05:36,029][41480] Updated weights for policy 1, policy_version 19040 (0.0011) [2023-09-18 23:05:37,921][40872] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 20869120. Throughput: 0: 5436.6, 1: 5439.1. Samples: 19488772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:37,921][40872] Avg episode reward: [(0, '3247.900'), (1, '3222.561')] [2023-09-18 23:05:42,921][40872] Fps is (10 sec: 11468.5, 60 sec: 10922.6, 300 sec: 10913.4). Total num frames: 20926464. Throughput: 0: 5469.2, 1: 5467.2. Samples: 19555022. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:42,922][40872] Avg episode reward: [(0, '3211.073'), (1, '3238.795')] [2023-09-18 23:05:42,932][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000019112_9785344.pth... [2023-09-18 23:05:42,932][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021760_11141120.pth... [2023-09-18 23:05:42,938][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018792_9621504.pth [2023-09-18 23:05:42,939][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021440_10977280.pth [2023-09-18 23:05:43,435][41393] Updated weights for policy 0, policy_version 21768 (0.0013) [2023-09-18 23:05:43,436][41480] Updated weights for policy 1, policy_version 19120 (0.0015) [2023-09-18 23:05:47,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10786.2, 300 sec: 10885.6). Total num frames: 20975616. Throughput: 0: 5452.7, 1: 5454.3. Samples: 19619666. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:05:47,922][40872] Avg episode reward: [(0, '3192.766'), (1, '3128.510')] [2023-09-18 23:05:51,000][41480] Updated weights for policy 1, policy_version 19200 (0.0014) [2023-09-18 23:05:51,000][41393] Updated weights for policy 0, policy_version 21848 (0.0015) [2023-09-18 23:05:52,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10899.5). Total num frames: 21032960. Throughput: 0: 5461.0, 1: 5460.7. Samples: 19652642. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 23:05:52,922][40872] Avg episode reward: [(0, '3206.196'), (1, '3176.010')] [2023-09-18 23:05:57,921][40872] Fps is (10 sec: 11468.6, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 21090304. Throughput: 0: 5484.9, 1: 5483.1. Samples: 19720146. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 23:05:57,922][40872] Avg episode reward: [(0, '3161.689'), (1, '3057.481')] [2023-09-18 23:05:57,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000019272_9867264.pth... [2023-09-18 23:05:57,930][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000021920_11223040.pth... [2023-09-18 23:05:57,933][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000018952_9703424.pth [2023-09-18 23:05:57,938][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021600_11059200.pth [2023-09-18 23:05:58,381][41480] Updated weights for policy 1, policy_version 19280 (0.0010) [2023-09-18 23:05:58,382][41393] Updated weights for policy 0, policy_version 21928 (0.0016) [2023-09-18 23:06:02,921][40872] Fps is (10 sec: 10649.6, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 21139456. Throughput: 0: 5468.7, 1: 5470.2. Samples: 19783558. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-18 23:06:02,922][40872] Avg episode reward: [(0, '3165.937'), (1, '3002.241')] [2023-09-18 23:06:06,059][41393] Updated weights for policy 0, policy_version 22008 (0.0015) [2023-09-18 23:06:06,059][41480] Updated weights for policy 1, policy_version 19360 (0.0012) [2023-09-18 23:06:07,921][40872] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 10913.4). Total num frames: 21196800. Throughput: 0: 5454.6, 1: 5454.3. Samples: 19816008. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:06:07,921][40872] Avg episode reward: [(0, '3187.081'), (1, '2932.533')] [2023-09-18 23:06:12,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10786.1, 300 sec: 10885.6). Total num frames: 21245952. Throughput: 0: 5440.7, 1: 5440.0. Samples: 19879564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:06:12,922][40872] Avg episode reward: [(0, '3229.084'), (1, '2928.630')] [2023-09-18 23:06:12,930][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000019424_9945088.pth... [2023-09-18 23:06:12,931][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000022072_11300864.pth... [2023-09-18 23:06:12,939][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000019112_9785344.pth [2023-09-18 23:06:12,945][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021760_11141120.pth [2023-09-18 23:06:13,671][41480] Updated weights for policy 1, policy_version 19440 (0.0010) [2023-09-18 23:06:13,671][41393] Updated weights for policy 0, policy_version 22088 (0.0015) [2023-09-18 23:06:17,921][40872] Fps is (10 sec: 10649.5, 60 sec: 10922.6, 300 sec: 10885.6). Total num frames: 21303296. Throughput: 0: 5446.3, 1: 5446.9. Samples: 19946842. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-18 23:06:17,922][40872] Avg episode reward: [(0, '3306.659'), (1, '3074.961')] [2023-09-18 23:06:21,209][41480] Updated weights for policy 1, policy_version 19520 (0.0015) [2023-09-18 23:06:21,209][41393] Updated weights for policy 0, policy_version 22168 (0.0010) [2023-09-18 23:06:22,612][41360] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000003 [2023-09-18 23:06:22,615][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000019536_10002432.pth... [2023-09-18 23:06:22,615][41491] Stopping RolloutWorker_w7... [2023-09-18 23:06:22,615][41487] Stopping RolloutWorker_w5... [2023-09-18 23:06:22,615][40872] Component RolloutWorker_w7 stopped! [2023-09-18 23:06:22,615][41486] Stopping RolloutWorker_w3... [2023-09-18 23:06:22,615][41491] Loop rollout_proc7_evt_loop terminating... [2023-09-18 23:06:22,615][41487] Loop rollout_proc5_evt_loop terminating... [2023-09-18 23:06:22,615][41482] Stopping RolloutWorker_w1... [2023-09-18 23:06:22,615][41489] Stopping RolloutWorker_w4... [2023-09-18 23:06:22,615][41485] Stopping RolloutWorker_w2... [2023-09-18 23:06:22,615][41488] Stopping RolloutWorker_w6... [2023-09-18 23:06:22,616][41486] Loop rollout_proc3_evt_loop terminating... [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w5 stopped! [2023-09-18 23:06:22,615][41359] Stopping Batcher_0... [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w1 stopped! [2023-09-18 23:06:22,615][41484] Stopping RolloutWorker_w0... [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w3 stopped! [2023-09-18 23:06:22,616][41482] Loop rollout_proc1_evt_loop terminating... [2023-09-18 23:06:22,616][41359] Loop batcher_evt_loop terminating... [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w4 stopped! [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w2 stopped! [2023-09-18 23:06:22,616][41488] Loop rollout_proc6_evt_loop terminating... 
[2023-09-18 23:06:22,616][41485] Loop rollout_proc2_evt_loop terminating... [2023-09-18 23:06:22,616][41489] Loop rollout_proc4_evt_loop terminating... [2023-09-18 23:06:22,616][40872] Component RolloutWorker_w6 stopped! [2023-09-18 23:06:22,616][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000022184_11358208.pth... [2023-09-18 23:06:22,617][40872] Component Batcher_1 stopped! [2023-09-18 23:06:22,616][41484] Loop rollout_proc0_evt_loop terminating... [2023-09-18 23:06:22,617][40872] Component Batcher_0 stopped! [2023-09-18 23:06:22,617][40872] Component RolloutWorker_w0 stopped! [2023-09-18 23:06:22,615][41360] Stopping Batcher_1... [2023-09-18 23:06:22,618][41360] Loop batcher_evt_loop terminating... [2023-09-18 23:06:22,619][41360] Removing ./train_dir/Hopper/checkpoint_p1/checkpoint_000019272_9867264.pth [2023-09-18 23:06:22,620][41359] Removing ./train_dir/Hopper/checkpoint_p0/checkpoint_000021920_11223040.pth [2023-09-18 23:06:22,620][41360] Saving ./train_dir/Hopper/checkpoint_p1/checkpoint_000019536_10002432.pth... [2023-09-18 23:06:22,620][41359] Saving ./train_dir/Hopper/checkpoint_p0/checkpoint_000022184_11358208.pth... [2023-09-18 23:06:22,624][41360] Stopping LearnerWorker_p1... [2023-09-18 23:06:22,624][41360] Loop learner_proc1_evt_loop terminating... [2023-09-18 23:06:22,624][40872] Component LearnerWorker_p1 stopped! [2023-09-18 23:06:22,624][41359] Stopping LearnerWorker_p0... [2023-09-18 23:06:22,624][41359] Loop learner_proc0_evt_loop terminating... [2023-09-18 23:06:22,624][40872] Component LearnerWorker_p0 stopped! [2023-09-18 23:06:22,673][41480] Weights refcount: 2 0 [2023-09-18 23:06:22,673][41393] Weights refcount: 2 0 [2023-09-18 23:06:22,674][41480] Stopping InferenceWorker_p1-w0... [2023-09-18 23:06:22,674][41393] Stopping InferenceWorker_p0-w0... [2023-09-18 23:06:22,675][40872] Component InferenceWorker_p1-w0 stopped! [2023-09-18 23:06:22,675][41480] Loop inference_proc1-0_evt_loop terminating... 
[2023-09-18 23:06:22,675][41393] Loop inference_proc0-0_evt_loop terminating... [2023-09-18 23:06:22,675][40872] Component InferenceWorker_p0-w0 stopped! [2023-09-18 23:06:22,675][40872] Waiting for process learner_proc0 to stop... [2023-09-18 23:06:23,360][40872] Waiting for process learner_proc1 to stop... [2023-09-18 23:06:23,360][40872] Waiting for process inference_proc0-0 to join... [2023-09-18 23:06:23,407][40872] Waiting for process inference_proc1-0 to join... [2023-09-18 23:06:23,408][40872] Waiting for process rollout_proc0 to join... [2023-09-18 23:06:23,408][40872] Waiting for process rollout_proc1 to join... [2023-09-18 23:06:23,409][40872] Waiting for process rollout_proc2 to join... [2023-09-18 23:06:23,409][40872] Waiting for process rollout_proc3 to join... [2023-09-18 23:06:23,410][40872] Waiting for process rollout_proc4 to join... [2023-09-18 23:06:23,410][40872] Waiting for process rollout_proc5 to join... [2023-09-18 23:06:23,410][40872] Waiting for process rollout_proc6 to join... [2023-09-18 23:06:23,411][40872] Waiting for process rollout_proc7 to join... 
[2023-09-18 23:06:23,411][40872] Batcher 0 profile tree view: batching: 44.2486, releasing_batches: 3.3502 [2023-09-18 23:06:23,411][40872] Batcher 1 profile tree view: batching: 40.9107, releasing_batches: 3.3740 [2023-09-18 23:06:23,412][40872] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0051 wait_policy_total: 224.3160 update_model: 22.8640 weight_update: 0.0011 one_step: 0.0012 handle_policy_step: 1474.8884 deserialize: 39.4224, stack: 8.8710, obs_to_device_normalize: 301.9694, forward: 737.4128, send_messages: 109.1176 prepare_outputs: 192.4713 to_cpu: 99.1032 [2023-09-18 23:06:23,412][40872] InferenceWorker_p1-w0 profile tree view: wait_policy: 0.0051 wait_policy_total: 225.9483 update_model: 22.4272 weight_update: 0.0015 one_step: 0.0012 handle_policy_step: 1470.8334 deserialize: 39.5180, stack: 8.9744, obs_to_device_normalize: 297.4378, forward: 739.8708, send_messages: 106.7431 prepare_outputs: 192.5604 to_cpu: 99.1005 [2023-09-18 23:06:23,413][40872] Learner 0 profile tree view: misc: 0.0153, prepare_batch: 21.8591 train: 110.3189 epoch_init: 0.0654, minibatch_init: 1.7130, losses_postprocess: 3.0620, kl_divergence: 1.5440, after_optimizer: 1.6817 calculate_losses: 31.9770 losses_init: 0.0576, forward_head: 3.6408, bptt_initial: 0.2058, bptt: 0.2384, tail: 12.0785, advantages_returns: 1.6372, losses: 12.1880 update: 68.0240 clip: 8.4251 [2023-09-18 23:06:23,413][40872] Learner 1 profile tree view: misc: 0.0150, prepare_batch: 22.1165 train: 106.8892 epoch_init: 0.0648, minibatch_init: 1.7279, losses_postprocess: 2.9156, kl_divergence: 1.3489, after_optimizer: 1.7374 calculate_losses: 31.7272 losses_init: 0.0558, forward_head: 3.6163, bptt_initial: 0.2094, bptt: 0.2054, tail: 11.9775, advantages_returns: 1.6203, losses: 12.1008 update: 65.1120 clip: 8.3427 [2023-09-18 23:06:23,413][40872] RolloutWorker_w0 profile tree view: wait_for_trajectories: 1.6684, enqueue_policy_requests: 78.3185, complete_rollouts: 2.6534, env_step: 794.3369, overhead: 
100.6525 save_policy_outputs: 188.7073 split_output_tensors: 64.5689 [2023-09-18 23:06:23,414][40872] RolloutWorker_w7 profile tree view: wait_for_trajectories: 1.6445, enqueue_policy_requests: 75.0591, complete_rollouts: 2.5945, env_step: 751.9183, overhead: 97.0731 save_policy_outputs: 179.7870 split_output_tensors: 62.2782 [2023-09-18 23:06:23,415][40872] Loop Runner_EvtLoop terminating... [2023-09-18 23:06:23,415][40872] Runner profile tree view: main_loop: 1836.6203 [2023-09-18 23:06:23,416][40872] Collected {1: 10002432, 0: 11358208}, FPS: 10892.2